In recent years, efforts have begun to put mathematical content on the Web. As with other types of Web information, search capabilities should be provided to enable users to find what the...
Today, large-scale web services run on complex systems, spanning multiple data centers and content distribution networks, with performance depending on diverse factors in end syst...
Zhichun Li, Ming Zhang, Zhaosheng Zhu, Yan Chen, A...
We describe an HTML web page segmentation algorithm, which is applied to segment online medical journal articles (regular HTML and PDF-Converted-HTML files). The web page content ...
An appreciation of the roles of genre and task is important in understanding how people browse the Web. Genre is characterized by content and form and is intimately linked to the ...
Carolyn R. Watters, Michael A. Shepherd, Forbes J....
An approach to postal address detection from webpages is proposed. The webpages are first segmented into text blocks based on their visual similarity. The text content in each bl...