Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
Many Web services operate their own Web crawlers to discover data of interest, despite the fact that largescale, timely crawling is complex, operationally intensive, and expensive...
Jonathan M. Hsieh, Steven D. Gribble, Henry M. Lev...
Searching for people on the Web is one of the most common query types to the web search engines today. However, when a person name is queried, the returned webpages often contain ...
Dmitri V. Kalashnikov, Rabia Nuray-Turan, Sharad M...
: Most of the Web traffic today uses the HyperText Transfer Protocol (HTTP), with the Transmission Control Protocol (TCP) as the underlying transport protocol. TCP provides several...
Israel Cidon, Raphael Rom, Amit Gupta, Christoph L...
The PC Desktop is a very rich repository of personal information, efficiently capturing user's interests. In this paper we propose a new approach towards an automatic persona...
Paul-Alexandru Chirita, Claudiu S. Firan, Wolfgang...