Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the conte...
Panagiotis G. Ipeirotis, Luis Gravano, Mehran Saha...
Rapid increase in the number of pages on web sites, and widespread use of search engine optimization techniques, lead to web sites becoming difficult to navigate. Traditional site ...
Current web search engines return result pages containing mostly text summary even though the matched web pages may contain informative pictures. A text excerpt (i.e. snippet) is ...
There is an increase in the number of data sources that can be queried across the WWW. Such sources typically support HTML forms-based interfaces and search engines query collecti...
Web search engines work well for finding crawlable pages, but not for finding datasets hidden behind Web search forms. We describe a novel technique for detecting search forms, ...