We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
We describe a new multi-phase, color-based image retrieval system, FOCUS (Fast Object Color-based qUery System), with an online user interface which is capable of identifying mult...
Madirakshi Das, Edward M. Riseman, Bruce A. Draper
Distributed heterogeneous search systems are an emerging phenomenon in Web search, in which independent topic-specific search engines provide search services, and metasearchers d...
Search engines largely rely on Web robots to collect information from the Web. Due to the unregulated open-access nature of the Web, robot activities are extremely diverse. Such c...
The objective of this work is to derive quantitative statements about what fraction of web search queries issued to state-of-the-art commercial search engines lead to excellen...
Hugo Zaragoza, Berkant Barla Cambazoglu, Ricardo A...