Community QA portals provide an important resource for non-factoid question-answering. The inherent noisiness of user-generated data makes the identification of high-quality cont...
Automatically generating location overviews in the form of both visual and textual descriptions is highly desirable for online services such as travel planning, to provide attractiv...
We have developed a web-repository crawler that is used for reconstructing websites when backups are unavailable. Our crawler retrieves web resources from the Internet Archive, Go...
We present Content Extraction via Tag Ratios (CETR), a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Web 2.0 applications, including blogs, wikis and social networking sites, pose challenging privacy issues. Many users are unaware that search engines index personal information ...
Michael Hart, Claude Castille, Rob Johnson, Amanda...