Sciweavers

62 search results - page 4 / 13
SIGIR
2008
ACM
13 years 5 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
WWW
2009
ACM
14 years 6 months ago
Incorporating site-level knowledge to extract structured data from web forums
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...
ICWSM
2009
13 years 3 months ago
MakeMyPage: Social Media Meets Automatic Content Generation
Finding out about a topic online can be time consuming. It involves visiting multiple news sites, encyclopedia entries, video repositories and other resources while discarding irr...
Francisco Iacobelli, Kristian J. Hammond, Larry Bi...
JCDL
2004
ACM
13 years 10 months ago
Panorama: extending digital libraries with topical crawlers
A large amount of research, technical and professional documents are available today in digital formats. Digital libraries are created to facilitate search and retrieval of inform...
Gautam Pant, Kostas Tsioutsiouliklis, Judy Johnson...
CHI
1996
ACM
13 years 9 months ago
Silk from a Sow's Ear: Extracting Usable Structures from the Web
In its current implementation, the World-Wide Web lacks much of the explicit structure and strong typing found in many closed hypertext systems. While this property has directly f...
Peter Pirolli, James E. Pitkow, Ramana Rao