Sciweavers

241 search results - page 28 / 49
» Detecting Co-Derivative Documents in Large Text Collections
Sort
View
AIRWEB
2006
Springer
15 years 1 months ago
Tracking Web Spam with Hidden Style Similarity
Automatically generated content is ubiquitous in the web: dynamic sites built using the three-tier paradigm are good examples (e.g. commercial sites, blogs and other sites powered...
Tanguy Urvoy, Thomas Lavergne, Pascal Filoche
ICDAR
2003
IEEE
15 years 3 months ago
Identifying Story and Preview Images in News Web Pages
The World Wide Web provides an increasingly powerful and popular publication mechanism. Web documents often contain a large number of images serving various different purposes. Th...
Jianying Hu, Amit Bagga
120
Voted
CIKM
2008
Springer
14 years 11 months ago
Identifying table boundaries in digital documents via sparse line detection
Most prior work on information extraction has focused on extracting information from text in digital documents. However, often, the most important information being reported in an...
Ying Liu, Prasenjit Mitra, C. Lee Giles
81
Voted
WWW
2007
ACM
15 years 10 months ago
Query-driven indexing for peer-to-peer text retrieval
We describe a query-driven indexing framework for scalable text retrieval over structured P2P networks. To cope with the bandwidth consumption problem that has been identified as ...
Gleb Skobeltsyn, Toan Luu, Karl Aberer, Martin Raj...
93
Voted
SIGIR
2005
ACM
15 years 3 months ago
Indexing emails and email threads for retrieval
Electronic mail poses a number of unusual challenges for the design of information retrieval systems and test collections, including informal expression, conversational structure,...
Yejun Wu, Douglas W. Oard