Sciweavers

Search results for: Mining a Web Citation Database for Document Clustering

ICDE 2004 (IEEE)
Probe, Cluster, and Discover: Focused Extraction of QA-Pagelets from the Deep Web
In this paper, we introduce the concept of a QA-Pagelet to refer to the content region in a dynamic page that contains query matches. We present THOR, a scalable and efficient min...
James Caverlee, Ling Liu, David Buttler

DEXAW 2008 (IEEE)
Text Extraction from the Web via Text-to-Tag Ratio
We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Tim Weninger, William H. Hsu
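The Text-to-Tag Ratio idea in the abstract can be illustrated with a minimal Python sketch. It assumes the ratio is computed per source line as the number of non-tag characters divided by the number of tags on that line (falling back to the text length when a line has no tags), and it keeps lines whose ratio reaches a mean-based threshold. The regex-based tag stripping, the function names `text_to_tag_ratios` and `extract_content`, the `factor` parameter, and the threshold rule are hypothetical choices for illustration, not the authors' implementation.

```python
import re
from statistics import mean

# Matches any HTML tag, e.g. <p>, </div>, <a href="/">.
TAG_RE = re.compile(r"<[^>]*>")
# Assumed preprocessing: drop script/style blocks and comments,
# whose text is not visible page content.
NOISE_RE = re.compile(r"<script.*?</script>|<style.*?</style>|<!--.*?-->",
                      re.IGNORECASE | re.DOTALL)


def text_to_tag_ratios(html: str) -> list[tuple[str, float]]:
    """Return (visible_text, ratio) for each non-blank line of the page."""
    cleaned = NOISE_RE.sub("", html)
    result = []
    for line in cleaned.splitlines():
        if not line.strip():
            continue  # blank lines carry no signal
        tags = TAG_RE.findall(line)
        text = TAG_RE.sub("", line).strip()
        # Non-tag characters per tag; with no tags, use the text length
        # so the ratio stays defined.
        ratio = len(text) / len(tags) if tags else float(len(text))
        result.append((text, ratio))
    return result


def extract_content(html: str, factor: float = 1.0) -> list[str]:
    """Keep lines whose ratio is at least `factor` times the mean ratio.

    The mean-based threshold is an illustrative assumption.
    """
    ratios = text_to_tag_ratios(html)
    if not ratios:
        return []
    threshold = factor * mean(r for _, r in ratios)
    return [text for text, r in ratios if r >= threshold and text]


# Hypothetical demo page: navigation and ad lines are tag-heavy,
# the paragraph line is text-heavy.
DEMO_PAGE = """<html><head><title>Demo</title></head>
<body>
<div><a href="/">Home</a> | <a href="/about">About</a></div>
<p>Text-to-Tag Ratio favors long runs of prose that contain few tags.</p>
<div><span>Ad</span></div>
</body></html>"""

if __name__ == "__main__":
    print(extract_content(DEMO_PAGE))
```

On the demo page only the paragraph line clears the threshold, because the navigation and ad lines carry several tags per handful of text characters. A regex stands in for a real HTML parser to keep the sketch short, so it will misjudge pages with a literal `<` in text or tags split across lines.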

KDD 1998 (ACM)
Human Performance on Clustering Web Pages: A Preliminary Study
With the increase in information on the World Wide Web it has become difficult to quickly find desired information without using multiple queries or using a topic-specific search ...
Sofus A. Macskassy, Arunava Banerjee, Brian D. Dav...

CIKM 2006 (Springer)
Multi-evidence, multi-criteria, lazy associative document classification
We present a novel approach for classifying documents that combines different pieces of evidence (e.g., textual features of documents, links, and citations) transparently, through...
Adriano Veloso, Wagner Meira Jr., Marco Cristo, Ma...

KDD 2002 (ACM)
Learning to match and cluster large high-dimensional data sets for data integration
Part of the process of data integration is determining which sets of identifiers refer to the same real-world entities. In integrating databases found on the Web or obtained by us...
William W. Cohen, Jacob Richman