Sciweavers

2190 search results - page 387 / 438
» Unweaving a web of documents
ICAIL
2007
ACM
15 years 4 months ago
Essential deduplication functions for transactional databases in law firms
As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
Jack G. Conrad, Edward L. Raymond
DASFAA
2004
IEEE
15 years 4 months ago
Semi-supervised Text Classification Using Partitioned EM
Text classification using a small labeled set and a large unlabeled data set is seen as a promising technique to reduce the labor-intensive and time-consuming effort of labeling traini...
Gao Cong, Wee Sun Lee, Haoran Wu, Bing Liu
WWW
2006
ACM
16 years 1 month ago
FeedEx: collaborative exchange of news feeds
As most blogs and traditional media support RSS or Atom feeds, the news feed technology becomes increasingly prevalent. Taking advantage of ubiquitous news feeds, we design FeedEx...
Seung Jun, Mustaque Ahamad
WWW
2008
ACM
16 years 1 month ago
Performance of compressed inverted list caching in search engines
Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of ...
Jiangong Zhang, Xiaohui Long, Torsten Suel
WWW
2005
ACM
16 years 1 month ago
Extracting context to improve accuracy for HTML content extraction
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo