Sciweavers

399 search results - page 34 / 80
» Filtering Documents with Subspaces
ICDAR
2005
IEEE
15 years 9 months ago
Image Analysis for Efficient Categorization of Image-based Spam E-mail
To circumvent prevalent text-based anti-spam filters, spammers have begun embedding the advertisement text in images. Analogously, proprietary information (such as source code) ma...
Hrishikesh Aradhye, Gregory K. Myers, James A. Her...
CEAS
2005
Springer
15 years 9 months ago
Automatic Discovery of Personal Topics to Organize Email
We present in this paper a procedure to automatically discover a user's personal topics by clustering their emails. Unlike previous work, we automatically label topics using appro...
Arun C. Surendran, John C. Platt, Erin Renshaw
IRFC
2010
Springer
15 years 1 month ago
An Information Retrieval Model Based on Discrete Fourier Transform
Information Retrieval (IR) systems combine a variety of techniques stemming from logical, vector-space and probabilistic models. This variety of combinations has produced...
Alberto Costa, Massimo Melucci
EMNLP
2009
15 years 1 month ago
Chinese Novelty Mining
Automated mining of novel documents or sentences from chronologically ordered documents or sentences is an open challenge in text mining. In this paper, we describe the preprocess...
Yi Zhang, Flora S. Tsai
NSDI
2010
15 years 4 months ago
The Architecture and Implementation of an Extensible Web Crawler
Many Web services operate their own Web crawlers to discover data of interest, despite the fact that large-scale, timely crawling is complex, operationally intensive, and expensive...
Jonathan M. Hsieh, Steven D. Gribble, Henry M. Lev...