Sciweavers

115 search results for "Training Data Cleaning for Text Classification" (page 12 of 23)
IPM
2002
14 years 11 months ago
A feature mining based approach for the classification of text documents into disjoint classes
This paper proposes a new approach for classifying text documents into two disjoint classes. The new approach is based on extracting patterns, in the form of two logical expressio...
Salvador Nieto Sánchez, Evangelos Triantaph...
WWW
2008
ACM
16 years 11 days ago
Learning to classify short and sparse text & web with hidden topics from large-scale data collections
This paper presents a general framework for building classifiers that deal with short and sparse text & Web segments by making the most of hidden topics discovered from larges...
Xuan Hieu Phan, Minh Le Nguyen, Susumu Horiguchi
CLEF
2010
Springer
15 years 24 days ago
ZOT! to Wikipedia Vandalism - Lab Report for PAN at CLEF 2010
This vandalism detector uses features primarily derived from a word-preserving differencing of the text for each Wikipedia article from before and after the edit, along wit...
James White, Rebecca Maessen
LREC
2008
15 years 1 months ago
New Resources for Document Classification, Analysis and Translation Technologies
The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into Englis...
Stephanie Strassel, Lauren Friedman, Safa Ismael, ...
COMPSAC
2004
IEEE
15 years 3 months ago
N-Gram-Based Detection of New Malicious Code
The current commercial anti-virus software detects a virus only after the virus has appeared and caused damage. Motivated by the standard signature-based technique for detecting v...
Tony Abou-Assaleh, Nick Cercone, Vlado Keselj, Ray...