Sciweavers

281 search results - page 2 / 57
» Introducing the Enron Corpus
CEAS
2006
Springer
13 years 9 months ago
Online Discriminative Spam Filter Training
We describe a very simple technique for discriminatively training a spam filter. Our results on the TREC Enron spam corpus would have been the best for the Ham at .1% measure, and...
Joshua Goodman, Wen-tau Yih
ADMA
2010
Springer
Data Mining (271 views)
13 years 23 days ago
Exploiting Concept Clumping for Efficient Incremental E-Mail Categorization
We introduce a novel approach to incremental e-mail categorization based on identifying and exploiting "clumps" of messages that are classified similarly. Clumping reflec...
Alfred Krzywicki, Wayne Wobcke
LREC
2008
Education (220 views)
13 years 7 months ago
Introducing DRS (The Digital Replay System): a Tool for the Future of Corpus Linguistic Research and Analysis
This paper outlines the new resource technologies, products and applications that have been constructed during the development of a multi-modal (MM hereafter) corpus tool on the D...
Dawn Knight, Paul Tennent
CEAS
2006
Springer
13 years 9 months ago
Introducing the Webb Spam Corpus: Using Email Spam to Identify Web Spam Automatically
Just as email spam has negatively impacted the user messaging experience, the rise of Web spam is threatening to severely degrade the quality of information on the World Wide Web....
Steve Webb, James Caverlee, Calton Pu
ESORICS
2009
Springer
14 years 6 months ago
Authentic Time-Stamps for Archival Storage
We study the problem of authenticating the content and creation time of documents generated by an organization and retained in archival storage. Recent regulations (e.g.,...
Alina Oprea, Kevin D. Bowers