Sciweavers

41 search results - page 6 / 9
» Text Genre Detection Using Common Word Frequencies
TOIS
2002
14 years 9 months ago
Burst tries: a fast, efficient data structure for string keys
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each d...
Steffen Heinz, Justin Zobel, Hugh E. Williams
AIRWEB
2008
Springer
14 years 11 months ago
Cleaning search results using term distance features
The presence of Web spam in query results is one of the critical challenges facing search engines today. While search engines try to combat the impact of spam pages on their resul...
Josh Attenberg, Torsten Suel
CIKM
2011
Springer
13 years 9 months ago
Partial duplicate detection for large book collections
A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
Ismet Zeki Yalniz, Ethem F. Can, R. Manmatha
AMTA
2004
Springer
15 years 2 months ago
A Fluency Error Categorization Scheme to Guide Automated Machine Translation Evaluation
Existing automated MT evaluation methods often require expert human translations. These are produced for every language pair evaluated and, due to this expense, subsequen...
Debbie Elliott, Anthony Hartley, Eric Atwell
EMNLP
2008
14 years 11 months ago
Relative Rank Statistics for Dialog Analysis
We introduce the relative rank differential statistic which is a non-parametric approach to document and dialog analysis based on word frequency rank-statistics. We also present a...
Juan Huerta