Indexing and ranking are two key factors for efficient and effective XML information retrieval. Inappropriate indexing may result in false negatives and false positives, and impro...
Classification of documents by genre is typically done either using linguistic analysis or term frequency based techniques. The former provides better classification accuracy than...
We present a novel system and methodology for browsing and exploring topics and concepts within a document collection. The process begins with the generation of multiple taxonomie...
W. Scott Spangler, Jeffrey T. Kreulen, Justin Less...
tion Abstract ChengXiang Zhai (Advisor: John Lafferty) Language Technologies Institute School of Computer Science Carnegie Mellon University With the dramatic increase in online in...
The vast majority of the published skew estimation methods for scanned document images are for textual documents. These methods are based on the principle that the skew angles can...