Intrinsic plagiarism detection is a sub-task of authorship identification in which outlier chunks must be detected solely on the basis of stylistic differences from the main body o...
Stochastic finite automata are useful for identifying substrings (chunks) within larger units of text. Relevant applications include tokenization, base-NP chunking, name...
We present sppc, a high-performance system for intelligent text extraction and navigation from German free text documents. sppc consists of a set of domain-independent shallow core...
Entropy Guided Transformation Learning (ETL) is a new machine learning strategy that combines the advantages of decision trees (DT) and Transformation Based Learning (TBL). In thi...