Determining an author's native language by mining a text for errors

In this paper, we show that stylistic text features can be exploited to determine an anonymous author's native language with high accuracy. Specifically, we first use automatic tools to ascertain frequencies of various stylistic idiosyncrasies in a text. These frequencies then serve as features for support vector machines that learn to classify texts according to author native language.

Categories and Subject Descriptors: I.2.6 [Artificial Intelligence]: Learning - Analogies, Concept learning, Connectionism and neural nets, Induction, Knowledge acquisition, Language acquisition, Parameter learning

General Terms: Algorithms, Measurement, Experimentation

Keywords: Text mining, author profiling
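The first step described in the abstract, turning a text into frequencies of stylistic features, can be sketched roughly as follows. This is only an illustration: the abstract does not list the paper's actual features, so the function words and letter bigrams below, along with the names `FUNCTION_WORDS`, `LETTER_BIGRAMS` and `feature_vector`, are assumptions chosen for the example. It uses only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical feature set for illustration only; the abstract does not
# say which stylistic idiosyncrasies the authors actually measured.
FUNCTION_WORDS = ["the", "a", "of", "in", "on", "that"]
LETTER_BIGRAMS = ["th", "he", "in", "sh"]


def feature_vector(text):
    """Return relative frequencies of each assumed feature in `text`."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(words)
    n_words = max(len(words), 1)  # avoid division by zero on empty text

    # Count letter bigrams inside each word.
    bigram_counts = Counter(
        w[i:i + 2] for w in words for i in range(len(w) - 1)
    )
    n_bigrams = max(sum(bigram_counts.values()), 1)

    # One number per feature, in a fixed order.
    vec = [word_counts[w] / n_words for w in FUNCTION_WORDS]
    vec += [bigram_counts[b] / n_bigrams for b in LETTER_BIGRAMS]
    return vec


sample = feature_vector("The cat sat on the mat")
```

Vectors of this kind, one per text and labeled with the author's native language, are the sort of input a support vector machine would then be trained on; the training step itself is not shown here.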

Moshe Koppel, Jonathan Schler, Kfir Zigdon

Author Native Language | Data Mining | KDD 2005 | Stylistic Text Features | Various Stylistic Idiosyncrasies |

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2005
Where	KDD
Authors	Moshe Koppel, Jonathan Schler, Kfir Zigdon
