Short Text Authorship Attribution via Sequence Kernels, Markov Chains and Author Unmasking: An Investigation

We present an investigation of recently proposed character and word sequence kernels for the task of authorship attribution based on relatively short texts. Performance is compared with two corresponding probabilistic approaches based on Markov chains. Several configurations of the sequence kernels are studied on a relatively large dataset (50 authors), where each author covered several topics. Utilising Moffat smoothing, the two probabilistic approaches obtain similar performance, which in turn is comparable to that of character sequence kernels and is better than that of word sequence kernels. The results further suggest that when using a realistic setup that takes into account the case of texts which are not written by any hypothesised authors, the amount of training material has more influence on discrimination performance than the amount of test material. Moreover, we show that the recently proposed author unmasking approach is less useful when dealing with short texts.
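
To make the probabilistic side of the comparison concrete, the following is a minimal sketch of character-level Markov chain attribution: one first-order chain is trained per author, and a test text is assigned to the author whose chain gives it the highest average log-probability. It is an illustration only and not the authors' implementation. The function names (`train_char_bigram`, `log_likelihood`, `attribute`) and the `alpha` parameter are hypothetical, and simple additive smoothing is used here as a stand-in for the Moffat smoothing named in the abstract. The sketch also assumes the closed-set case and does not handle texts written by none of the hypothesised authors.

```python
import math
from collections import Counter, defaultdict


def train_char_bigram(texts):
    """Count character-to-character transitions over one author's texts."""
    counts = defaultdict(Counter)
    for text in texts:
        for prev, cur in zip(text, text[1:]):
            counts[prev][cur] += 1
    return counts


def log_likelihood(text, counts, vocab_size, alpha=1.0):
    """Average per-transition log-probability under a first-order chain.

    Assumption: additive smoothing with pseudo-count `alpha`, used as a
    simple stand-in for the Moffat smoothing mentioned in the abstract.
    """
    total = 0.0
    n = 0
    for prev, cur in zip(text, text[1:]):
        row = counts.get(prev, {})
        row_total = sum(row.values())
        p = (row.get(cur, 0) + alpha) / (row_total + alpha * vocab_size)
        total += math.log(p)
        n += 1
    # Texts shorter than two characters have no transitions; return 0.0.
    return total / max(n, 1)


def attribute(text, models, vocab_size):
    """Return the author whose chain scores the text highest (closed set)."""
    return max(models, key=lambda a: log_likelihood(text, models[a], vocab_size))


# Toy usage with two hypothetical authors and a three-symbol alphabet.
models = {
    "A": train_char_bigram(["aaaa aaab aaaa"]),
    "B": train_char_bigram(["bbbb bbba bbbb"]),
}
print(attribute("aaa", models, vocab_size=3))
```

Averaging over transitions keeps scores comparable across test texts of different lengths, which matters when the test material is short.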

Conrad Sanderson, Simon Günter

EMNLP 2006 | EMNLP 2007 | Probabilistic Approaches | Sequence Kernels | Word Sequence Kernels |

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2006
Where	EMNLP
Authors	Conrad Sanderson, Simon Günter
