Automatic link detection: a sequence labeling approach

The popularity of Wikipedia and other online knowledge bases has recently sparked interest in the machine learning community in the problem of automatic linking. Automatic hyperlinking can be viewed as two subproblems: link detection, which determines the source of a link, and link disambiguation, which determines the destination of a link. Wikipedia is a rich corpus with hyperlink data provided by authors. It is possible to use this data to train classifiers that mimic the authors in some capacity. In this paper, we introduce automatic link detection as a sequence labeling problem. Conditional random fields (CRFs) are a probabilistic framework for labeling sequential data. We show that a CRF trained with different types of features from the Wikipedia dataset can automatically detect links with almost perfect precision and high recall. Categories and Subject Descriptors I.2.7 [Artificial Intelligence]: Natural Language Processing – text analysis; I.3.1...
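To make the sequence labeling framing concrete, the following minimal sketch turns author-provided wiki links into per-token labels of the kind a CRF could be trained on. The B-LINK / I-LINK / O tag set, the simplified `[[target|anchor]]` markup handling, the whitespace tokenization, and the name `label_tokens` are all assumptions made for illustration; the abstract does not say which labeling scheme or features the paper uses.

```python
import re

# Matches [[target]] or [[target|anchor text]] in simplified wiki markup.
# Assumption: no nested links or templates.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def label_tokens(wikitext):
    """Return (token, label) pairs using hypothetical B-LINK / I-LINK / O tags."""
    pairs = []
    pos = 0
    for match in LINK_RE.finditer(wikitext):
        # Text before the link lies outside any link.
        pairs += [(tok, "O") for tok in wikitext[pos:match.start()].split()]
        # The visible anchor text is the link source; fall back to the target.
        anchor = (match.group(2) or match.group(1)).split()
        for i, tok in enumerate(anchor):
            pairs.append((tok, "B-LINK" if i == 0 else "I-LINK"))
        pos = match.end()
    # Remaining text after the last link.
    pairs += [(tok, "O") for tok in wikitext[pos:].split()]
    return pairs


example = "A [[conditional random field|CRF]] labels [[sequence labeling|sequential data]] ."
for token, label in label_tokens(example):
    print(token, label)
```

Under this framing, link detection amounts to predicting the label column from the token column; link disambiguation (choosing the destination) is a separate step that this sketch does not address.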
James J. Gardner, Li Xiong
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where CIKM
Authors James J. Gardner, Li Xiong