In this paper, we present an analysis based on linguistic and typographic features that allows for the identification of titles in web documents. We focus in particular on procedu...
Identifying the occurrences of proper names in text and the entities they refer to can be a difficult task because of the manyto-many mapping between names and their referents. We...
Lexicon development and Part of Speech (POS) tagging are very important for almost all Natural Language Processing(NLP) application areas. The rapid development of these resources...
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
This paper uses the URL word breaking task as an example to elaborate what we identify as crucialin designingstatistical natural language processing (NLP) algorithmsfor Web scale ...
Kuansan Wang, Christopher Thrasher, Bo-June Paul H...