Supporting Web-based Address Extraction with Unsupervised Tagging

Abstract. The manual acquisition and modeling of tourist information, such as addresses of points of interest, is time-intensive and therefore cost-intensive. Furthermore, the encoded information is static and has to be refined for newly emerging sightseeing objects, restaurants or hotels. Automatic acquisition can support and enhance the manual acquisition and can be implemented as a run-time approach to obtain information not encoded in the data or knowledge base of a tourist information system. In our work we apply unsupervised learning to the challenge of web-based address extraction from plain text extracted from web pages that deal with locations and contain their addresses. The data is processed by an unsupervised part-of-speech tagger (Biemann, 2006a), which constructs domain-specific categories via distributional similarity of stop word contexts and neighboring content words. In the address domain, separate tags for street names, locations and other address parts can be o...
Berenike Loos, Chris Biemann
Added 07 Jun 2010
Updated 07 Jun 2010
Type Conference
Year 2007
Where GFKL