Sciweavers

KDD
2009
ACM

Address standardization with latent semantic association

Address standardization is a very challenging task in data cleansing. To provide better customer relationship management and business intelligence for customer-oriented corporations, millions of free-text addresses need to be converted to a standard format for data integration, de-duplication and householding. Existing commercial tools usually employ many hand-crafted, domain-specific rules and reference data dictionaries of cities, states, etc. These rules work well for the region they were designed for. However, rule-based methods usually require considerable human effort to rewrite the rules for each new domain, since address data are very irregular and vary across countries and regions. Supervised learning methods are usually more adaptable than rule-based approaches. However, supervised methods need large-scale labeled training data, and building a large-scale annotated corpus for each target domain is labor-intensive and time-consuming. For minimizing human efforts and the size of labe...
Added 25 Nov 2009
Updated 25 Nov 2009
Type Conference
Year 2009
Where KDD
Authors Honglei Guo, Huijia Zhu, Zhili Guo, Xiaoxun Zhang, Zhong Su