Although Web search engines have become information gateways to the Internet, for queries containing technical terms, search results often contain pages that are difficult to be ...
An unsupervised probabilistic learning framework for normalizing product records across different retailer Web sites is presented. Our framework decomposes the problem into two ta...
Many databases have become Web-accessible through form-based search interfaces (i.e., HTML forms) that allow users to specify complex and precise queries to access the underlying ...
Hai He, Weiyi Meng, Yiyao Lu, Clement T. Yu, Zongh...
In the database community, work on information extraction (IE) has centered on two themes: how to effectively manage IE tasks, and how to manage the uncertainties that arise in th...
Daisy Zhe Wang, Michael J. Franklin, Minos N. Garo...
Text classification categories Web documents in large collections into predefined classes based on their contents. Unfortunately, the classification process can be time-consumi...