We describe an approach to extract attribute-value pairs from product descriptions. This allows us to represent products as sets of such attribute-value pairs to augment product d...
Katharina Probst, Rayid Ghani, Marko Krema, Andrew...
Document clustering is a very hard task in Automatic Text Processing since it requires to extract regular patterns from a document collection without a priori knowledge on the cat...
We propose two methods for constructing automated programs for extraction of information from a class of web pages that are very common and of high practical significance - varia...
Web information extraction is a fundamental issue for web information management and integrations. A common approach is to use wrappers to extract data from web pages or documents...
A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the e...