A large fraction of the useful web comprises specification documents that largely consist of ⟨attribute name, numeric value⟩ pairs embedded in text. Examples include product in...
The Web consists of a large amount of unstructured information that can hardly be processed by automatic agents. In recent years, a considerable number of techniques for informat...
Leonardo Rigutini, Ernesto Di Iorio, Marco Ernande...
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
In order to navigate huge document collections efficiently, tagged hierarchical structures can be used. For users, it is important to correctly interpret tag combinations. In this ...
Word space models, in the sense of vector space models built on distributional data taken from texts, are used to model semantic relations between words. We argue that the high dim...