This paper presents a framework for user-oriented text mining. It is then illustrated with an example of discovering knowledge from competitors’ websites. The knowledge to be di...
The Internet is an ever-growing source of information stored in documents in different languages. Hence, cross-lingual resources are needed for more and more NLP applications. Thi...
Named entity (NE) recognition is a task in which proper nouns and numerical information in a document are detected and classified into categories such as person, organization, loc...
This paper considers the problem of identifying on the Web compound documents (cDocs): groups of web pages that in aggregate constitute semantically coherent information entities...