Today, Web pages are usually accessed using text search engines, whereas documents stored in the deep Web are accessed through domain-specific Web portals. These portals rely on e...
Cultural heritage documentation induces the use of computerized techniques to manage and preserve the information produced. Geographical information systems have proved their pote...
Anne Durand, Pierre Drap, Elise Meyer, Pierre Grus...
Motivated by current efforts to construct more realistic spam filtering experimental corpora, we present a newly assembled, publicly available corpus of genuine and unsolicited (s...
When automatically extracting information from the world wide web, most established methods focus on spotting single HTMLdocuments. However, the problem of spotting complete web s...
Martin Ester, Hans-Peter Kriegel, Matthias Schuber...
This paper proposes a new approach for classifying text documents into two disjoint classes. The new approach is based on extracting patterns, in the form of two logical expressio...