Authorship attribution is the task of identifying the author of a given text. The main concern of this task is to define an appropriate characterization of documents that captures ...
In this paper we study the problem of collecting training samples for building enterprise taxonomies. We develop a computer-aided tool named InfoAnalyzer, which can effectively as...
We propose new features and algorithms for automating Web-page classification tasks such as content recommendation and ad blocking. We show that the automated classification of We...
We present a new model for detection of noun phrases in unrestricted text, whose most outstanding feature is its flexibility: the system is able to recognize noun phrases similar ...
As the rapid growth of PDF document in digital libraries, recognizing the document structure and detecting specific document components are useful for document storage, classifica...