Identifier attributes (very high-dimensional categorical attributes such as particular product IDs or people's names) are rarely incorporated in statistical modeling....
A number of content management tasks, including term categorization, term clustering, and automated thesaurus generation, view natural language terms (e.g., words, noun phrases) as...
Alberto Lavelli, Fabrizio Sebastiani, Roberto Zano...
In this paper, we evaluate the performance on Arabic handwriting of the text-independent writer identification methods that we developed and tested on Western script in recent yea...
The human ability to learn difficult object categories from just a few views is often explained by an extensive use of knowledge from related classes. In this work we study the use...
This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recogniti...