While several hierarchical classification methods have been applied to web content, such techniques invariably rely on a pre-defined taxonomy of documents. We propose a new techni...
Automatically segmenting unstructured text strings into structured records is necessary for importing the information contained in legacy sources and text collections into a data ...
Segmentation through seeded region growing is widely used because it is fast, robust and free of tuning parameters. However, the seeded region growing algorithm requires an automat...
Inexpensive data collection and storage technologies and a global thirst for information have led to data repositories so large that users may become disoriented and unable to loc...
The automatic annotation of images presents a particularly complex problem for machine learning researchers. In this work we experiment with semantic models and multi-class learnin...