: information contained in companies’ financial statements is valuable for decision making at various levels. Much of the relevant information in such documents is contained in t...
XML is becoming a prevalent format for data exchange. Many XML documents have complex schemas that are not always known, and can vary widely between information sources and applica...
Eugene Agichtein, C. T. Howard Ho, Vanja Josifovsk...
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadat...
Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zh...
We present a new approach for building reconstruction from a single Digital Elevation Model (DEM). It treats buildings as an assemblage of simple urban structures extracted from a...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...