We address the problem of academic conference homepage understanding for the Semantic Web. This problem consists of three labeling tasks - labeling conference function pages, func...
Wikipedia is an example of the collaborative, semi-structured data sets emerging on the Web. These data sets have large, nonuniform schema that require costly data integra...
Bryan Chan, Leslie Wu, Justin Talbot, Mike Cammara...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Systems based on statistical and machine learning methods have been shown to be extremely effective and scalable for the analysis of large amounts of textual data. However, in the r...
Semantic web researchers tend to assume that XML Schema and OWL-S are the correct means for representing the types, structure, and semantics of XML data used for documents and int...
Andruid Kerne, Zachary O. Toups, Blake Dworaczyk, ...