Many web sites contain large sets of pages generated using a common template or layout. For example, Amazon lays out the author, title, comments, etc. in the same way in all its b...
A wealth of knowledge is encoded in the form of tables on the World Wide Web. We propose a classification algorithm and a rich feature set for automatically recognizing layout tab...
Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and high...
Syllabi are important documents created by instructors for students. Students use syllabi to find information and to prepare for class. Instructors often need to find similar syl...
Xiaoyan Yu, Manas Tungare, Weiguo Fan, Manuel A. P...
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...