Automatic Document Metadata Extraction Using Support Vector Machines



Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a Support Vector Machine classification-based method for metadata extraction from the header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure then improves the line classification by using the class labels predicted for each line's neighboring lines in the previous round. Further metadata extraction is done by seeking the best chunk boundaries of each line. We found that discovery and use of the structural patterns of the data and domain-based word clustering can improve the metadata extraction performance. An appropriate feature normalization also greatly improves the classification performance. Our metadata extrac...
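The iterative step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `iterative_line_labeling`, the `toy_classifier` rules, the label names, and the example header lines are all hypothetical, and a hand-written keyword rule stands in for the trained Support Vector Machines. For simplicity the sketch assigns one label per line, whereas the paper allows a line to fall into more than one of its 15 classes.

```python
from typing import Callable, List, Sequence

NONE = "none"  # placeholder label for a missing neighbor or for the first round

# A classifier maps (line text, previous-round label of the line above,
# previous-round label of the line below) to a label for the line.
Classifier = Callable[[str, str, str], str]


def toy_classifier(line: str, prev_label: str, next_label: str) -> str:
    """Hypothetical stand-in for a trained SVM: plain keyword rules plus context."""
    text = line.lower()
    if "@" in text:
        return "email"
    if "university" in text or "department" in text:
        return "affiliation"
    # Contextual rule: a line directly above an affiliation is taken as an author line.
    if next_label == "affiliation":
        return "author"
    return "title"


def iterative_line_labeling(
    lines: Sequence[str], classify: Classifier, max_rounds: int = 10
) -> List[str]:
    """Relabel lines using neighbors' labels from the previous round until stable."""
    # Round 0: classify each line without any neighbor information.
    labels = [classify(line, NONE, NONE) for line in lines]
    for _ in range(max_rounds):
        new_labels = []
        for i, line in enumerate(lines):
            prev_label = labels[i - 1] if i > 0 else NONE
            next_label = labels[i + 1] if i + 1 < len(lines) else NONE
            new_labels.append(classify(line, prev_label, next_label))
        if new_labels == labels:  # converged: no label changed in this round
            break
        labels = new_labels
    return labels


# Made-up header lines, for illustration only.
example_header = [
    "Automatic Document Metadata Extraction Using Support Vector Machines",
    "First Author, Second Author",
    "Department of Example Studies, Example University",
    "first.author@example.org",
]

print(iterative_line_labeling(example_header, toy_classifier))
# → ['title', 'author', 'affiliation', 'email']
```

In this toy run the second line is first labeled as a title, because no context is available, and is corrected to an author line in the next round once the label of the line below it is known.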

Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, Edward A. Fox


Automatic Metadata | Automatic Metadata Extraction | JCDL 2003 | Metadata Extraction


» Extracting Important Sentences with Support Vector Machines

» Automatic Extraction of Textual Elements from News Web Pages

» Automatic mining of cognitive metadata using fuzzy inference

» Relation Extraction Using Support Vector Machine

» Automatic Sleep Staging using Support Vector Machines with Posterior Probability Estimates

» Using Automatic Metadata Extraction to Build a Structured Syllabus Repository

» Hierarchical document categorization with support vector machines

» Classification of Cardiac Arrhythmias using Biorthogonal Wavelets and Support Vector Machi...

Post Info

Added	05 Jul 2010
Updated	05 Jul 2010
Type	Conference
Year	2003
Where	JCDL
Authors	Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, Edward A. Fox
