A major source of information (often the most crucial and informative part) in scholarly articles from scientific journals, proceedings and books are the figures that directly pro...
Amr Ahmed, Eric P. Xing, William W. Cohen, Robert ...
Discriminative sequential learning models like Conditional Random Fields (CRFs) have achieved significant success in several areas such as natural language processing, information...
Xuan Hieu Phan, Minh Le Nguyen, Tu Bao Ho, Susumu ...
The two most important tasks in information extraction from the Web are webpage structure understanding and natural language sentences processing. However, little work has been don...
Chunyu Yang, Yong Cao, Zaiqing Nie, Jie Zhou, Ji-R...
A system, called NewsStand, is introduced that automatically extracts images from news articles. The system takes RSS feeds of news article and applies an online clustering algori...
Spam sender detection based on email subject data is a complex large-scale text mining task. The dataset consists of email subject lines and the corresponding IP address of the em...