Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...
The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles to the further advancement of automated translation. A possible...
Marcis Pinnis, Radu Ion, Dan Stefanescu, Fangzhong...
Two-dimensional (2-D) plots in digital documents contain important information. Often, the results of scientific experiments and performance of businesses are summarized using pl...
Xiaonan Lu, James Ze Wang, Prasenjit Mitra, C. Lee...
A considerable amount of clean semistructured data is internally available to companies in the form of business reports. However, business reports are untapped for data mining, da...
Stephen W. Liddle, Douglas M. Campbell, Chad Crawf...
Background: Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where import...
Lawrence Hunter, Zhiyong Lu, James Firby, William ...