Sciweavers

502 search results - page 11 / 101
» Extracting Partial Structures from HTML Documents
Sort
View
111
Voted
WWW
2009
ACM
16 years 1 months ago
Extracting article text from the web with maximum subsequence segmentation
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
Jeff Pasternack, Dan Roth
DOCENG
2009
ACM
15 years 7 months ago
Web document text and images extraction using DOM analysis and natural language processing
: © Web Document Text and Images Extraction using DOM Analysis and Natural Language Processing Parag Mulendra Joshi, Sam Liu HP Laboratories HPL-2009-187 Web page text extraction,...
Parag Mulendra Joshi, Sam Liu
112
Voted
EACL
2006
ACL Anthology
15 years 2 months ago
Multilingual Term Extraction from Domain-specific Corpora Using Morphological Structure
Morphologically complex terms composed from Greek or Latin elements are frequent in scientific and technical texts. Word forming units are thus relevant cues for the identificatio...
Delphine Bernhard
122
Voted
WEBDB
2009
Springer
149views Database» more  WEBDB 2009»
15 years 7 months ago
Extracting Route Directions from Web Pages
Linguists and geographers are more and more interested in route direction documents because they contain interesting motion descriptions and language patterns. A large number of s...
Xiao Zhang, Prasenjit Mitra, Sen Xu, Anuj R. Jaisw...
114
Voted
ICDAR
2003
IEEE
15 years 6 months ago
Document Transformation System from Papers to XML Data Based on Pivot XML Document Method
This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transfor...
Yasuto Ishitani