Sciweavers

42 search results - page 2 / 9
» A DOM Tree Alignment Model for Mining Parallel Data from the...
COLING
2010
12 years 12 months ago
An Empirical Study on Web Mining of Parallel Data
This paper presents an empirical approach to mining parallel corpora. Conventional approaches use a readily available collection of comparable, nonparallel corpora to extract par...
Gum-Won Hong, Chi-Ho Li, Ming Zhou, Hae-Chang Rim
WWW
2011
ACM
12 years 11 months ago
Growing parallel paths for entity-page discovery
In this paper, we use the structural and relational information on the Web to find entity-pages. Specifically, given a Web site and an entity-page (e.g., department and faculty ...
Tim Weninger, Fabio Fumarola, Cindy Xide Lin, Rick...
DMKD
2003
ACM
13 years 10 months ago
Deriving link-context from HTML tag tree
HTML anchors are often surrounded by text that seems to describe the destination page appropriately. The text surrounding a link or the link-context is used for a variety of tasks...
Gautam Pant
ACL
2008
13 years 6 months ago
Mining Parenthetical Translations from the Web by Word Alignment
Documents in languages such as Chinese, Japanese and Korean sometimes annotate terms with their translations in English inside a pair of parentheses. We present a method to extrac...
Dekang Lin, Shaojun Zhao, Benjamin Van Durme, Mari...
WWW
2005
ACM
14 years 5 months ago
Using visual cues for extraction of tabular data from arbitrary HTML documents
We describe a method to extract tabular data from web pages. Rather than just analyzing the DOM tree, we also exploit visual cues in the rendered version of the document to extrac...
Bernhard Krüpl, Marcus Herzog, Wolfgang Gatte...