Search Sciweavers | Sciweavers

42 search results - page 2 / 9

» A DOM Tree Alignment Model for Mining Parallel Data from the...

COLING
2010

137 views, Computational Linguistics

An Empirical Study on Web Mining of Parallel Data

12 years 12 months ago

Download www.aclweb.org

This paper presents an empirical approach to mining parallel corpora. Conventional approaches use a readily available collection of comparable, nonparallel corpora to extract par...

Gum-Won Hong, Chi-Ho Li, Ming Zhou, Hae-Chang Rim

WWW
2011
ACM

184 views, Internet Technology

Growing parallel paths for entity-page discovery

12 years 11 months ago

Download www.cs.illinois.edu

In this paper, we use the structural and relational information on the Web to find entity-pages. Specifically, given a Web site and an entity-page (e.g., department and faculty ...

Tim Weninger, Fabio Fumarola, Cindy Xide Lin, Rick...

DMKD
2003
ACM

114 views, Data Mining

Deriving link-context from HTML tag tree

13 years 10 months ago

Download dollar.biz.uiowa.edu

HTML anchors are often surrounded by text that seems to describe the destination page appropriately. The text surrounding a link or the link-context is used for a variety of tasks...

Gautam Pant

ACL
2008

160 views, Computational Linguistics

Mining Parenthetical Translations from the Web by Word Alignment

13 years 6 months ago

Download www.aclweb.org

Documents in languages such as Chinese, Japanese and Korean sometimes annotate terms with their translations in English inside a pair of parentheses. We present a method to extrac...

Dekang Lin, Shaojun Zhao, Benjamin Van Durme, Mari...

WWW
2005
ACM

108 views, Internet Technology

Using visual cues for extraction of tabular data from arbitrary HTML documents

14 years 5 months ago

Download www.dbai.tuwien.ac.at

We describe a method to extract tabular data from web pages. Rather than just analyzing the DOM tree, we also exploit visual cues in the rendered version of the document to extrac...

Bernhard Krüpl, Marcus Herzog, Wolfgang Gatte...
