Search Sciweavers | Sciweavers

42 search results - page 1 / 9

ACL
2006

Computational Linguistics

A DOM Tree Alignment Model for Mining Parallel Data from the Web

15 years 6 months ago

Download research.microsoft.com

This paper presents a new web mining scheme for parallel data acquisition. Based on the Document Object Model (DOM), a web page is represented as a DOM tree. Then a DOM tree align...

Lei Shi, Cheng Niu, Ming Zhou, Jianfeng Gao

EMNLP
2008

Natural Language Processing

Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model

15 years 6 months ago

Download www.aclweb.org

Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....

Lei Shi, Ming Zhou

ACL
2009

Computational Linguistics

Mining Bilingual Data from the Web with Adaptively Learnt Patterns

15 years 3 months ago

Download www.aclweb.org

Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross language information retrie...

Long Jiang, Shiquan Yang, Ming Zhou, Xiaohua Liu, ...

ICDM
2006
IEEE

Data Mining

Unsupervised Learning of Tree Alignment Models for Information Extraction

15 years 11 months ago

Download users.soe.ucsc.edu

We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level...

Philip Zigoris, Damian Eads, Yi Zhang

ICDM
2007
IEEE

Data Mining

FiVaTech: Page-Level Web Data Extraction from Template Pages

15 years 11 months ago

Download www.csie.ncu.edu.tw

In this paper, we propose a new approach, called FiVaTech, for the problem of Web data extraction. FiVaTech is a page-level data extraction system which deduces the data schema an...

Mohammed Kayed, Chia-Hui Chang, Khaled F. Shaalan,...
