Sciweavers

42 search results - page 1 / 9
ACL
2006
13 years 6 months ago
A DOM Tree Alignment Model for Mining Parallel Data from the Web
This paper presents a new web mining scheme for parallel data acquisition. Based on the Document Object Model (DOM), a web page is represented as a DOM tree. Then a DOM tree align...
Lei Shi, Cheng Niu, Ming Zhou, Jianfeng Gao
EMNLP
2008
13 years 6 months ago
Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model
Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
Lei Shi, Ming Zhou
ACL
2009
13 years 2 months ago
Mining Bilingual Data from the Web with Adaptively Learnt Patterns
Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross language information retrie...
Long Jiang, Shiquan Yang, Ming Zhou, Xiaohua Liu, ...
ICDM
2006
IEEE
13 years 10 months ago
Unsupervised Learning of Tree Alignment Models for Information Extraction
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level...
Philip Zigoris, Damian Eads, Yi Zhang
ICDM
2007
IEEE
13 years 11 months ago
FiVaTech: Page-Level Web Data Extraction from Template Pages
In this paper, we propose a new approach, called FiVaTech, for the problem of Web data extraction. FiVaTech is a page-level data extraction system which deduces the data schema an...
Mohammed Kayed, Chia-Hui Chang, Khaled F. Shaalan,...