Sciweavers

» Data Extraction from Web Data Sources
EMNLP
2008
15 years 2 months ago
Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model
Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
Lei Shi, Ming Zhou
ICML
2004
IEEE
16 years 1 months ago
Improving SVM accuracy by training on auxiliary data sources
The standard model of supervised learning assumes that training and test data are drawn from the same underlying distribution. This paper explores an application in which a second...
Pengcheng Wu, Thomas G. Dietterich
COLING
2010
14 years 8 months ago
Instance Sense Induction from Attribute Sets
This paper investigates the new problem of automatic sense induction for instance names using automatically extracted attribute sets. Several clustering strategies and data source...
Ricardo Martin-Brualla, Enrique Alfonseca, Marius ...
WWW
2008
ACM
16 years 1 months ago
Yes, there is a correlation: from social networks to personal behavior on the web
Characterizing the relationship that exists between a person's social group and his/her personal behavior has been a long-standing goal of social network analysts. In this pa...
Parag Singla, Matthew Richardson
WWW
2010
ACM
15 years 8 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han