Search Sciweavers | Sciweavers

8479 search results - page 152 / 1696

» Data Extraction from Web Data Sources

EMNLP
2008

Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model

Download www.aclweb.org

Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....

Lei Shi, Ming Zhou

ICML
2004
IEEE

Improving SVM accuracy by training on auxiliary data sources

Download web.engr.oregonstate.edu

The standard model of supervised learning assumes that training and test data are drawn from the same underlying distribution. This paper explores an application in which a second...

Pengcheng Wu, Thomas G. Dietterich

COLING
2010

Instance Sense Induction from Attribute Sets

Download www.aclweb.org

This paper investigates the new problem of automatic sense induction for instance names using automatically extracted attribute sets. Several clustering strategies and data source...

Ricardo Martin-Brualla, Enrique Alfonseca, Marius ...

WWW
2008
ACM

Yes, there is a correlation: from social networks to personal behavior on the web

Download www2008.org

Characterizing the relationship that exists between a person's social group and his/her personal behavior has been a long standing goal of social network analysts. In this pa...

Parag Singla, Matthew Richardson

WWW
2010
ACM

CETR: content extraction via tag ratios

Download www.cs.illinois.edu

We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...

Tim Weninger, William H. Hsu, Jiawei Han
