Sciweavers

23 search results - page 3 / 5
» Focused web crawling in the acquisition of comparable corpor...
WWW
2006
ACM
14 years 5 months ago
Status of the African Web
As part of the Language Observatory Project [4], we have been crawling all the web space since 2004. We have collected terabytes of data mostly from Asian and African ccTLDs. In t...
Rizza Camus Caminero, Pavol Zavarsky, Yoshiki Mika...
ECIR
2009
Springer
14 years 2 months ago
Quality-Oriented Search for Depression Portals
The problem of low-quality information on the Web is nowhere more important than in the domain of health, where unsound information and misleading advice can have serious consequen...
Thanh Tin Tang, David Hawking, Ramesh S. Sankarana...
WWW
2007
ACM
14 years 5 months ago
Towards domain-independent information extraction from web tables
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
Bernhard Krüpl, Bernhard Pollak, Marcus Herzo...
SIGMOD
2010
ACM
13 years 5 months ago
Optimizing content freshness of relations extracted from the web using keyword search
An increasing number of applications operate on data obtained from the Web. These applications typically maintain local copies of the web data to avoid network latency in data acc...
Mohan Yang, Haixun Wang, Lipyeow Lim, Min Wang
EACL
2006
ACL Anthology
13 years 6 months ago
Compiling French-Japanese Terminologies from the Web
We propose a method for compiling bilingual terminologies of multi-word terms (MWTs) for given translation pairs of seed terms. Traditional methods for bilingual terminology compi...
Xavier Robitaille, Yasuhiro Sasaki, Masatsugu Tono...