As part of the Language Observatory Project [4], we have been crawling all the web space since 2004. We have collected terabytes of data mostly from Asian and African ccTLDs. In t...
Rizza Camus Caminero, Pavol Zavarsky, Yoshiki Mika...
The problem of low-quality information on the Web is nowhere more important than in the domain of health, where unsound information and misleading advice can have serious consequen...
Thanh Tin Tang, David Hawking, Ramesh S. Sankarana...
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
An increasing number of applications operate on data obtained from the Web. These applications typically maintain local copies of the web data to avoid network latency in data acc...
We propose a method for compiling bilingual terminologies of multi-word terms (MWTs) for given translation pairs of seed terms. Traditional methods for bilingual terminology compi...