As part of the Language Observatory Project [4], we have been crawling all the web space since 2004. We have collected terabytes of data mostly from Asian and African ccTLDs. In t...
Rizza Camus Caminero, Pavol Zavarsky, Yoshiki Mika...
Graphical components information extraction is a crucial step in the chart recognition and understanding process. However, existing methods of information extraction from chart im...
In this paper we present the Infocious Web search engine [23]. Our goal in creating Infocious is to improve the way people find information on the Web by resolving ambiguities pre...
We present two algorithms for supporting semi-automatic ontology building, integrated in WPro, a new architecture for ontology learning from Web documents. The first algorithm auto...
Daniele Bagni, Marco Cappella, Maria Teresa Pazien...
Abstract-Wikipedia is an example of the collaborative, semi-structured data sets emerging on the Web. These data sets have large, nonuniform schema that require costly data integra...
Bryan Chan, Leslie Wu, Justin Talbot, Mike Cammara...