Increasingly user-generated content is being utilised as a source of information, however each individual piece of content tends to contain low levels of information. In addition, ...
This paper describes our participation in the 2008 TREC Blog track. Our system consists of 3 components: data preprocessing, topic retrieval, and opinion finding. In the topic ret...
The TREC .GOV collection makes a valuable web testbed for distributed information retrieval methods because it is naturally partitioned and includes 725 web-oriented queries with ...
We present GoGetIt!, a tool for generating structure-driven crawlers that requires a minimum effort from the users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...
Twarql is an infrastructure translating microblog posts from Twitter as Linked Open Data in real-time. The approach employed in Twarql can be summarized as follows: (1) extract co...
Pablo N. Mendes, Alexandre Passant, Pavan Kapanipa...