Sciweavers

154 search results - page 16 / 31
» Using Wikipedia and Wiktionary in Domain-Specific Informatio...

SIGIR 2010, ACM
Crowdsourcing a Wikipedia vandalism corpus
We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon’s Mechanical Turk. The corpus compiles 32 452 edits on 28 468 Wikipedia articles, a...
Martin Potthast

SIGIR 2011, ACM
No free lunch: brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity
This work explores the problem of cross-lingual pairwise similarity, where the task is to extract similar pairs of documents across two different languages. Solutions to this pro...
Ferhan Ture, Tamer Elsayed, Jimmy J. Lin

CORR 2010, Springer
TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities)
We designed and implemented Tagme, a system that is able to efficiently and judiciously augment a plain-text with pertinent hyperlinks to Wikipedia pages. The specialty of Tagme w...
Paolo Ferragina, Ugo Scaiella

ICTIR 2009, Springer
What's in a Link? From Document Importance to Topical Relevance
Web information retrieval is best known for its use of the Web’s link structure as a source of evidence. Global link evidence is by nature query-independent, and is therefore no ...
Marijn Koolen, Jaap Kamps

APCCM 2009
Extracting and Modeling the Semantic Information Content of Web Documents to Support Semantic Document Retrieval
Existing HTML mark-up is used only to indicate the structure and lay-out of documents, but not the document semantics. As a result, web documents are difficult to be semantically p...
Shahrul Azman Noah, Lailatulqadri Zakaria, Arifah ...