Web spam is a widely-recognized threat to the quality and security of the Web. Web spam pages pollute search engine indexes, burden Web crawlers and Web mining services, and expos...
Depending on a web searcher’s familiarity with a query’s target topic, it may be more appropriate to show her introductory or advanced documents. The TREC HARD [1] track defi...
The results presented in this paper come from an exploratory study of 19,379 mind maps created by 11,179 users from the mind mapping applications ‘Docear’ and ‘MindMeister...
Government agencies must often quickly organize and analyze large amounts of textual information, for example comments received as part of notice and comment rulemaking. Hierarchi...
Large volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and sto...