This paper briefly describes the approaches taken by the Cheshire (Berkeley) Group for the CLEF Adhoc-TEL 2009 tasks (Mono and Bilingual retrieval). Recognizing that man...
Large-volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and sto...
People often use powerful tools to manage the documents they encounter, but very rarely to store the mental knowledge they glean from those documents. Popcorn is a personal knowle...
Stephen Davies, Scotty Allen, Jon Raphaelson, Emil...
Text similarity spans a spectrum, with broad topical similarity near one extreme and document identity at the other. Intermediate levels of similarity – resulting from summariza...
Donald Metzler, Yaniv Bernstein, W. Bruce Croft, A...
Recently there has been significant interest in supervised learning algorithms that combine labeled and unlabeled data for text learning tasks. The co-training setting [1] applie...