This paper proposes a novel application of a statistical language model to opinionated document retrieval targeting weblogs (blogs). In particular, we explore the use of the trigg...
We propose a semantic tagger that provides high-level concept information for phrases based on several kinds of low-level information about words in clinical narrative texts. The ...
Document retrieval systems conventionally use words as the basic unit of representation, a natural choice since words are primary carriers of semantic information. In this paper w...
In this paper we describe our experiments in finding answers from documents based on statistical and linguistic knowledge. We collected the candidate answers from sources available...
The aim of query-based sampling is to obtain a sufficient, representative sample of an underlying (text) collection. Current measures for assessing sample quality are too coarse gr...