Search engines process queries conjunctively to restrict the size of the answer set. Further, it is not rare to observe a mismatch between the vocabulary used in the text of Web p...
Despite the widespread use of BM25, there have been few studies examining its effectiveness on a document description over single and multiple field combinations. We determine t...
Abstract. The paper presents a description of the FMD BioPortal biosurveillance system (http://fmd.ucdavis.edu/bioportal/) that is currently operating to capture, analyze, and diss...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
This paper develops a general, formal framework for modeling term dependencies via Markov random fields. The model allows for arbitrary text features to be incorporated as eviden...