The vocabulary of the TREC Legal OCR collection is noisy and huge. Standard techniques for improving retrieval performance such as content-based query expansion are ineffective fo...
User generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's in...
Wouter Weerkamp, Krisztian Balog, Maarten de Rijke
High findability of documents within a certain cut-off rank is considered an important factor in recall-oriented application domains such as patent or legal document retrieval. ...
People are seldom aware that their search queries frequently mismatch a majority of the relevant documents. This may not be a big problem for topics with a large and diverse set o...
In the TREC 2008, the team from the State University of New York at Buffalo participated in the Legal track and the Blog track. For the Legal track, we worked on the interactive s...
Jianqiang Wang, Ying Sun, Omar Mukhtar, Rohini K. ...