The retrieval of videos of interest from large video collections is a major open problem that calls for the definition of new video content characterization techniques in terms of ...
Despite the widespread use of BM25, few studies have examined its effectiveness on document descriptions built from single fields and from multiple-field combinations. We determine t...
A semi-structured information space consists of multiple collections of textual documents containing fielded or tagged sections. The space can be highly heterogeneous, because eac...
It has been observed that precision increases with collection size. One explanation could be that the redundancy of information increases, making it easier to find multiple docum...
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...