Existing methods of information extraction from HTML documents include manual approach, supervised learning and automatic techniques. The manual method has high precision and reca...
Mirel Cosulschi, Adrian Giurca, Bogdan Udrescu, Ni...
In order to enable scalable querying of graph databases, intelligent selection of subgraphs to index is essential. An improved index can reduce response times for graph queries si...
We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics ...
ChengXiang Zhai, William W. Cohen, John D. Laffert...
Term-weighting schemes are vital to the performance of Information Retrieval models that use term frequency characteristics to determine the relevance of a document. The vector spa...
Topics in prior-art patent search are typically full patent applications and relevant items are patents often taken from sources in different languages. Cross language patent retr...