Text data in the Internet can be partitioned into many databases naturally. Efficient retrieval of desired data can be achieved if we can accurately predict the usefulness of each...
Weiyi Meng, King-Lup Liu, Clement T. Yu, Xiaodong ...
As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
Web cache technologies have been developed as an extension of CPU cache, by modifying LRU (Least Recently Used) algorithms. Actually in web cache systems, we can use disks and ter...
We examine clarification dialogue, a mechanism for refining user questions with follow-up questions, in the context of open domain Question Answering systems. We develop an algori...
Search facilitated with agglomerative hierarchical clustering methods was studied in a collection of Finnish newspaper articles (N = 53,893). To allow quick experiments, clustering...
Tuomo Korenius, Jorma Laurikkala, Martti Juhola, K...