This paper reports the estimated number of spam blogs in order to assess their current state in the blogosphere. To extract spam blogs, I developed a traversal method among co-cit...
P2P systems are increasingly used to discover and share various data between users. The performance of a P2P based information retrieval system is determined by the efficiency of...
Search facilitated with agglomerative hierarchical clustering methods was studied in a collection of Finnish newspaper articles (N = 53,893). To allow quick experiments, clustering...
Tuomo Korenius, Jorma Laurikkala, Martti Juhola, K...
User queries to search engines are observed to predominantly contain inflected content words but lack stopwords and capitalization. Thus, they often resemble natural language que...
Nowadays, cross-lingual Information Retrieval (IR) is one of the greatest challenges to deal with. Besides, one of the most important issues in IR consists in the corpus vocabular...