According to the logical model of Information Retrieval (IR), the task of IR can be described as the extraction, from a given document base, of those documents d that, given a que...
Carlo Meghini, Fabrizio Sebastiani, Umberto Stracc...
Search facilitated with agglomerative hierarchical clustering methods was studied in a collection of Finnish newspaper articles (N = 53,893). To allow quick experiments, clustering...
Tuomo Korenius, Jorma Laurikkala, Martti Juhola, K...
Existing HTML mark-up is used only to indicate the structure and lay-out of documents, but not the document semantics. As a result web documents are difficult to be semantically p...
We propose a new clustering algorithm, called SyMP, which is based on synchronization of pulse-coupled oscillators. SyMP represents each data point by an Integrate-and-Fire oscill...
The aim of query-based sampling is to obtain a sufficient, representative sample of an underlying (text) collection. Current measures for assessing sample quality are too coarse gr...