Sciweavers

CIKM
2008
Springer
14 years 12 months ago
Winnowing-based text clustering
We present an approach to document clustering based on winnowing fingerprints that achieved good effectiveness with considerable savings in memory space and computation tim...
Javier Parapar, Alvaro Barreiro
ERCIMDL
2009
Springer
15 years 4 months ago
A Visualization Tool of Probabilistic Models for Information Access Components
An effective graphic interface is a key tool to improve the fruition of the results retrieved by an Information Retrieval (IR) system. In this work, we describe a two-dimensional...
Lorenzo De Stefani, Giorgio Maria Di Nunzio, Giorg...
JCDL
2005
ACM
15 years 3 months ago
Automatic extraction of titles from general documents using machine learning
In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of...
Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Q...
CIKM
2003
Springer
15 years 3 months ago
Extracting unstructured data from template generated web documents
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Ling Ma, Nazli Goharian, Abdur Chowdhury, Misun Ch...
WWW
2005
ACM
15 years 3 months ago
Finding the boundaries of information resources on the web
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
Pavel Dmitriev, Carl Lagoze, Boris Suchkov