Sciweavers

298 search results - page 27 / 60
» An information-theoretic measure for document similarity
DKE
2008
14 years 12 months ago
Fragment-based approximate retrieval in highly heterogeneous XML collections
Due to the heterogeneous nature of XML data for internet applications, exact matching of queries is often inadequate. The need arises to quickly identify subtrees of XML documents ...
Ismael Sanz, Marco Mesiti, Giovanna Guerrini, Rafa...
RIAO
2004
15 years 1 months ago
Multilingual document clusters discovery
The Cross-Language Information Retrieval community has produced search engines over multilingual corpora and multilingual text categorization systems. In this paper, we focus on th...
Benoît Mathieu, Romaric Besançon, Chr...
CIVR
2005
Springer
15 years 5 months ago
Automatic Image Semantic Annotation Based on Image-Keyword Document Model
Abstract. This paper presents a novel method of automatic image semantic annotation. Our approach is based on the Image-Keyword Document Model (IKDM) with image features discretiza...
Xiangdong Zhou, Lian Chen, Jianye Ye, Qi Zhang, Ba...
CIKM
2008
Springer
15 years 1 months ago
Achieving both high precision and high recall in near-duplicate detection
To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
Lian'en Huang, Lei Wang, Xiaoming Li
CLEF
2011
Springer
13 years 11 months ago
A Language-Independent Approach to Identify the Named Entities in Under-Resourced Languages and Clustering Multilingual Documents
Abstract. This paper presents a language-independent Multilingual Document Clustering (MDC) approach on comparable corpora. Named entities (NEs) such as persons, locations, organiza...
N. Kiran Kumar, G. S. K. Santosh, Vasudeva Varma