Sciweavers

TREC
2004
13 years 6 months ago
Language Models for Searching in Web Corpora
We describe our participation in the TREC 2004 Web and Terabyte tracks. For the web track, we employ mixture language models based on document full-text, incoming anchortext, and...
Jaap Kamps, Gilad Mishne, Maarten de Rijke
SSWMC
2004
13 years 6 months ago
Multimedia document authentication using on-line signatures as watermarks
Authentication of digital documents is an important concern as digital documents are replacing the traditional paper-based documents for official and legal purposes. This is espec...
Anoop M. Namboodiri, Anil K. Jain
ISTA
2004
13 years 6 months ago
TransM: A Structured Document Transformation Model
We present in this paper a transformation model for structured documents. TransM is a new model that deals with specified documents, where the structure conforms to a predefined ...
Nouhad Amaneddine, Jean Paul Bahsoun, Jean-Paul Bo...
CRIWG
2001
13 years 6 months ago
Yaka: Document Notification and Delivery Across Heterogeneous Document Repositories
Nowadays people have to deal with an increasing amount of information contained in electronic documents available from numerous heterogeneous, widely distributed sources. Keeping ...
Damián Arregui, François Pacull, Jut...
IADIS
2004
13 years 6 months ago
A conceptual modeling of multimedia documents
Our research addresses the identification and representation of the semantic structures of multimedia documents. The semantic structure of a multimedia document ...
Mohamed Mbarki, Chantal Soulé-Dupuy
EMNLP
2006
13 years 6 months ago
Entity Annotation based on Inverse Index Operations
Entity annotation involves attaching a label such as 'name' or 'organization' to a sequence of tokens in a document. All the current rule-based and machine learning-based...
Ganesh Ramakrishnan, Sreeram Balakrishnan, Sachind...
BDA
2006
13 years 6 months ago
Integrating Correction into Incremental Validation
Many data on the Web are XML documents. An XML document is an unranked labelled tree. A schema for XML documents (for instance a DTD) is the specification of their internal structu...
Béatrice Bouchou, Ahmed Cheriat, Mirian Hal...
AAAI
2006
13 years 6 months ago
Script and Language Identification in Degraded and Distorted Document Images
This paper reports a statistical identification technique that differentiates scripts and languages in degraded and distorted document images. We identify scripts and languages th...
Shijian Lu, Chew Lim Tan
RIVF
2007
13 years 6 months ago
Disambiguation of People in Web Search Using a Knowledge Base
Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different...
Quang Minh Vu, Tomonari Masada, Atsuhiro Takasu, J...
RIAO
2007
13 years 6 months ago
Comprehensible and Accurate Cluster Labels in Text Clustering
The purpose of text clustering in information retrieval is to discover groups of semantically related documents. Accurate and comprehensible cluster descriptions (labels) let the ...
Jerzy Stefanowski, Dawid Weiss