Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
We consider the problem of characterisation of sequences of heterogeneous symbolic data that arise from a common underlying temporal pattern. The data, which are subject to impreci...
Across a wide variety of fields, huge datasets are being collected and accumulated at a dramatical pace. The datasets addressed by individual applications are very often heteroge...
Background: One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential varia...
Yoshimasa Tsuruoka, John McNaught, Sophia Ananiado...
Feature Space Conversion for classifiers is the process by which the data that is to be fed into the classifier is transformed from one form to another. The motivation behind doin...