Background: Keyword searching through PubMed and other systems is the standard means of retrieving information from Medline. However, ad-hoc retrieval systems do not meet all of t...
Graham L. Poulter, Daniel L. Rubin, Russ B. Altman...
This paper presents the results of classifying Arabic text documents using the N-gram frequency statistics technique employing a dissimilarity measure called the "Manhattan di...
Published scientific articles are linked together into a graph, the citation graph, through their citations. This paper explores the notion of similarity based on connectivity alo...
Wangzhong Lu, Jeannette C. M. Janssen, Evangelos E...
XML has become the most useful standard of data interchange in the web and e-business world and there is a large amount of information stored in this format. Nonetheless, a large ...
In this paper we present two experiments conducted for comparison of different language identification algorithms. Short words-, frequent words- and n-gram-based approaches are co...
Lena Grothe, Ernesto William De Luca, Andreas N&uu...