The growing availability of online text has led to an increase in the use of automatic knowledge acquisition approaches from textual data, as in Information Extraction (IE). Some ...
Duplicate URLs have brought serious trouble to the whole pipeline of a search engine, from crawling and indexing to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
Abstract. Nowadays one of the most common formats for storing information is XML. The size of XML documents can be rather large, and they may contain redundant attributes which can...
In data mining applications, very large contexts are handled, which usually results in a considerably large set of frequent itemsets, even for high values of the minimum support t...
Tarek Hamrouni, Sadok Ben Yahia, Engelbert Mephu N...
Background: Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as U...
Robin P. Smith, William J. Buchser, Marcus B. Lemm...