Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two t...
Traditional spectral classification has been proved to be effective in dealing with both labeled and unlabeled data when these data are from the same domain. In many real world ap...
Analyzing data on-board a spacecraft as it is collected enables several advanced spacecraft capabilities, such as prioritizing observations to make the best use of limited bandwid...
Previous work on text mining has almost exclusively focused on a single stream. However, we often have available multiple text streams indexed by the same set of time points (call...
Xuanhui Wang, ChengXiang Zhai, Xiao Hu, Richard Sp...
Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents in the Web. The static and dynamic studies involve the analysis of similar c...