Many existing indexes on text work at the document granularity and are not effective in answering the class of queries where the desired answer is only a term or a phrase. In this...
One essential issue in document clustering is estimating the appropriate number of clusters into which a document collection should be partitioned. In this paper, we ...
Software systems are designed and engineered to process data. However, software is data too. The size and variety of today's software artifacts and the multitude of stakehold...
In this paper, we present a framework for mining diverging patterns, a new type of contrast pattern whose frequency changes in significantly different ways in two data sets, e.g., it c...
The identification and processing of similarities in the data play a key role in multiple application scenarios. Several types of similarity-aware operations have been studied in ...