This paper presents a new algorithm for sequence prediction over long categorical event streams. The input to the algorithm is a set of target event types whose occurrences we wis...
In some applications such as filling in a customer information form on the web, some missing values may not be explicitly represented as such, but instead appear as potentially va...
A complete set of frequent itemsets can get undesirably large due to redundancy when the minimum support threshold is low or when the database is dense. Several concise representat...
Due to the large difference between seek time and transfer time in current disk technology, it is advantageous to perform large I/O using a single sequential access rather than mu...
—Many companies now routinely run massive data analysis jobs – expressed in some scripting language – on large clusters of low-end servers. Many analysis scripts are complex ...