Semantic Web data exhibits highly skewed frequency distributions among terms. Efficient large-scale distributed reasoning methods should maintain load balance in the face of such hi...
In some applications, such as filling in a customer information form on the web, some missing values may not be explicitly represented as such, but instead appear as potentially va...
Mining frequent trees is useful in domains such as bioinformatics, web mining, and the mining of semi-structured data. We formulate the problem of mining (embedded) subtrees in ...
Facebook recently deployed Facebook Messages, its first-ever user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop des...
Dhruba Borthakur, Jonathan Gray, Joydeep Sen Sarma...
Managing time-stamped data is essential to clinical research activities and often requires the use of considerable domain knowledge, which is difficult to support within database ...
Martin J. O'Connor, Ravi D. Shankar, David B. Parr...