Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not ca...
Previous studies of Internet traffic have shown that a very small percentage of flows consume most of the network bandwidth. It is important to understand the characteristics of s...
In bibliographies like DBLP and Citeseer, there are three kinds of entity-name problems that need to be solved. First, multiple entities share one name, which is called the name sh...
Accurately estimating probabilities from observations is important for probabilistic-based approaches to problems in computational biology. In this paper we present a biologically...
Eleazar Eskin, William Stafford Noble, Yoram Singe...
Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while...