We describe a compression model for semistructured documents, called Structural Contexts Model (SCM), which takes advantage of the context information usually implicit in the stru...
In the KL divergence framework, the extended language modeling approach has a critical problem estimating a query model, which is the probabilistic model that encodes user’s inf...
Patent documents contain important research results. However, they are lengthy and rich in technical terminology such that it takes a lot of human efforts for analyses. Automatic...
A recommender system has an obvious appeal in an environment where the amount of on-line information vastly outstrips any individual’s capability to survey. Music recommendation...
Most knowledge accumulated through scientific discoveries in genomics and related biomedical disciplines is buried in the vast amount of biomedical literature. Since understandin...