Searching and extracting meaningful information out of highly heterogeneous datasets is a hot topic that received a lot of attention. However, the existing solutions are based on e...
We present a method of searching text collections that takes advantage of hierarchrical information within documents and integrates searches of structured and unstructured data. W...
M. Catherine McCabe, Jinho Lee, Abdur Chowdhury, D...
Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the join...
Deng Cai, Qiaozhu Mei, Jiawei Han, Chengxiang Zhai
In many important text classification problems, acquiring class labels for training documents is costly, while gathering large quantities of unlabeled data is cheap. This paper sh...
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom...
We describe the integration of a structuredtext retrieval system (TextMachine) into an object-oriented database system (OpenODB). We use the external function capability of the da...