A lot of research has been conducted by the database community on methods and techniques for efficient XPath processing, with great success. Despite the progress made, significant...
Previous information extraction (IE) systems are typically organized as a pipeline architecture of separated stages which make independent local decisions. When the data grows bey...
Qi Li, Sam Anzaroot, Wen-Pin Lin, Xiang Li, Heng J...
Several problems in text categorization are too hard to be solved by standard bag-of-words representations. Work in kernel-based learning has approached this problem by (i) consid...
This paper presents a cluster-based text categorization system which uses class distributional clustering of words. We propose a new clustering model which considers the global in...
Information networks are widely used to characterize the relationships between data items such as text documents. Many important retrieval and mining tasks rely on ranking the dat...