Textual patterns have been used effectively to extract information from large text collections. However they rely heavily on textual redundancy in the sense that facts have to be m...
The use of unlabeled data to aid classification is important as labeled data is often available in limited quantity. Instead of utilizing training samples directly into semi-super...
Web spam, which refers to any deliberate actions bringing to selected web pages an unjustifiable favorable relevance or importance, is one of the major obstacles for high quality ...
Abstract. Most common feature selection techniques for document categorization are supervised and require lots of training data in order to accurately capture the descriptive and d...
This paper describes a system to support humanities scholars in their interpretation of literary work. It presents a user interface and web architecture that integrates text minin...
Catherine Plaisant, James Rose, Bei Yu, Loretta Au...