The amount of information in medical publications continues to increase at a tremendous rate. Systematic reviews help to process this growing body of information. They are fundame...
Cross-language text classification (CLTC) aims to take advantage of existing training data from one language to construct a classifier for another language. In addition to the expe...
Spreadsheets applications allow data to be stored with low development overheads, but also with low data quality. Reporting on data from such sources is difficult using traditiona...
Consider a supervised learning problem in which examples contain both numerical- and text-valued features. To use traditional featurevector-based learning methods, one could treat...
Sofus A. Macskassy, Haym Hirsh, Arunava Banerjee, ...
We describe an efficient technique to weigh word-based features in binary classification tasks and show that it significantly improves classification accuracy on a range of proble...
Justin Martineau, Tim Finin, Anupam Joshi, Shamit ...