Many applications in text processing require significant human effort for either labeling large document collections (when learning statistical models) or extrapolating rules from...
In recent years several models have been proposed for text categorization. Within this, one of the widely applied models is the vector space model (VSM), where independence betwee...
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use...
We have developed a technique to characterize software developers' styles using a set of source code metrics. This style fingerprint can be used to identify the likely author...
Text classification using a small labeled set and a large unlabeled data is seen as a promising technique to reduce the labor-intensive and time consuming effort of labeling traini...