In this paper, we describe a method for automatic acquisition of script knowledge from a Japanese text collection. Script knowledge represents a typical sequence of actions that o...
Research on the discovery of terms from corpora has focused on word sequences whose recurrent occurrence in a corpus is indicative of their terminological status, and has not addr...
Disfluent speech adds to the difficulty of processing spoken language utterances. In this paper we concentrate on identifying one disfluency phenomenon: fragmented words. Our d...
We describe experiments with a Naive Bayes text classifier in the context of anti-spam E-mail filtering, using two different statistical event models: a multi-variate Bernoulli ...
One of the major challenges in TREC-style question-answering (QA) is to overcome the mismatch in the lexical representations in the query space and document space. This is particul...