Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web...
In statistical modelling, an investigator must often choose a suitable model from among a collection of viable candidates. There is no consensus in the research community on how such a...
Software system documentation is almost always expressed informally in natural language and free text. Examples include requirement specifications, design documents, manual pages, ...
We demonstrate a phonotactic-semantic paradigm for spoken document categorization. In this framework, we define a set of acoustic words instead of lexical words to represent acous...
The research described in this paper is concerned with the application of information retrieval to software maintenance, and in particular to the problem of recovering traceabilit...