Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summariza...
Genre or style analysis can be used to improve results achieved using standard IR techniques. A genre class is a group of documents that are written in a similar style. Genre clas...
We applied different clustering algorithms to the task of clustering multi-word terms in order to reflect a human-built ontology. Clustering was done without the usual ...
Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not en...
Language identification is the task of identifying the language in which a given document is written. This paper describes a detailed examination of which models perform best under diffe...