One Size Fits All? A Simple Technique to Perform Several NLP Tasks

Word fragments, or n-grams, have been widely used to perform different Natural Language Processing tasks such as information retrieval [1] [2], document categorization [3], automatic summarization [4] and even genetic classification of languages [5]. All these techniques share some common aspects: (1) documents are mapped to a vector space where n-grams are used as coordinates and their relative frequencies as vector weights, (2) many of them compute a context which plays a role similar to stop-word lists, and (3) cosine distance is commonly used for document-to-document and query-to-document comparisons. blindLight is a new approach related to these classical n-gram techniques, although it introduces two major differences: (1) relative frequencies are no longer used as vector weights but are replaced by n-gram significances, and (2) cosine distance is abandoned in favor of a new metric inspired by sequence alignment techniques, though less computationally expensive. This new approa...
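The classical setup described in the abstract (n-grams as coordinates, relative frequencies as weights, cosine comparison) can be illustrated with a minimal Python sketch. This is not blindLight: it does not compute n-gram significances or the alignment-inspired metric, neither of which is specified in this listing. The function names `ngram_vector` and `cosine_similarity`, the use of character n-grams, and the default `n=3` are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def ngram_vector(text, n=3):
    """Map text to a sparse vector: character n-gram -> relative frequency.

    Hypothetical helper; n=3 is an arbitrary illustrative choice.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}  # text shorter than n has no n-grams
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v[gram] for gram, weight in u.items() if gram in v)
    norm_u = sqrt(sum(weight * weight for weight in u.values()))
    norm_v = sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    doc = ngram_vector("information retrieval with n-grams")
    query = ngram_vector("n-gram retrieval")
    print(cosine_similarity(doc, query))
```

Cosine distance, as mentioned in the abstract, is then commonly taken as one minus this similarity. Sparse dictionaries are used here because most n-grams never occur in a given document.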
Added 02 Jul 2010
Updated 02 Jul 2010
Type Conference
Year 2004
Where TAL
Authors Daniel Gayo-Avello, Darío Álvarez Gutiérrez, José Gayo-Avello