Exploiting multilingual nomenclatures and language-independent text features as an interlingua for cross-lingual text analysis a


Download langtech.jrc.ec.europa.eu

We propose a simple but efficient basic approach for a number of multilingual and cross-lingual language technology applications that are not limited to the usual two or three languages, but can be applied with relatively little effort to larger sets of languages. The approach uses existing multilingual linguistic resources such as thesauri, nomenclatures and gazetteers, and exploits additional, more or less language-independent text items such as dates, currency expressions, numbers, names and cognates. Mapping texts onto the multilingual resources and identifying word token links between texts in different languages are basic ingredients for applications such as cross-lingual document similarity calculation, multilingual clustering and categorisation, cross-lingual document retrieval, and tools that provide cross-lingual information access.
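The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-language lexicon `TOY_NOMENCLATURE`, the concept IDs, and the function names `interlingua_vector` and `cosine` are all hypothetical, and only two feature types (nomenclature concepts and numbers) are covered. Each text is mapped to a language-independent feature vector, and documents in different languages are then compared by cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical toy nomenclature: surface form per language -> shared concept ID.
# A real system would load an existing multilingual thesaurus or gazetteer.
TOY_NOMENCLATURE = {
    "en": {"fish": "C1", "agriculture": "C2", "trade": "C3"},
    "fr": {"poisson": "C1", "agriculture": "C2", "commerce": "C3"},
    "de": {"fisch": "C1", "landwirtschaft": "C2", "handel": "C3"},
}

NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")  # numbers look alike across languages
WORD_RE = re.compile(r"[^\W\d_]+")          # runs of letters (Unicode-aware)


def interlingua_vector(text, lang):
    """Map a text onto language-independent features (concepts and numbers)."""
    features = Counter()
    for number in NUMBER_RE.findall(text):
        features["NUM:" + number] += 1
    lexicon = TOY_NOMENCLATURE.get(lang, {})
    for word in WORD_RE.findall(text):
        concept = lexicon.get(word.lower())
        if concept is not None:
            features["CONCEPT:" + concept] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


en = interlingua_vector("Fish trade rose by 12 percent in 2005.", "en")
fr = interlingua_vector("Le commerce du poisson a augmenté de 12 pour cent en 2005.", "fr")
de = interlingua_vector("Die Landwirtschaft wuchs 2004 um 3 Prozent.", "de")

print(cosine(en, fr))  # English and French texts share concepts and numbers
print(cosine(en, de))  # English and German texts share no features
```

Adding a language in this sketch only requires another lexicon entry, with no pairwise translation resources, which is the property the abstract highlights. Dates, currency expressions, names and cognates would need their own extractors and are left out here.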

Ralf Steinberger, Bruno Pouliquen, Camelia Ignat


CORR 2006 | Cross-lingual Document | Education | Multilingual | Multilingual Linguistic Resources


Post Info

Added	11 Dec 2010
Updated	11 Dec 2010
Type	Journal
Year	2006
Where	CORR
Authors	Ralf Steinberger, Bruno Pouliquen, Camelia Ignat

