Internet content today is about 80% text-based. Whether static or dynamic, the information is encoded and presented as multilingual, unstructured natural language text pages. As ...
Pavlin Dobrev, Albena Strupchanska, Galia Angelova
We introduce a new set of tools for working with web-scale N-gram data. These tools lower the barrier for working with web-scale text, and create a new platform for acquiring larg...
Dekang Lin, Kenneth Ward Church, Heng Ji, Satoshi ...
In this paper we propose computer-aided summarisation (CAS) as an alternative approach to automatic summarisation, and present an ongoing project which aims to develop a CAS system...
This paper presents SFST-PL, a programming language for finite state transducers which is based on extended regular expressions with variables. The programming language is both si...
This is a paper supporting the demonstration of the LX-Center at ACL-IJCNLP-09. LX-Center is a web center of online linguistic services aimed at both demonstrating a range of lang...