The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured da...
Sudarshan S. Chawathe, Hector Garcia-Molina, Joach...
Topic tracking is complicated when the stories in the stream occur in multiple languages. Typically, researchers have trained only English topic models because the training storie...
Leah S. Larkey, Fangfang Feng, Margaret E. Connell...
The Dutch HLT agency for language and speech technology (known as TST-centrale) at the Institute for Dutch Lexicology is responsible for the maintenance, distribution and accessib...
This article is motivated by the importance of building web data mashups. Building on the remarkable success of Web 2.0 mashups, and specially Yahoo Pipes, we generalize the idea ...
We propose new methods to exploit contemporaneous text, such as on-line news articles, to improve language models for automatic speech recognition and other natural language proce...