We propose a novel Co-Training method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible le...
We explore how to improve machine translation systems by adding more translation data in situations where we already have substantial resources. The main challenge is how to buck ...
This paper describes how heterogeneous data sources captured in the SignCom project may be used for the analysis and synthesis of French Sign Language (LSF) utterances. The captur...
Biodiversity informatics (BDI) information is both highly localized and highly distributed. The temporal and spatial contexts of data collection events are generally of primary imp...
We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of., ...