This paper presents Multilingual Document Clustering (MDC) on comparable corpora. Wikipedia, a structured multilingual knowledge base, has been heavily exploited in many monolingual...
With the rapid development of on-line information services, information technologies for on-line information processing have recently received much attention. Clust...
Current data warehouse and OLAP technologies can be applied to analyze the structured data that companies store in their databases. The circumstances that describe the context ass...
The collinear arrangement of objects (such as text elements or continuous lines) is an integral part of any office document image, whether structured or unstructured. The ability to ana...
The re-use of spoken word audio collections maintained by audiovisual archives is severely hindered by their generally limited access. The CHoral project, which is part of the CAT...
Willemijn Heeren, Franciska de Jong, Laurens van d...