This paper describes the development of a structured document collection containing user-generated text and numerical metadata for exploring the exploitation of metadata in inform...
Walid Magdy, Jinming Min, Johannes Leveling, Garet...
Named Entity Recognition is a relatively well-understood NLP task, with many publicly available training resources and software for English. Other languages tend to be underserved...
Mixtures of probabilistic principal component analyzers model high-dimensional nonlinear data by combining local linear models. Each mixture component is specifically designed to...
We consider a parsed text corpus as an instance of a labelled directed graph, where nodes represent words and weighted directed edges represent the syntactic relations between the...
Abstract. Large product lines have complex build systems, which obscure mapping of features to code. We extract this mapping out of the build systems of two operating systems kerne...
Thorsten Berger, Steven She, Rafael Lotufo, Krzysz...