In this paper, we describe some experiments in large-scale Information Extraction (IE) focusing on book texts. We investigate the scalability of IE techniques to full-sized books,...
We describe semi-Markov conditional random fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semiCRF on an input sequence x outputs a "...
We present a divide-and-conquer strategy based on finite state technology for shallow parsing of realworld German texts. In a first phase only the topological structure of a sente...
We present sppc, a high-performance system for intelligent text extraction and navigation from German free text documents. sppc consists of a set of domainindependent shallow core...
We present a simple and scalable algorithm for clustering tens of millions of phrases and use the resulting clusters as features in discriminative classifiers. To demonstrate the ...