Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. ...
Stephen Dill, Nadav Eiron, David Gibson, Daniel Gr...
We present an approach for automatic detection of topic change. Our approach is based on the analysis of statistical features of topics in time-sliced corpora and their dynamics ov...
Gaussian mixture model - universal background model (GMMUBM) is a standard reference classifier in speaker verification. We have recently proposed a simplified model using vect...
Tomi Kinnunen, Juhani Saastamoinen, Ville Hautam&a...
Data-driven Spoken Language Understanding (SLU) systems need semantically annotated data which are expensive, time consuming and prone to human errors. Active learning has been su...