Standard Machine Learning approaches to text classification use the bag-of-words representation of documents to deceive the classification target function. Typical linguistic stru...
Tokenization is one of the initial steps done for almost any text processing task. It is not particularly recognized as a challenging task for English monolingual systems but it r...
The paper provides an overview of the Polish Speech Database for taking dictation of legal texts, created for the purpose of LVCSR system for Polish. It presents background inform...
Grazyna Demenko, Stefan Grocholewski, Katarzyna Kl...
Within the industrial context of the information society, technical translation represents a considerable commercial stake. In the light of this, machine translation is considered...
We present a model that improves entity entity link modeling in a mixed membership stochastic block model, by jointly modeling links with text about the entities that are linked i...