In current phrase-based Statistical Machine Translation systems, more training data is generally better than less. However, a larger data set eventually introduces a larger model ...
This paper focuses on automatically improving the readability of documents. We explore mechanisms relating to content control that could be used (i) by authors to improve the qual...
With the information overload in the life sciences, there is an increasing need for annotated corpora, particularly with biological and biomedical entities, which is the driving fo...
The paper presents an integrated set-theoretic data model that offers a framework for defining a unified schema for any database environment. We utilise the concepts “entity...
Emmanuel J. Yannakoudakis, Panagiotis Andrikopoulo...
Medical language, like many technical languages, is rich in morphologically complex words, many of which have their roots in Greek and Latin, in which case they are called neocla...