Abstract. Today's graphical modelling languages, despite using symbols and connections, represent large model parts as structured text. We benefit from sophistic text editors,...
We report on a new experimental analysis of high-order entropy-compressed suffix arrays, which retains the theoretical performance of previous work and represents an improvement in...
An increasing number of comfortable publishing systems nowadays leads to documents containing more than just textual information. Graphics and images are combined with text and of...
Measuring the similarity between two texts is a fundamental problem in many NLP and IR applications. Among the existing approaches, the cosine measure of the term vectors represen...
Randomised techniques allow very big language models to be represented succinctly. However, being batch-based they are unsuitable for modelling an unbounded stream of language whi...