The design of efficient textual similarities is an important issue in the domain of textual data exploration. Textual similarities are for example central in document collection s...
Feature selection for unsupervised tasks is particularly challenging, especially when dealing with text data. The increase in online documents and email communication creates a nee...
Nirmalie Wiratunga, Robert Lothian, Stewart Massie
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
Bit arrays, or bitmaps, are used to significantly speed up set operations in several areas, such as data warehousing, information retrieval, and data mining, to cite a few. Howeve...
Many algorithms extract terms from text together with some kind of taxonomic classification (is-a) link. However, the general approaches used today, and specifically the methods o...