Due to the structural heterogeneity of XML, queries are often interpreted approximately. This is achieved by relaxing the query and ranking the results based on their relevance to ...
In this paper, we propose a new similarity measure to compute the pairwise similarity of text-based documents based on suffix tree document model. By applying the new suffix tree ...
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...
Typical approaches to XML authoring view a XML document as a mixture of structure (the tags) and surface (text between the tags). We advocate a radical approach where the surface ...
Document-centric XML collections contain text-rich documents, marked up with XML tags. The tags add lightweight semantics to the text. Querying such collections calls for a hybrid...