Identifying the most influential documents in a corpus is an important problem in many fields, from information science and historiography to text summarization and news aggregati...
Semantic indexing is a popular technique used to access and organize large amounts of unstructured text data. We describe an optimized implementation of semantic indexing and docu...
Detecting structural similarities between XML documents has been the subject of several recent work, and the proposed algorithms mostly use tree edit distance between the correspo...
We describe the design and use of a personal digital library system, UpLib. The system consists of a full-text indexed repository accessed through an active agent via a Web interf...
As the rapid growth of PDF document in digital libraries, recognizing the document structure and detecting specific document components are useful for document storage, classifica...