Sciweavers

Search results: Space-Efficient Algorithms for Document Retrieval
SIGMOD
2008
ACM
Database
Query-based partitioning of documents and indexes for information lifecycle management
Regulations require businesses to archive many electronic documents for extended periods of time. Given the sheer volume of documents and the response time requirements, documents...
Soumyadeb Mitra, Marianne Winslett, Windsor W. Hsu
JCDL
2005
ACM
Education
What's there and what's not?: focused crawling for missing documents in digital libraries
Some large scale topical digital libraries, such as CiteSeer, harvest online academic documents by crawling open-access archives, university and author homepages, and authors’ s...
Ziming Zhuang, Rohit Wagle, C. Lee Giles
ICDE
2002
IEEE
Database
YFilter: Efficient and Scalable Filtering of XML Documents
Soon, much of the data exchanged over the Internet will be encoded in XML, allowing for sophisticated filtering and content-based routing. We have built a filtering engine called ...
Yanlei Diao, Peter M. Fischer, Michael J. Franklin...
CIKM
2008
Springer
Identifying table boundaries in digital documents via sparse line detection
Most prior work on information extraction has focused on extracting information from text in digital documents. However, often, the most important information being reported in an...
Ying Liu, Prasenjit Mitra, C. Lee Giles
SIGIR
2005
ACM
Relation between PLSA and NMF and implications
Non-negative Matrix Factorization (NMF, [5]) and Probabilistic Latent Semantic Analysis (PLSA, [4]) have been successfully applied to a number of text analysis tasks such as docum...
Éric Gaussier, Cyril Goutte