Sciweavers

CIS
2005
Springer
13 years 10 months ago
Concept Chain Based Text Clustering
Different from familiar clustering objects, text documents have sparse data spaces. A common way of representing a document is as a bag of its component words, but the semantic re...
Shaoxu Song, Jian Zhang, Chunping Li
CIKM
2005
Springer
13 years 10 months ago
A function-based access control model for XML databases
XML documents are frequently used in applications such as business transactions and medical records involving sensitive information. Typically, parts of documents should be visibl...
Naizhen Qi, Michiharu Kudo, Jussi Myllymaki, Hamid...
BTW
2005
Springer
13 years 10 months ago
Element Relationship: Exploiting Inline Markup for Better XML Retrieval
With the increasing popularity of semi-structured documents (particularly in the form of XML) for knowledge management, it is important to create tools that use the additional in...
Philipp Dopichaj
UIST
2005
ACM
13 years 10 months ago
PapierCraft: a command system for interactive paper
Knowledge workers use paper extensively for document reviewing and note-taking due to its versatility and simplicity of use. As users annotate printed documents and gather notes, ...
Chunyuan Liao, François Guimbretière...
SIGIR
2005
ACM
13 years 10 months ago
Modeling task-genre relationships for IR in the workplace
Context influences the search process, but to date research has not definitively identified which aspects of context are the most influential for information retrieval, and thus a...
Luanne Freund, Elaine G. Toms, Charles L. A. Clark...
ICDAR
2005
IEEE
13 years 10 months ago
Multi-scale Techniques for Document Page Segmentation
Page segmentation algorithms found in the published literature often rely on some predetermined parameters such as general font sizes, distances between text lines and document scan ...
Zhixin Shi, Venu Govindaraju
ICDAR
2005
IEEE
13 years 10 months ago
A Corpus for Comparative Evaluation of OCR Software and Postcorrection Techniques
We describe a new corpus collected for comparative evaluation of OCR software and postcorrection techniques. The corpus is freely available for academic groups and use. The major ...
Stoyan Mihov, Klaus U. Schulz, Christoph Ringlstet...
ICDAR
2005
IEEE
13 years 10 months ago
Document Ranking by Layout Relevance
This paper describes the development of a new document ranking system based on layout similarity. The user has a need represented by a set of "wanted" documents, and the syste...
May Huang, Daniel DeMenthon, David S. Doermann, Ly...
ICDAR
2005
IEEE
13 years 10 months ago
Document Understanding System Using Stochastic Context-Free Grammars
We present a document understanding system in which the arrangement of lines of text and block separators within a document is modeled by stochastic context-free grammars. A gram...
John C. Handley, Anoop M. Namboodiri, Richard Zani...
ICCV
2005
IEEE
13 years 10 months ago
Learning Non-Generative Grammatical Models for Document Analysis
We present a general approach for the hierarchical segmentation and labeling of document layout structures. This approach models document layout as a grammar and performs a glo...
Michael Shilman, Percy Liang, Paul A. Viola