Sciweavers

88 search results (page 2 of 18) for: Finding similar files in large document repositories
KDD
2007
ACM
Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus
We present a document routing and index partitioning scheme for scalable similarity-based search of documents in a large corpus. We consider the case when similarity-based search ...
Deepavali Bhagwat, Kave Eshghi, Pankaj Mehra
KES
2006
Springer
Integrated Document Browsing and Data Acquisition for Building Large Ontologies
Named entities (e.g., "Kofi Annan", "Coca-Cola", "Second World War") are ubiquitous in web pages and other types of document and often provide a simpl...
Felix Weigel, Klaus U. Schulz, Levin Brunner, Edua...
DEXAW
2006
IEEE
Finding Syntactic Similarities Between XML Documents
Detecting structural similarities between XML documents has been the subject of several recent works, and the proposed algorithms mostly use tree edit distance between the correspo...
Davood Rafiei, Daniel L. Moise, Dabo Sun
CIDR
2007
Fragmentation in Large Object Repositories
Fragmentation leads to unpredictable and degraded application performance. While these problems have been studied in detail for desktop filesystem workloads, this study examines n...
Russell Sears, Catharine van Ingen
MSR
2006
ACM
Mining sequences of changed-files from version histories
Modern source-control systems, such as Subversion, preserve change-sets of files as atomic commits. However, the specific ordering information in which files were changed is typic...
Huzefa H. Kagdi, Shehnaaz Yusuf, Jonathan I. Malet...