Sciweavers

88 search results - page 3 / 18
» Finding similar files in large document repositories
IWPC
2007
IEEE
13 years 11 months ago
Mining Software Repositories for Traceability Links
An approach to recover/discover traceability links between software artifacts via the examination of a software system’s version history is presented. A heuristic-based approach...
Huzefa H. Kagdi, Jonathan I. Maletic, Bonita Shari...
ITCC
2003
IEEE
13 years 10 months ago
A Method for Calculating Term Similarity on Large Document Collections
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...
Wolfgang W. Bein, Jeffrey S. Coombs, Kazem Taghva
MSR
2010
ACM
13 years 10 months ago
Finding file clones in FreeBSD Ports Collection
In Open Source System (OSS) development, software components are often imported and reused; for this reason we might expect that files are copied in multiple projects (...
Yusuke Sasaki, Tetsuo Yamamoto, Yasuhiro Hayase, K...
ISMIS
2000
Springer
13 years 8 months ago
Automatic Semantic Header Generator
Indexing file systems is a powerful means of helping users locate documents, software, and other types of data among large repositories. In environments that contain many differen...
Bipin C. Desai, Sami S. Haddad, Abdelbaset Ali
DCC
2004
IEEE
14 years 4 months ago
An Approximation to the Greedy Algorithm for Differential Compression of Very Large Files
We present a new differential compression algorithm that combines the hash value techniques and suffix array techniques of previous work. Differential compression refers to encodi...
Ramesh C. Agarwal, Suchitra Amalapurapu, Shaili Ja...