Sciweavers

ERCIMDL
2005
Springer

A Comparison of On-Line Computer Science Citation Databases

This paper examines the differences and similarities between the two on-line computer science citation databases DBLP and CiteSeer. The database entries in DBLP are inserted manually, while the CiteSeer entries are obtained autonomously via a crawl of the Web and automatic processing of user submissions. CiteSeer’s autonomous citation database can be considered a form of self-selected on-line survey. It is important to understand the limitations of such databases, particularly when citation information is used to assess the performance of authors, institutions and funding bodies. We show that the CiteSeer database contains considerably fewer single-author papers. This bias can be modeled by an exponential process with an intuitive explanation. The model permits us to predict that the DBLP database covers approximately 24% of the entire literature of Computer Science. CiteSeer is also biased against low-cited papers. Despite their differences, both databases exhibit similar and significan...
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where ERCIMDL
Authors Vaclav Petricek, Ingemar J. Cox, Hui Han, Isaac G. Councill, C. Lee Giles