I/O-Conscious Data Preparation for Large-Scale Web Search Engines

13 years 6 months ago
I/O-Conscious Data Preparation for Large-Scale Web Search Engines
Given that commercial search engines cover billions of web pages, efficiently managing the corresponding volumes of disk-resident data needed to answer user queries quickly is a formidable data manipulation challenge. We present a general technique for efficiently carrying out large sets of simple transformation or querying operations over external-memory data tables. It greatly reduces the number of performed disk accesses and seeks by maximizing the temporal locality of data access and organizing most of the necessary disk accesses into long sequential reads or writes of data that is reused many times while in memory. This technique is based on our experience from building a functionally complete and fully operational web search engine called Yuntis. As such, it is in particular well suited for most data manipulation tasks in a modern web search engine and is employed throughout Yuntis. The key idea of this technique is coordinated partitioning of related data tables and correspondi...
Maxim Lifantsev, Tzi-cker Chiueh
Added 23 Dec 2010
Updated 23 Dec 2010
Type Journal
Year 2002
Where VLDB
Authors Maxim Lifantsev, Tzi-cker Chiueh
Comments (0)