Formalizing ETL Jobs for Incremental Loading of Data Warehouses

Abstract: Extract-transform-load (ETL) tools are primarily designed for data warehouse loading, i.e. to perform physical data integration. When the operational data sources happen to change, the data warehouse gets stale. To ensure data timeliness, the data warehouse is refreshed on a periodical basis. The naive approach of simply reloading the data warehouse is obviously inefficient. Typically, only a small fraction of source data is changed during loading cycles. It is therefore desirable to capture these changes at the operational data sources and refresh the data warehouse incrementally. This approach is known as incremental loading. Dedicated ETL jobs are required to perform incremental loading. We are not aware of any ETL tool that helps to automate this task. In fact, incremental load jobs are handcrafted by ETL programmers so far. The development is thus costly and error-prone. In this paper we present an approach to the automated derivation of incremental load jobs based on e...
Thomas Jörg, Stefan Deßloch
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where BTW