Automated dimensionality reduction of data warehouses

13 years 10 months ago

Download sunsite.informatik.rwth-aachen.de

A data warehouse is designed to consolidate and maintain all attributes that are relevant for the analysis processes. Due to the rapid increase in the size of the modern operational systems, it becomes neither practical, nor necessary to load and maintain in the data warehouse every operational attribute. This paper presents a novel methodology for automated selection of the most relevant independent attributes in a data warehouse. The method is based on the information-theoretic approach to knowledge discovery in databases. Attributes are selected by a stepwise forward procedure aimed at minimizing the uncertainty in the values of key performance indicators (KPI's). Each selected attribute is assigned a score, expressing its degree of relevance. Using the method does not require any prior expertise in the domain of the data and it can be equally applied to nominal and ordinal attributes. An attribute will be included in a data warehouse schema, if it is found as relevant to at l...

Mark Last, Oded Maimon

Real-time Traffic