A dimensional data warehouse is the design of a data warehouse made up of interconnected star schema or snowflake schema analysis areas. A dimensional data warehouse commonly consists of one or more data marts.
Since the mid-nineties the dimensional data warehouse has been the de facto standard for mid market data warehouses and are common as the presentation layer of enterprise data warehouses (multidimensional or OLAP cubes are similarly used as entry level data repositories, or as part of the presentation layer for mid market or enterprise data warehouses). Dr Ralph Kimball is often quoted when discussing the design of dimensional data warehouses and has written some very useful books including the Data Warehouse Toolkit and subsequent updates.
The dimensional data warehouse design has the following advantages:
-
- The dimensional data warehouse is modular in design and implementation, so that the data warehouse can be built in phases, starting small and growing to incorporate all of an organization’s data. This phased implementation reduces the risk that the overall data warehouse is rejected or fails, as the first phase can be delivered to users and provide business value relatively quickly which justifies further development.
-
- The dimensional data warehouse design is fast and efficient. Large numbers of transactions can be stored in an efficient format within the central fact table of a star schema, because only numbers are stored: measures (values) and numeric surrogate keys to rich (but smaller) dimensional or descriptive data. When a fact table contains many millions of rows, keeping the size of each row to a minimum reduces the space required and also improves query performance. When the volumes are low, additional information can be kept in the fact table aiding legibility. Verbose and descriptive information is stored in dimension tables which have far fewer rows than a fact table, or, if required, in snowflake tables off dimension tables.
-
- The dimensional data warehouse is extensible. Almost all dimensional data warehouses have fact tables that contain transactional or snapshot information, potentially many millions of rows of data fed from a transactional system that is very functionally focused. To improve performance aggregate tables can be built, stripping out detail, but improving query performance dramatically. For example rolling data up to monthly level from daily level can dramatically increase performance and usability. Additionally data from these transactional systems can be combined in Key Performance Indicator (KPI) fact tables.
-
- The dimensional data warehouse has a consistent design, so once a user is familiar with the structure of one area they should be able to navigate other areas without needing any additional technical training.
-
- The dimensional data warehouse is understood by the vast majority of commercial off-the-shelf query and reporting tools.
-
- The dimensional data warehouse can be built from a normalized data model (dependent data marts) or directly from source systems.
DIMENSIONAL DATA WAREHOUSE MODELING
Dimensional data warehouse Modeling using WhereScape RED is a lightweight process. WhereScape RED allows very tight and fast iterations, so that once a high level design is built from the user requirements (key facts and dimensions) a prototype can be shown to users within a matter of hours or days. This reduces the risk that the design is wrong, determines whether the data is available from source systems and also encourages users to “buy-in” to the data warehouse Modeling process.
Data warehouse Modeling starts with a series of interviews or workshops with users to find out what information they need to do their job. These requirements drive the data warehouse design. A dimensional design builds star schemas consisting of central fact tables containing transactions or snapshots and dimension tables which provide context to the transactions. A common technique in data warehouse Modeling is to use second tier fact tables which are fed from the transactional fact tables and are geared towards specific business requirements.
Fact tables are analyzed by users through their associated dimensions. Dimensions provide contextual information regarding the transactions or the subject of the fact table. In dimensional data warehouse modeling the dimensions are identified by how the users need to analyze the data in the fact tables – these are often the what, where, when, who and how. Dimensions often contain hierarchies of data as well as dimensional attributes. For example a “where” dimension of store might contain a geographic dimension, as well as some attributes such as manager name and store size. These attributes might be used to analyze the data, or simply be required to appear in a report.
Snowflake schema are a variation on start schema, and are utilized for very large dimensions and when using some front end tools such as MicroStrategy.