Categories

Data Warehousing: Tutorial 4 [Facts and Dimensions]

0

As we studied in data warehouse tutorial-2 that each organization has a number of subject areas depending upon its type. And each subject area contains a number of tables that contain the data.

In this tutorial, we are going to learn about these tables in the data warehouse viz. facts and dimensions. There are two kinds of tables; one is called fact table and another one is called dimension table.

Fact Table

A fact table is a primary table used in dimensional modelling where numerical measurements are stored. The measured data comes from business processing of a single data mart e.g. sales data mart.

Example of fact table,

sales fact tableHere is an example of sales fact table. It has “Quantity Sold” and “Sales Amount” as the quantitative data which are measurable and used for calculation purpose. Thus, these measurable quantities are called facts. The table containing facts is a fact table. It contains Date Key and Product Key as two Foreign keys. The fact table is related to dimension table using these foreign keys only. This is known as referential integrity. These keys are also known as composite keys or concatenate keys. Every fact table has two or more than two foreign keys. These foreign keys can also be used to determine whether the table is a fact or dimension as the one that contains composite keys is a fact table. Another thing is that the table that has many to many relationships with another table is a fact table.

These keys are also present in dimension tables where these are called primary keys of dimension tables. These are shown below in product dimension and date dimension table.

primary keys

The measurement of intersection of all the dimensions is referred to as grain. The list of all the dimensions is called the grain of a fact table. Each of the rows of fact table represents a measurement. Facts can be additive, semi-additive and non-additive. The semi-additive facts are those that can be added only in some dimensions while the non-additive facts are those that cannot be added.

Dimension Table

Other than facts, dimensions are textual descriptions of a business and not numerical measurements. Dimension tables have columns as attributes that are described as rows. A dimension can have 50-100 attributes. It is defined by a single primary key which is denoted by PK. As said earlier, primary key can have referential integrity with a fact table.

product dimension

When querying, attributes are defined using “by” word. For example, “sales by week” is a search query which is used to determine the sales on weekly basis. Thus, here “week” is a dimensional attribute.

The dimensions help in understanding the data in data warehouse and make it reusable. Quality and values of attributes can lead to a better data warehouse. Description of an attribute can vary from 10-15 characters to 30-50 characters. The description can be the name of the brand, category and size etc.

It is often not clear whether a numeric data field is a fact or a dimension. Generally, a numeric data field which is constant in nature and is not involved in calculations and measurements is considered to be a dimension while a data field which is involved in measurements and calculations is a fact. It depends on the designer for deciding the facts and dimensions.

Codes are usually not listed while defining attributes to avoid inconsistency in data warehouse. Sometimes, codes can be of significance to the business and should be used as explicit dimension attributes. In codes, the first two digits are referred to as line of business and the next two digits depict the global region.

Dimension tables have hierarchical relationships. For example, in a product table, product can be categorized on the basis of brand and category. This hierarchical information is stored redundantly for ease associated while using it and for better query performance. A separate table is created for looking up the brand and this process is called as snowflake.

Dimension tables are usually denormalized as these are small in size. [Normalization would be discussed in the coming tutorials] But normalizing or snowflaking in a dimension table can help in improving the storage efficiency. It does not affect the overall size of database.

Check out AccuWeb Hosting now if you want to explore shared hosting, reseller and VPS hosting options. 

Share.

Leave A Reply