In this data warehousing tutorial, architectural environment, monitoring of data warehouse, structure of data warehouse and granularity of data warehouse are discussed.
Types of Data
There are two types of data in architectural environment viz. primitive data and derived data. Primitive data is an operational data that contains detailed data required to run daily operations of a company. Derived data is DSS data which is a summarized form of data required to meet the needs of management of a company.
Granularity is defined as the level of detail or the summarization of units of data in the data warehouse. Data in the data warehouse is granulated and can be examined in different ways by different people. When there is more detail of data, granularity is decreased. The level of granularity is high when there is low detail of data. For example, one transaction has full detail of data, which means it has low level of granularity.
Another transaction has summary of all the transactions, which means there is high level of granularity. Granularity affects the volume of data and the type of query answered in the data warehouse. Data that enters into the data warehouse from operational data is at high granularity and has low detail. An example of data entering into the data warehouse at low granularity is Web log data.
There are a number of benefits of granularity viz. it makes available the reusable granular data in the data warehouse, it has the ability to reconcile data, it gives flexibility, it contains history of activities and events and unknown requirements in the future can be accommodated using this.
Two levels of granularity are required viz. lightly summarized data and true archival data.
There are 4 levels of architecture viz. operational, data warehouse, departmental and individual. Operational data is a data of daily operations. It can be updated according to the changes in the business. It is basically an application oriented primitive data. Operational data is not in integrated form. The data integration is done using ETL software. Operational data is integrated to get data warehouse which is also known as atomic data. It is the most granular data.
Data warehouse is an integrated data which is historical primitive data. It cannot be updated like operational data. It also contains some derived data too. Departmental data is exclusively a derived data. It is also termed as data mart. It is based on the concept of DBMS or OLAP and is used by the end user. Individual data is a temporary data using which heuristic analysis is done. Here, EIS (Executive Information Systems processing) is done.
Structure of Data Warehouse
There are different levels of data in data warehouse viz. older level of detail, current level of detail, level of lightly summarized data and level of highly summarized data. After aging of data, current level of detail gets converted to older level of detail. After summarization, current level of detail gets converted to lightly summarized which then is converted to highly summarized data.
Data warehouse is subject-oriented. Each organization has its own subject areas viz. customer, product, transaction etc. Each subject area contains series of related tables in data warehouse. And all the tables are related to each other using common key.
For example, the subject area customer has a number of tables related to customer detail, customer’s transaction detail etc. which are connected to each other with a common data, customer ID. Each table has a customer Id which is common to all tables. The subject area may contain data on different media viz magnetic tape, DASD etc. Each table of the subject area contains an element of time (It is not shown in the diagram but it’s there in every table).
Data warehouse is constructed in two ways; day 1-day n phenomenon and the one that requires building of data mart first. Creating data warehouse by building data mart first leads to wastage of data. The day 1-day n phenomenon says that data in the data warehouse is built up day by day. Every day the data is loaded in the data warehouse and hence it leads to a complex structure of data warehouse.
Monitoring of Data Warehouse
As the nature of data warehouse is constant growth, it is important to monitor the data warehouse which requires its maintenance as well. It includes measuring the response time too. In monitoring, one requires to monitor the data in data warehouse as well as usage of that data.
There are a number of benefits of monitoring the data warehouse viz. to determine the rate of growth, identification of data, calculation of response time, users of data warehouse, level of usage of data warehouse, when and how much data warehouse is used.
[Response time is determined when the user sends the request, the request is serviced and is returned to the user.]
Monitoring is done at two levels: end-user level and server level. In monitoring at end-user level, each terminal which is monitored requires its own administration. Since one DSS network has 10000 terminals, administration becomes cumbersome. In comparison to this, monitoring at server level involves easy administration. The benefit of data warehouse monitoring is that one can compare today’s results with average results.
Continue reading the next tutorial for comparison of data warehouse environment with operational environment.