MSBI CONCEPTS: Data Warehouse Concepts

What is a Data Warehouse?

According to Inmon, famous author for several data warehouse books, "A data warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management's decision making process".

Example: In order to store data, over the years, many application designers in each branch have made their individual decisions as to how an application and database should be built. So source systems will be different in naming conventions, variable measurements, encoding structures, and physical attributes of data. Consider a bank that has got several branches in several countries, has millions of customers and the lines of business of the enterprise are savings, and loans. The following example explains how the data is integrated from source systems to target systems.

Example of Source Data

System Name	Attribute Name	Column Name	Datatype	Values
Source System 1	Customer Application Date	CUSTOMER_APPLICATION_DATE	NUMERIC(8,0)	11012005
Source System 2	Customer Application Date	CUST_APPLICATION_DATE	DATE	11012005
Source System 3	Application Date	APPLICATION_DATE	DATE	01NOV2005

In the aforementioned example, attribute name, column name, datatype and values are entirely different from one source system to another. This inconsistency in data can be avoided by integrating the data into a data warehouse with good standards.

Example of Target Data(Data Warehouse)

Target System	Attribute Name	Column Name	Datatype	Values
Record #1	Customer Application Date	CUSTOMER_APPLICATION_DATE	DATE	01112005
Record #2	Customer Application Date	CUSTOMER_APPLICATION_DATE	DATE	01112005
Record #3	Customer Application Date	CUSTOMER_APPLICATION_DATE	DATE	01112005

In the above example of target data, attribute names, column names, and datatypes are consistent throughout the target system. This is how data from various source systems is integrated and accurately stored into the data warehouse.

See Figure 1.12 below for Data Warehouse Architecture Diagram.

Figure 1.12 : Data Warehouse Architecture

Data Warehouse & Data Mart
A data warehouse is a relational/multidimensional database that is designed for query and analysis rather than transaction processing. A data warehouse usually contains historical data that is derived from transaction data. It separates analysis workload from transaction workload and enables a business to consolidate data from several sources.

In addition to a relational/multidimensional database, a data warehouse environment often consists of an ETL solution, an OLAP engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

There are three types of data warehouses:
1. Enterprise Data Warehouse - An enterprise data warehouse provides a central database for decision support throughout the enterprise.
2. ODS(Operational Data Store) - This has a broad enterprise wide scope, but unlike the real entertprise data warehouse, data is refreshed in near real time and used for routine business activity.
3. Data Mart - Datamart is a subset of data warehouse and it supports a particular region, business unit or business function.

Data warehouses and data marts are built on dimensional data modeling where fact tables are connected with dimension tables. This is most useful for users to access data since a database can be visualized as a cube of several dimensions. A data warehouse provides an opportunity for slicing and dicing that cube along each of its dimensions.

Data Mart: A data mart is a subset of data warehouse that is designed for a particular line of business, such as sales, marketing, or finance. In a dependent data mart, data can be derived from an enterprise-wide data warehouse. In an independent data mart, data can be collected directly from sources.

Figure 1.12 : Data Warehouse and Datamarts

What is Star Schema?
Star Schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables. It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table and it points towards the dimension tables. The advantage of star schema is slicing down, performance increase and easy understanding of data.

Steps in designing Star Schema

· Identify a business process for analysis (like sales).

· Identify measures or facts (sales dollar).

· Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).

· List the columns that describe each dimension.(region name, branch name, region name).

· Determine the lowest level of summary in a fact table(sales dollar).

Important aspects of Star Schema & Snow Flake Schema

· In a star schema every dimension will have a primary key.

· In a star schema, a dimension table will not have any parent table.

· Whereas in a snow flake schema, a dimension table will have one or more parent tables.

· Hierarchies for the dimensions are stored in the dimensional table itself in star schema.

· Whereas hierachies are broken into separate tables in snow flake schema. These hierachies helps to drill down the data from topmost hierachies to the lowermost hierarchies.

Glossary:

Hierarchy
A logical structure that uses ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation; for example, in a time dimension, a hierarchy might be used to aggregate data from the Month level to the Quarter level, from the Quarter level to the Year level. A hierarchy can also be used to define a navigational drill path, regardless of whether the levels in the hierarchy represent aggregated totals or not.

Level
A position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the Month, Quarter, and Year levels.

Fact Table
A table in a star schema that contains facts and connected to dimensions. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.

A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables). A fact table usually contains facts with the same level of aggregation.

Example of Star Schema: Figure 1.6

In the example figure 1.6, sales fact table is connected to dimensions location, product, time and organization. It shows that data can be sliced across all dimensions and again it is possible for the data to be aggregated across multiple dimensions. "Sales Dollar" in sales fact table can be calculated across all dimensions independently or in a combined manner which is explained below.

· Sales Dollar value for a particular product

· Sales Dollar value for a product in a location

· Sales Dollar value for a product in a year within a location

· Sales Dollar value for a product in a year within a location sold or serviced by an employee

Snow Flake Schema

A snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables. i.e dimension table hierachies are broken into simpler tables. In star schema example we had 4 dimensions like location, product, time, organization and a fact table(sales).

In snow flake schema, the example diagram shown below has 4 dimension tables, 4 lookup tables and 1 fact table. The reason is that hierarchies(category, branch, state, and month) are being broken out of the dimension tables(PRODUCT, ORGANIZATION, LOCATION, and TIME) respectively and shown separately. In OLAP, this snow flake schema approach increases the number of joins and poor performance in retrieval of data. In few organizations, they try to normalize the dimension tables to save space. Since dimension tables hold less space, snow flake schema approach may be avoided.

Example of Snow Flake Schema: Figure 1.7

Fact Table
The centralized table in a star schema is called as FACT table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.

In the example fig 1.6 "Sales Dollar" is a fact(measure) and it can be added across several dimensions. Fact tables store different types of measures like additive, non additive and semi additive measures.

Measure Types

· Additive - Measures that can be added across all dimensions.

· Non Additive - Measures that cannot be added across all dimensions.

· Semi Additive - Measures that can be added across few dimensions and not with others.

A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables).

In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called as Factless Fact tables.

Steps in designing Fact Table

· Identify a business process for analysis(like sales).

· Identify measures or facts (sales dollar).

· Identify dimensions for facts(product dimension, location dimension, time dimension, organization dimension).

· List the columns that describe each dimension.(region name, branch name, region name).

· Determine the lowest level of summary in a fact table(sales dollar).

Example of a Fact Table with an Additive Measure in Star Schema: Figure 1.6

In the example figure 1.6, sales fact table is connected to dimensions location, product, time and organization. Measure "Sales Dollar" in sales fact table can be added across all dimensions independently or in a combined manner which is explained below.

· Sales Dollar value for a particular product

· Sales Dollar value for a product in a location

· Sales Dollar value for a product in a year within a location

· Sales Dollar value for a product in a year within a location sold or serviced by an employee

Dimension Table
Dimension table is one that describe the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables.

Location Dimension
In a relational data modeling, for normalization purposes, country lookup, state lookup, county lookup, and city lookups are not merged as a single table. In a dimensional data modeling (star schema), these tables would be merged as a single table called LOCATION DIMENSION for performance and slicing data requirements. This location dimension helps to compare the sales in one region with another region. We may see good sales profit in one region and loss in another region. If it is a loss, the reasons for that may be a new competitor in that area, or failure of our marketing strategy etc.

Example of Location Dimension: Figure 1.8

Country Lookup

Country Code	Country Name	DateTimeStamp
USA	United States Of America	1/1/2005 11:23:31 AM

State Lookup

State Code	State Name	DateTimeStamp
NY	New York	1/1/2005 11:23:31 AM
FL	Florida	1/1/2005 11:23:31 AM
CA	California	1/1/2005 11:23:31 AM
NJ	New Jersey	1/1/2005 11:23:31 AM

County Lookup

County Code	County Name	DateTimeStamp
NYSH	Shelby	1/1/2005 11:23:31 AM
FLJE	Jefferson	1/1/2005 11:23:31 AM
CAMO	Montgomery	1/1/2005 11:23:31 AM
NJHU	Hudson	1/1/2005 11:23:31 AM

City Lookup

City Code	City Name	DateTimeStamp
NYSHMA	Manhattan	1/1/2005 11:23:31 AM
FLJEPC	Panama City	1/1/2005 11:23:31 AM
CAMOSH	San Hose	1/1/2005 11:23:31 AM
NJHUJC	Jersey City	1/1/2005 11:23:31 AM

Location Dimension

Location Dimension Id	Country Name	State Name	County Name	City Name	DateTime Stamp
1	USA	New York	Shelby	Manhattan	1/1/2005 11:23:31 AM
2	USA	Florida	Jefferson	Panama City	1/1/2005 11:23:31 AM
3	USA	California	Montgomery	San Hose	1/1/2005 11:23:31 AM
4	USA	New Jersey	Hudson	Jersey City	1/1/2005 11:23:31 AM

Product Dimension
In a relational data model, for normalization purposes, product category lookup, product sub-category lookup, product lookup, and and product feature lookups are are not merged as a single table. In a dimensional data modeling(star schema), these tables would be merged as a single table called PRODUCT DIMENSION for performance and slicing data requirements.

Example of Product Dimension: Figure 1.9

Product Category Lookup

Product Category Code	Product Category Name	DateTimeStamp
1	Apparel	1/1/2005 11:23:31 AM
2	Shoe	1/1/2005 11:23:31 AM

Product Sub-Category Lookup

Product Sub-Category Code	Product Sub-Category Name	DateTime Stamp
11	Shirt	1/1/2005 11:23:31 AM
12	Trouser	1/1/2005 11:23:31 AM
13	Casual	1/1/2005 11:23:31 AM
14	Formal	1/1/2005 11:23:31 AM

Product Lookup

Product Code	Product Name	DateTimeStamp
1001	Van Heusen	1/1/2005 11:23:31 AM
1002	Arrow	1/1/2005 11:23:31 AM
1003	Nike	1/1/2005 11:23:31 AM
1004	Adidas	1/1/2005 11:23:31 AM

Product Feature Lookup

Product Feature Code	Product Feature Description	DateTimeStamp
10001	Van-M	1/1/2005 11:23:31 AM
10002	Van-L	1/1/2005 11:23:31 AM
10003	Arr-XL	1/1/2005 11:23:31 AM
10004	Arr-XXL	1/1/2005 11:23:31 AM
10005	Nike-8	1/1/2005 11:23:31 AM
10006	Nike-9	1/1/2005 11:23:31 AM
10007	Adidas-10	1/1/2005 11:23:31 AM
10008	Adidas-11	1/1/2005 11:23:31 AM

Product Dimension

Product Dimension Id	Product Category Name	Product Sub-Category Name	Product Name	Product Feature Desc	DateTime Stamp
100001	Apparel	Shirt	Van Heusen	Van-M	1/1/2005 11:23:31 AM
100002	Apparel	Shirt	Van Heusen	Van-L	1/1/2005 11:23:31 AM
100003	Apparel	Shirt	Arrow	Arr-XL	1/1/2005 11:23:31 AM
100004	Apparel	Shirt	Arrow	Arr-XXL	1/1/2005 11:23:31 AM
100005	Shoe	Casual	Nike	Nike-8	1/1/2005 11:23:31 AM
100006	Shoe	Casual	Nike	Nike-9	1/1/2005 11:23:31 AM
100007	Shoe	Casual	Adidas	Adidas-10	1/1/2005 11:23:31 AM
100008	Shoe	Casual	Adidas	Adidas-11	1/1/2005 11:23:31 AM

Organization Dimension
In a relational data model, for normalization purposes, corporate office lookup, region lookup, branch lookup, and employee lookups are not merged as a single table. In a dimensional data modeling(star schema), these tables would be merged as a single table called ORGANIZATION DIMENSION for performance and slicing data.

This dimension helps us to find the products sold or serviced within the organization by the employees. In any industry, we can calculate the sales on region basis, branch basis and employee basis. Based on the performance, an organization can provide incentives to employees and subsidies to the branches to increase further sales.

Example of Organization Dimension: Figure 1.10

Corporate Lookup

Corporate Code	Corporate Name	DateTimeStamp
CO	American Bank	1/1/2005 11:23:31 AM

Region Lookup

Region Code	Region Name	DateTimeStamp
SE	South East	1/1/2005 11:23:31 AM
MW	Mid West	1/1/2005 11:23:31 AM

Branch Lookup

Branch Code	Branch Name	DateTimeStamp
FLTM	Florida-Tampa	1/1/2005 11:23:31 AM
ILCH	Illinois-Chicago	1/1/2005 11:23:31 AM

Employee Lookup

Employee Code	Employee Name	DateTimeStamp
E1	Paul Young	1/1/2005 11:23:31 AM
E2	Chris Davis	1/1/2005 11:23:31 AM

Organization Dimension

Organization Dimension Id	Corporate Name	Region Name	Branch Name	Employee Name	DateTime Stamp
1	American Bank	South East	Florida-Tampa	Paul Young	1/1/2005 11:23:31 AM
2	American Bank	Mid West	Illinois-Chicago	Chris Davis	1/1/2005 11:23:31 AM

Time Dimension
In a relational data model, for normalization purposes, year lookup, quarter lookup, month lookup, and week lookups are not merged as a single table. In a dimensional data modeling(star schema), these tables would be merged as a single table called TIME DIMENSION for performance and slicing data.

This dimensions helps to find the sales done on date, weekly, monthly and yearly basis. We can have a trend analysis by comparing this year sales with the previous year or this week sales with the previous week.

Example of Time Dimension: Figure 1.11

Year Lookup

Year Id	Year Number	DateTimeStamp
1	2004	1/1/2005 11:23:31 AM
2	2005	1/1/2005 11:23:31 AM

Quarter Lookup

Quarter Number	Quarter Name	DateTimeStamp
1	Q1	1/1/2005 11:23:31 AM
2	Q2	1/1/2005 11:23:31 AM
3	Q3	1/1/2005 11:23:31 AM
4	Q4	1/1/2005 11:23:31 AM

Month Lookup

Month Number	Month Name	DateTimeStamp
1	January	1/1/2005 11:23:31 AM
2	February	1/1/2005 11:23:31 AM
3	March	1/1/2005 11:23:31 AM
4	April	1/1/2005 11:23:31 AM
5	May	1/1/2005 11:23:31 AM
6	June	1/1/2005 11:23:31 AM
7	July	1/1/2005 11:23:31 AM
8	August	1/1/2005 11:23:31 AM
9	September	1/1/2005 11:23:31 AM
10	October	1/1/2005 11:23:31 AM
11	November	1/1/2005 11:23:31 AM
12	December	1/1/2005 11:23:31 AM

Week Lookup

Week Number	Day of Week	DateTimeStamp
1	Sunday	1/1/2005 11:23:31 AM
1	Monday	1/1/2005 11:23:31 AM
1	Tuesday	1/1/2005 11:23:31 AM
1	Wednesday	1/1/2005 11:23:31 AM
1	Thursday	1/1/2005 11:23:31 AM
1	Friday	1/1/2005 11:23:31 AM
1	Saturday	1/1/2005 11:23:31 AM
2	Sunday	1/1/2005 11:23:31 AM
2	Monday	1/1/2005 11:23:31 AM
2	Tuesday	1/1/2005 11:23:31 AM
2	Wednesday	1/1/2005 11:23:31 AM
2	Thursday	1/1/2005 11:23:31 AM
2	Friday	1/1/2005 11:23:31 AM
2	Saturday	1/1/2005 11:23:31 AM

Time Dimension

Time Dim Id	Year No	Day Of Year	Quarter No	Month No	Month Name	Month Day No	Week No	Day of Week	Cal Date	DateTime Stamp
1	2004	1	Q1	1	January	1	1	5	1/1/2004	1/1/2005 11:23:31 AM
2	2004	32	Q1	2	February	1	5	1	2/1/2004	1/1/2005 11:23:31 AM
3	2005	1	Q1	1	January	1	1	7	1/1/2005	1/1/2005 11:23:31 AM
4	2005	32	Q1	2	February	1	5	3	2/1/2005	1/1/2005 11:23:31 AM

Slowly Changing Dimensions
Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product price changes over time; People change their names for some reason; Country and State names may change over time. These are a few examples of Slowly Changing Dimensions since some changes are happening to them over a period of time.

Slowly Changing Dimensions are often categorized into three types namely Type1, Type2 and Type3. The following section deals with how to capture and handling these changes over time.

The "Product" table mentioned below contains a product named, Product1 with Product ID being the primary key. In the year 2004, the price of Product1 was $150 and over the time, Product1's price changes from $150 to $350. With this information, let us explain the three types of Slowly Changing Dimensions.

Product Price in 2004:

Product ID(PK)	Year	Product Name	Product Price
1	2004	Product1	$150

Type 1: Overwriting the old values.
In the year 2005, if the price of the product changes to $250, then the old values of the columns "Year" and "Product Price" have to be updated and replaced with the new values. In this Type 1, there is no way to find out the old value of the product "Product1" in year 2004 since the table now contains only the new price and year information.

Product

Product ID(PK)	Year	Product Name	Product Price
1	2005	Product1	$250

Type 2: Creating an another additional record.
In this Type 2, the old values will not be replaced but a new row containing the new values will be added to the product table. So at any point of time, the difference between the old values and new values can be retrieved and easily be compared. This would be very useful for reporting purposes.

Product

Product ID(PK)	Year	Product Name	Product Price
1	2004	Product1	$150
1	2005	Product1	$250

The problem with the above mentioned data structure is "Product ID" cannot store duplicate values of "Product1" since "Product ID" is the primary key. Also, the current data structure doesn't clearly specify the effective date and expiry date of Product1 like when the change to its price happened. So, it would be better to change the current data structure to overcome the above primary key violation.

Product

Product ID(PK)	Effective DateTime(PK)	Year	Product Name	Product Price	Expiry DateTime
1	01-01-2004 12.00PM	2004	Product1	$150	12-31-2004 11.59PM
1	01-01-2005 12.00PM	2005	Product1	$250

In the changed Product table's Data structure, "Product ID" and "Effective DateTime" are composite primary keys. So there would be no violation of primary key constraint. Addition of new columns, "Effective DateTime" and "Expiry DateTime" provides the information about the product's effective date and expiry date which adds more clarity and enhances the scope of this table. Type2 approach may need additional space in the data base, since for every changed record, an additional row has to be stored. Since dimensions are not that big in the real world, additional space is negligible.

Type 3: Creating new fields.
In this Type 3, the latest update to the changed values can be seen. Example mentioned below illustrates how to add new columns and keep track of the changes. From that, we are able to see the current price and the previous price of the product, Product1.

Product

Product ID(PK)	Current Year	Product Name	Current Product Price	Old Product Price	Old Year
1	2005	Product1	$250	$150	$2004

The problem with the Type 3 approach, is over years, if the product price continuously changes, then the complete history may not be stored, only the latest change will be stored. For example, in year 2006, if the product1's price changes to $350, then we would not be able to see the complete history of 2004 prices, since the old values would have been updated with 2005 product information.

Product

Product ID(PK)	Year	Product Name	Product Price	Old Product Price	Old Year
1	2006	Product1	$350	$250	$2005

What are ETL Tools?
ETL Tools are meant to extract, transform and load the data into Data Warehouse for decision making. Before the evolution of ETL Tools, the above mentioned ETL process was done manually by using SQL code created by programmers. This task was tedious and cumbersome in many cases since it involved many resources, complex coding and more work hours. On top of it, maintaining the code placed a great challenge among the programmers.

These difficulties are eliminated by ETL Tools since they are very powerful and they offer many advantages in all stages of ETL process starting from extraction, data cleansing, data profiling, transformation, debuggging and loading into data warehouse when compared to the old method.

There are a number of ETL tools available in the market to do ETL process the data according to business/technical requirements. Following are some those.

Popular ETL Tools

Tool Name	Company Name
Informatica	Informatica Corporation
DT/Studio	Embarcadero Technologies
DataStage	IBM
Ab Initio	Ab Initio Software Corporation
Data Junction	Pervasive Software
Oracle Warehouse Builder	Oracle Corporation
Microsoft SQL Server Integration	Microsoft

ETL Concepts
Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into target database.

The first step in ETL process is mapping the data between source systems and target database(data warehouse or data mart). The second step is cleansing of source data in staging area. The third step is transforming cleansed source data and then loading into the target system.

Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used instead of ETL.

Glossary of ETL (Reference:www.Oracle.com)

Source System
A database, application, file, or other storage facility from which the data in a data warehouse is derived.

Mapping
The definition of the relationship and data flow between source and target objects.

Metadata
Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.

Staging Area
A place where data is processed before entering the warehouse.

Cleansing
The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.

Transformation
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

Transportation
The process of moving copied or transformed data from a source to a data warehouse.

Target System
A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.

Figure 1.12 : Sample ETL Process Flow

Database - RDBMS
There are a number of relational databases to store data. A relational database contains normalized data stored in tables. Tables contain records and columns. RDBMS makes it easy to work with individual records. Each row contains a unique instance of data for the categories defined by the columns.

RDBMS are used in OLTP applications(e.g. ATM cards) very frequently and sometimes datawarehouse may also use relational databases. Please refer to Relational data modeling for details to know how data from a source system is normalized and stored in RDBMS databases.

Popular RDBMS Databases

RDBMS Name	Company Name
Oracle	Oracle Corporation
IBM DB2 UDB	IBM Corporation
IBM Informix	IBM Corporation
Microsoft SQL Server	Microsoft
Sybase	Sybase Corporation
Terradata	NCR

Business Intelligence:
Business Intelligence is a technology based on customer and profit oriented models that reduces operating costs and provide increased profitability by improving productivity, sales, service and helps to make decision making capabilities at no time. Business Intelligence Models are based on multi dimensional analysis and key performance indicators (KPI) of an enterprise.

Business Intelligence applications that are based on Business Intelligence Models are created by Business Intelligence software which provides the aggregated details about suppliers, customers, internal activities, business to business transactions to the managers or whoever needs it to take better corporate decisions.

Many business questions or situations need to be analyzed in order to achieve the target of an enterprise with the help of several managers or executives in each cadre. Below are some of the samples of these questions.

Business Intelligence: Finance:
What is the net income, expenses, gross profit, and net profit for this quarter, year?

Business Intelligence: Accounts:
What is the sales amount this month and what is the outstanding pending payment?

Business Intelligence: Purchase:
Who is the vendor to be contacted to purchase products?

Business Intelligence: Production:
How many products are manufactured in each production unit today, weekly, monthly?

Business Intelligence: Sales:
How many products have been sold in each area today, weekly, monthly?

Business Intelligence: Quality:
How many products have been defective today, weekly, monthly, quarterly, yearly?

Business Intelligence: Service:
Are the customers satisfied with the quality?

These business intelligence questions are related with why, what, how, when, and business intelligence reports(olap reports) are useful in providing solutions to the above questions by means of reporting, score cards, balance score cards that are helpful in managerial decisions.

Business Intelligence Tools

Business Intelligence Tools help to gather, store, access and analyze corporate data to aid in decision-making. Generally these systems will illustrate business intelligence in the areas of customer profiling, customer support, market research, market segmentation, product profitability, statistical analysis, inventory and distribution analysis.

With Business Intelligence Tools, various data like customer related, product related, sales related, time related, location related, employee related etc. are gathered and analysed based on which important strategies or rules are formed and goals to achieve their target are set. These decisions are very efficient and effective in promoting an Organisation's growth.

Since the collected data can be sliced across almost all the dimensions like time,location, product, promotion etc., valuable statistics like sales profit in one region for the current year can be calculated and compared with the previous year statistics.

Popular Business Intelligence Tools

Tool Name	Company Name
Business Objects	Business Objects
Cognos	Cognos
Hyperion	Hyperion
Microstrategy	Microstrategy
Microsoft Reporting Services	Microsoft
Crystal	Business Objects

OLAP & its Hybrids

OLAP, an acronym for Online Analytical Processing is an approach that helps organization to take advantages of DATA. Popular OLAP tools are Cognos, Business Objects, Micro Strategy etc. OLAP cubes provide the insight into data and helps the topmost executives of an organization to take decisions in an efficient manner.

Technically, OLAP cube allows one to analyze data across multiple dimensions by providing multidimensional view of aggregated, grouped data. With OLAP reports, the major categories like fiscal periods, sales region, products, employee, promotion related to the product can be ANALYZED very efficiently, effectively and responsively. OLAP applications include sales and customer analysis, budgeting, marketing analysis, production analysis, profitability analysis and forecasting etc.

ROLAP
ROLAP stands for Relational Online Analytical Process that provides multidimensional analysis of data, stored in a Relational database(RDBMS).

MOLAP
MOLAP(Multidimensional OLAP), provides the analysis of data stored in a multi-dimensional data cube.

HOLAP
HOLAP(Hybrid OLAP) a combination of both ROLAP and MOLAP can provide multidimensional analysis simultaneously of data stored in a multidimensional database and in a relational database(RDBMS).

DOLAP
DOLAP(Desktop OLAP or Database OLAP)provide multidimensional analysis locally in the client machine on the data collected from relational or multidimensional database servers.

OLAP Analysis

Imagine an organization that manufactures and sells goods in several States of USA which employs hundreds of employees in its manufacturing, sales and marketing division etc. In order to manufacture and sell this product in profitable manner, the executives need to analyse(OLAP analysis) the data on the product and think about various possibilities and causes for a particular event like loss in sales, less productivity or increase in sales over a particular period of the year.

During the OLAP analysis, the top executives may seek answers for the following:
1. Number of products manufactured.
2. Number of products manufactured in a location.
3. Number of products manufactured on time basis within a location.
4. Number of products manufactured in the current year when compared to the previous year.
5. Sales Dollar value for a particular product.
6. Sales Dollar value for a product in a location.
7. Sales Dollar value for a product in a year within a location.
8. Sales Dollar value for a product in a year within a location sold or serviced by an employee.
OLAP tools help executives in finding out the answers, not only to the above mentioned measures, even for the very complex queries by allowing them to slice and dice, drill down from higher level to lower level summarized data, rank, sort, etc.

Example of OLAP Analysis Report

Time Dimension Id	Location Dimension Id	Product Dimension Id	Organization Dimension Id	Sales Dollar	DateTimeStamp
1	1	100001	1	1000	1/1/2005 11:23:31 AM
3	1	100001	1	750	1/1/2005 11:23:31 AM
1	1	100001	2	1000	1/1/2005 11:23:31 AM
3	1	100001	2	750	1/1/2005 11:23:31 AM

In the above example of OLAP analysis, data can be sliced and diced, drilled up and drilled down for various hierarchies like time dimension, location dimension, product dimension, and organization dimension . This would provide the topmost executives to take a decision about the product performance in a location/time/organization. In OLAP reports, Trend analysis can be also made by comparing the sales value of a particular product over several years or quarters.

OLAP Database – Multidimensional

This is a type of database that is optimized for data warehouse, data mart and online analytical processing (OLAP) applications. The main advantage of this database is query performance.

Relational databases make it easy to work with individual records, whereas multidimensional databases are designed for analyzing large groups of records. Relational database is typically accessed using a Structured Query Language (SQL) query. A multidimensional database allows a user to ask questions like "How many mortgages have been sold in New Jersey city" and "How many credit cards have been purchased in a particular county?".

Popular Multidimensional Databases

Database Name	Company Name
Crystal Holos	Business Objects
Hyperion Essbase	Hyperion
Oracle Express	Oracle Corporation
Oracle OLAP Option	Oracle Corporation
AWMicrosoft Analysis Services	Microsoft
PowerPlay Enterprise	Cognos

MSBI CONCEPTS

Pages

Saturday, January 1, 2011

Data Warehouse Concepts

12 comments: