Wednesday, February 10, 2016

Data Warehouse Concepts - 1

Data warehouse

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis. DWs are central repositories of integrated data from one or more disparate sources.
They store current and historical data and are used for creating analytical reports for knowledge workers throughout the enterprise. Examples of reports could range from annual and quarterly comparisons and trends to detailed daily sales analyses.
The data stored in the warehouse is uploaded from the operational systems (such as marketing, sales, etc., shown in the figure to the right). The data may pass through an operational data store for additional operations before it is used in the DW for reporting.


Types of Data warehouse:
  • ·         Data mart
  • ·         OLAP
  • ·         OLTP
  • ·         Predictive Analysis


Data mart

A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area) hence, they draw data from a limited number of sources such as sales, finance or marketing.
Data marts are often built and controlled by a single department within an organization.
The sources could be internal operational systems, a central data warehouse, or external data. Denormalization is the norm for data modeling techniques in this system.
Given that data marts generally cover only a subset of the data contained in a data warehouse, they are often easier and faster to implement.

The difference between data warehouse and data mart

Data warehouse
Data mart
enterprise-wide data
department-wide data
multiple subject areas
single subject area
difficult to build
easy to build
takes more time to build
less time to build
larger memory
limited memory

Types of data marts:
  • Dependent data marts draw data from a central data warehouse that has already been created.
  • Independent data marts, in contrast, are standalone systems built by drawing data directly from operational or external sources of data or both.
  • Hybrid data marts can draw data from operational systems or data warehouses.


Online Analytical Processing

  • ·         Relatively low volume of transactions.
  • ·         Response time is an effectiveness measure.
  • ·         Queries are often very complex and involve aggregations.
  • ·         OLAP databases store aggregated, historical data in multi-dimensional schemas (usually star schemas). OLAP applications are widely used by Data Mining techniques.
  • ·         OLAP systems typically have data latency of a few hours, as opposed to data marts, where latency is expected to be closer to one day.
  • ·         Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas
  • ·         The OLAP approach is used to analyze multidimensional data from multiple sources and perspectives.
  • ·         OLAP clients include many spreadsheet programs like Excel, web application, sql, dashboard tools, etc
  • ·         The three basic operations in OLAP are: Roll-up (Consolidation), Drill-down and Slicing & Dicing. These viewpoints are sometimes called dimensions (such as looking at the same sales by salesperson or by date or by customer or by product or by region, etc.)

Types of OLAP:
·         MOLAP - Multi-dimensional OLAP
·         ROLAP - Relational OLAP
·         HOLAP - Hybrid OLAP
·         WOLAP - Web-based OLAP
·         DOLAP - Desktop OLAP
·         RTOLAP - Real-Time OLAP

MOLAP
ROLAP
Stores data in an optimized multi-dimensional array
Stores data as a relational database
Processing (consolidating data into the cube) is required
No processing required. Queries on plain data in tables.
Fast response as the data cube contains all possible answers for a given range opf questions
Relatively slower response as each slicing and dicing requires a query with a WHERE clause. But all questions can be answered.
Pre-computation(consolidation) may lead to data explosion
Computation is done on demand through queries
May introduce data redundancy
No data redundancy
Limited to the contents of the cube
Ability to drill down to the lowest level of the details

HOLAP addresses the shortcomings of both the MOLAP and ROLAP.  HOLAP encompasses a range of solutions that attempt to mix the best of ROLAP and MOLAP. It can generally pre-process swiftly, scale well, and offer good function support.

OLAP Languages
·         1997 - MDX was the first query language introduced for OLAP.
·         2001 - XML for Analysis specification.
·         2011 - LINQ can be used to query SSAS OLAP cubes from Microsoft .NET.

Online transaction processing (OLTP)
  • ·         OLTP is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE), particularly for data entry and transactional processing.
  • ·         OLTP has also been used to refer to processing in which the system responds immediately to user requests.
  • ·         OLTP systems emphasize very fast query processing and maintaining data integrity in multi-access environments.
  • ·         Effectiveness is measured by the number of transactions per second.
  • ·         OLTP databases contain detailed and current data. The schema used to store transactional databases is the entity model (usually 3NF). Normalization is the norm for data modeling techniques in this system.
  • ·         OLTP is often integrated into service-oriented architecture (SOA) and Web services.
  • ·         On line transaction process concerns about concurrency and atomicity.


Predictive analysis
  • ·         Predictive analysis is about finding and quantifying hidden patterns in the data using complex mathematical models that can be used to predict future outcomes.
  • ·         Predictive analysis is different from OLAP in that OLAP focuses on historical data analysis and is reactive in nature, while predictive analysis focuses on the future. These systems are also used for CRM (customer relationship management).


No comments:

Post a Comment