Data
warehouse
In computing, a data warehouse (DW or DWH), also known as an
enterprise data warehouse (EDW), is a system used for reporting and data
analysis. DWs are central repositories of integrated data from one or more
disparate sources.
They store current and historical data and are used for
creating analytical reports for knowledge workers throughout the enterprise.
Examples of reports could range from annual and quarterly comparisons and
trends to detailed daily sales analyses.
The data stored in the warehouse is uploaded from the
operational systems (such as marketing, sales, etc., shown in the figure to the
right). The data may pass through an operational data store for additional
operations before it is used in the DW for reporting.
Types of Data warehouse:
- · Data mart
- · OLAP
- · OLTP
- · Predictive Analysis
Data mart
A data mart is a simple form of a data warehouse that is
focused on a single subject (or functional area) hence, they draw data from a
limited number of sources such as sales, finance or marketing.
Data marts are often built and controlled by a single
department within an organization.
The sources could be internal operational systems, a central
data warehouse, or external data. Denormalization is the norm for data modeling
techniques in this system.
Given that data marts generally cover only a subset of the
data contained in a data warehouse, they are often easier and faster to
implement.
The difference between data warehouse and data mart
Data warehouse
|
Data mart
|
enterprise-wide data
|
department-wide data
|
multiple subject areas
|
single subject area
|
difficult to build
|
easy to build
|
takes more time to build
|
less time to build
|
larger memory
|
limited memory
|
Types of data marts:
- Dependent data marts draw data from a central data warehouse that has already been created.
- Independent data marts, in contrast, are standalone systems built by drawing data directly from operational or external sources of data or both.
- Hybrid data marts can draw data from operational systems or data warehouses.
Online
Analytical Processing
- · Relatively low volume of transactions.
- · Response time is an effectiveness measure.
- · Queries are often very complex and involve aggregations.
- · OLAP databases store aggregated, historical data in multi-dimensional schemas (usually star schemas). OLAP applications are widely used by Data Mining techniques.
- · OLAP systems typically have data latency of a few hours, as opposed to data marts, where latency is expected to be closer to one day.
- · Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas
- · The OLAP approach is used to analyze multidimensional data from multiple sources and perspectives.
- · OLAP clients include many spreadsheet programs like Excel, web application, sql, dashboard tools, etc
- · The three basic operations in OLAP are: Roll-up (Consolidation), Drill-down and Slicing & Dicing. These viewpoints are sometimes called dimensions (such as looking at the same sales by salesperson or by date or by customer or by product or by region, etc.)
Types of OLAP:
·
MOLAP - Multi-dimensional OLAP
·
ROLAP - Relational OLAP
·
HOLAP - Hybrid OLAP
·
WOLAP - Web-based OLAP
·
DOLAP - Desktop OLAP
·
RTOLAP - Real-Time OLAP
MOLAP
|
ROLAP
|
Stores data in an optimized
multi-dimensional array
|
Stores data as a relational database
|
Processing (consolidating data into the
cube) is required
|
No processing required. Queries on plain
data in tables.
|
Fast response as the data cube contains all
possible answers for a given range opf questions
|
Relatively slower response as each slicing
and dicing requires a query with a WHERE clause. But all questions can be
answered.
|
Pre-computation(consolidation) may lead to
data explosion
|
Computation is done on demand through
queries
|
May introduce data redundancy
|
No data redundancy
|
Limited to the contents of the cube
|
Ability to drill down to the lowest level
of the details
|
HOLAP addresses the
shortcomings of both the MOLAP and ROLAP.
HOLAP encompasses a range of solutions that attempt to mix the best of
ROLAP and MOLAP. It can generally pre-process swiftly, scale well, and offer
good function support.
OLAP Languages
·
1997 - MDX was the first query language
introduced for OLAP.
·
2001 - XML for Analysis specification.
·
2011 - LINQ can be used to query SSAS OLAP cubes
from Microsoft .NET.
Online
transaction processing (OLTP)
- · OLTP is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE), particularly for data entry and transactional processing.
- · OLTP has also been used to refer to processing in which the system responds immediately to user requests.
- · OLTP systems emphasize very fast query processing and maintaining data integrity in multi-access environments.
- · Effectiveness is measured by the number of transactions per second.
- · OLTP databases contain detailed and current data. The schema used to store transactional databases is the entity model (usually 3NF). Normalization is the norm for data modeling techniques in this system.
- · OLTP is often integrated into service-oriented architecture (SOA) and Web services.
- · On line transaction process concerns about concurrency and atomicity.
Predictive
analysis
- · Predictive analysis is about finding and quantifying hidden patterns in the data using complex mathematical models that can be used to predict future outcomes.
- · Predictive analysis is different from OLAP in that OLAP focuses on historical data analysis and is reactive in nature, while predictive analysis focuses on the future. These systems are also used for CRM (customer relationship management).
No comments:
Post a Comment