
Enterprise Data Warehouse Optimization

Reduce costs by moving data and processing to Hadoop®

Learn how you can modernize your data warehouse with Hadoop

DOWNLOAD WHITEPAPER

What is an EDW?

An Enterprise Data Warehouse (EDW) is an organization’s central data repository, built to support business decisions. An EDW contains data related to the areas the company wants to analyze; for a manufacturer, that might be customers, products or bills of material. An EDW is built by extracting data from a number of operational systems. As data is fed into the EDW, it is converted, reformatted and summarized to present a single corporate view. Data is added to the warehouse over time in the form of snapshots, and an EDW normally contains data spanning 5 to 10 years.

EDW Optimization

Problems with a typical EDW

EDW is Expensive

- Built on commercial, proprietary technology that is expensive to license
- Runs on expensive converged appliances
- Costs continue to rise as new users and new data are added to the EDW
- Operationally expensive: it takes 18 to 24 months to find data sources, agree on business questions, and model the data to answer them

EDW is Rigid

- A data model must be in place before a single business question can be answered using the data in the EDW (schema-on-write)
- Designed to answer pre-determined questions
- Data modeling is a lengthy, labor-intensive process
- Any change in the organization’s business model requires a change in the EDW’s data model

EDW is Inefficient

- 50–70% of the data in an EDW is unused or cold
- 45–65% of CPU capacity is consumed by ETL/ELT
- 25–35% of the CPU consumed by ETL goes to loading data that is never used
- 30–40% of CPU is consumed by only 5% of ETL workloads

Optimizing EDW with Apache Hadoop®

Cost Effective

- HDP (Hortonworks Data Platform) is 100% open source; there is no licensing fee for the software
- HDP runs on commodity hardware
- New data can be landed in HDP and used within days or even hours

Flexible

- Data can be loaded into HDP without a data model in place
- A data model can be applied based on the questions being asked of the data (schema-on-read)
- HDP is designed to answer questions as they occur to the user
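The schema-on-read idea can be sketched in plain Python: raw records are landed untouched, and a schema is applied only at query time. This is a conceptual illustration, not a Hadoop API; the field names and records are hypothetical.

```python
import json

# Raw events are stored as-is -- no upfront modeling. Schema-on-write would
# require defining every column before a single record could be stored.
raw_events = [
    '{"user": "alice", "amount": "19.99", "sku": "A100"}',
    '{"user": "bob", "amount": "5.00", "sku": "B200", "coupon": "SAVE5"}',
]

def read_with_schema(raw, schema):
    """Apply a schema at read time: pick and cast only the fields the
    current question needs; fields outside the schema are ignored."""
    out = []
    for line in raw:
        rec = json.loads(line)
        out.append({field: cast(rec[field]) for field, cast in schema.items()})
    return out

# Today's question needs user and amount-as-float; tomorrow's question can
# use a different schema over the same stored data, with no reload.
sales_view = read_with_schema(raw_events, {"user": str, "amount": float})
print(sales_view)
```

Note that the second raw record carries an extra `coupon` field the schema never anticipated; schema-on-read simply skips it today and can expose it tomorrow.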

Efficient

- 100% of the data is available at a granular level for analysis
- HDP can store and analyze both structured and unstructured data
- Data can be analyzed in different ways to support diverse use cases

Use Cases for EDW Optimization

USE-CASE 1

ARCHIVE

By design, Hadoop runs on low-cost commodity servers and direct-attached storage, which allows for a dramatically lower overall cost. Compared to high-end storage area networks, scale-out commodity compute and storage with Hadoop provides a compelling alternative: one that lets users scale out their hardware only as their data grows. This cost dynamic makes it possible to store, process, access and analyze more data than ever before.
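The "scale out only as data grows" argument can be illustrated with a toy cost model. All per-terabyte figures and capacities below are made-up placeholders, not vendor pricing.

```python
# Toy comparison: a SAN sized up-front for peak capacity vs. commodity
# nodes purchased incrementally as data actually grows.
# Every number here is a hypothetical placeholder.
SAN_UPFRONT_TB = 500            # capacity purchased on day one
SAN_COST_PER_TB = 3000.0
COMMODITY_COST_PER_TB = 400.0
NODE_CAPACITY_TB = 50           # storage added per commodity node

def san_cost(_data_tb):
    # Appliance cost is paid up-front, regardless of how much data exists yet.
    return SAN_UPFRONT_TB * SAN_COST_PER_TB

def scale_out_cost(data_tb):
    # Buy whole nodes only when the data no longer fits on existing ones.
    nodes = -(-data_tb // NODE_CAPACITY_TB)  # ceiling division
    return nodes * NODE_CAPACITY_TB * COMMODITY_COST_PER_TB

for tb in (50, 200, 500):
    print(f"{tb} TB: SAN={san_cost(tb):,.0f}  scale-out={scale_out_cost(tb):,.0f}")
```

Under these assumed prices, the scale-out cluster at 50 TB costs a small fraction of the up-front appliance, and spend tracks actual data growth rather than projected peak.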


USE-CASE 2

ONBOARD

The ETL function is a relatively low-value computing workload that can be performed at a low cost in Hadoop. When onboarded to Hadoop, data is extracted, transformed and then the results are loaded into the data warehouse. The result: critical CPU cycles and storage space are freed for the truly high value functions – analytics and operations – that best leverage advanced capabilities in the data architecture.
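The offload pattern above can be sketched in miniature: the CPU-heavy transform runs outside the warehouse, and only finished rows are loaded. The record layout and the `transform`/`load_to_warehouse` names are illustrative, not a real Hadoop or EDW API.

```python
# Toy extract-transform-load pipeline: parsing, validation and aggregation
# happen on the low-cost tier, so the warehouse only ingests clean results.
raw_orders = [
    "2024-01-03,alice,19.99",
    "2024-01-03,bob,5.00",
    "bad record",            # dirty input is filtered during transform
]

def transform(lines):
    """Parse, validate and aggregate raw order lines (the offloaded work)."""
    totals = {}
    for line in lines:
        parts = line.split(",")
        if len(parts) != 3:
            continue  # reject malformed rows here, not inside the warehouse
        _, user, amount = parts
        totals[user] = totals.get(user, 0.0) + float(amount)
    return totals

warehouse_table = []  # stands in for the EDW load target

def load_to_warehouse(totals):
    # Only the finished aggregate reaches the warehouse, freeing its CPU
    # cycles and storage for analytics rather than ETL.
    for user, total in sorted(totals.items()):
        warehouse_table.append((user, round(total, 2)))

load_to_warehouse(transform(raw_orders))
print(warehouse_table)
```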


USE-CASE 3

ENRICH

An incredible array of new data types opens possibilities for analysis within the high-performance EDW environment. The varied structures of these new data types, however, present challenges for EDWs not designed to ingest and analyze those formats. Many organizations rely on the flexibility of Hadoop to capture, store and refine these new data types to use within the EDW. They take advantage of the ability to define schema upon read in Hadoop, gathering and storing data in any format and creating schema to support analysis in the EDW when necessary.
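A minimal sketch of that enrichment flow, under hypothetical record shapes: free-form JSON events are refined in the flexible tier and joined with a structured customer table, producing flat rows an EDW could ingest.

```python
import json

# Semi-structured events captured as-is; schema is applied only at read time.
clickstream = [
    '{"cust_id": 7, "page": "/pricing", "ua": "Mobile Safari"}',
    '{"cust_id": 9, "page": "/docs"}',
]

# Structured reference data as it might already exist in the EDW.
customers = {7: "Acme Corp", 9: "Globex"}

def enrich(events, customers):
    """Refine raw events and join them with structured customer data,
    producing flat, warehouse-ready rows."""
    rows = []
    for line in events:
        ev = json.loads(line)
        rows.append({
            "customer": customers.get(ev["cust_id"], "unknown"),
            "page": ev["page"],
            "mobile": "Mobile" in ev.get("ua", ""),  # derived attribute
        })
    return rows

print(enrich(clickstream, customers))
```

The second event lacks a user-agent entirely; the refinement step absorbs that variability so the warehouse sees a uniform row shape.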
