February 18, 2015

Data Management for the New Analytics Culture

Today, SAS and Hortonworks, two long-time partners and innovators in the Big Data and Analytics space, have announced the certification and release of SAS® Data Loader for Hadoop.

Read the guest blog post below, written by Keith Renison, Senior Solutions Architect for the SAS Global Technology Practice, to learn more about SAS and Hortonworks’ joint efforts.

The New Analytics Culture

Let’s talk about three key elements that drive data management for Hadoop. First, and probably the most obvious factor, is Big Data. New data paradigms are exploding and driving changes in data management practices. For most companies, Big Data is the reality of doing business. It’s the proliferation of structured and unstructured data that floods organizations on a daily basis – and, if managed well, that can deliver powerful insights.

Second, Decision Design is a new generation of data science. Driven in part by the millennial generation and a gaming mentality, Decision Design is about leveraging every available tool to experiment, innovate, and create new techniques and approaches to data and analytics, while refining the art of data-driven decisions.

Third, Decision Engineering is the mature analytic framework that places significant value on putting the analytic process into production. Design is cool and necessary for innovation, but game-changing concepts need to be turned into cost savings, profit, or risk mitigation to add real value to the organization. When you combine the art of Decision Design with the application of Decision Engineering, and fuel both with massive amounts of complex data, you get a New Analytics Culture. As the analytic needs of this culture grow and change, so do its data management needs.

The Disconnect

My colleagues and I see data preparation for the New Analytics Culture as distinctly different from traditional data warehousing. Data warehousing techniques, and many of the tools that support them, are designed to conform data into standard schemas that are well organized and optimized for building efficient queries. The tools and processes are designed for the back office, used by a data management specialist for the purpose of handing a finished dataset to analytic and reporting users.

Unfortunately, this process falls short of providing what the end user really wants, and it ultimately forces a scarce resource to perform all kinds of pre-analytic data management magic just to do the job. In fact, it’s commonly understood that 80% of a statistician’s time is spent preparing the data, and then re-working it as they move through the analytic lifecycle. This disconnect between the people and the technology is worth a look. In particular, it comes with the following challenges:

  • Wide table = good, star schema = bad. Analytic work requires very wide, very detailed tables, often with hundreds or even thousands of variables. Transposition is the statistician’s friend, and pre-aggregation equals pre-determined statistics. Data doesn’t usually come out of the warehouse this way (see the sketch following this list).
  • Do over. Analytic work is iterative. Treating data management tools as the exclusive domain of IT, paired with cumbersome business processes for requesting modified datasets, forces analytic resources to take matters into their own hands.
  • Not all quality is the same. De-duplicating data or matching addresses can be important for general data quality, but analytic teams spend enormous amounts of time developing their own algorithms for analytic data preparation. Techniques like gender matching, parsing, match coding, imputation, and pattern matching are used to enrich data for analytics.
  • The final step. Feeding data into high-performance analytic systems is work often left to the analytic people, and it can be one of the more difficult tasks when the data management work isn’t tightly coupled with the analytic platforms, either physically or through common metadata.
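
To make the first two points concrete, here is a minimal sketch of that pre-analytic reshaping in Python with pandas. The table, columns, and values are invented purely for illustration; this is generic code, not SAS Data Loader or Hortonworks Data Platform functionality.

    import pandas as pd

    # A long, star-schema-style fact table: one row per (customer, metric).
    # Hypothetical data, invented for this example.
    fact = pd.DataFrame({
        "customer_id": [1, 1, 2, 2, 3],
        "metric":      ["visits", "spend", "visits", "spend", "visits"],
        "value":       [4, 120.0, 7, 310.0, 2],
    })

    # Transpose (pivot) into the wide shape analytic tools expect:
    # one row per customer, one column per variable.
    wide = fact.pivot_table(index="customer_id", columns="metric", values="value")

    # Simple mean imputation for the missing "spend" value, one of the
    # enrichment techniques mentioned above.
    wide = wide.fillna(wide.mean())
    print(wide)

Every line of this is work the statistician does outside the warehouse, which is exactly the 80% problem described above.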

The Right Experience

As a leader in the entire data-to-decision process, SAS has a unique view into how technology can help give back some of this 80% of lost time to the New Analytics Culture. With the recent release of SAS® Data Loader for Hadoop, we now provide an easy-to-use, self-service tool that works inside the Hortonworks Data Platform to enable:

  • data movement to and from source systems
  • data quality
  • data profiling (sketched after this list)
  • data transformation
  • data loading into our in-memory analytic platform
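
As a rough illustration of what the profiling step surfaces (column types, null counts, distinct counts), here is a generic sketch in Python with pandas. The profile function and the sample customer table are hypothetical, invented for this post; SAS Data Loader itself is a self-service tool, not a Python API.

    import pandas as pd

    def profile(df: pd.DataFrame) -> pd.DataFrame:
        """Per-column profile: dtype, null count, distinct count, sample value."""
        rows = []
        for col in df.columns:
            s = df[col]
            rows.append({
                "column": col,
                "dtype": str(s.dtype),
                "nulls": int(s.isna().sum()),
                "distinct": int(s.nunique()),
                "example": s.dropna().iloc[0] if s.notna().any() else None,
            })
        return pd.DataFrame(rows)

    # Invented customer table with the kinds of gaps profiling should expose.
    customers = pd.DataFrame({
        "customer_id": [1, 2, 3, 4],
        "gender": ["F", None, "M", "M"],
        "zip": ["55101", "55102", None, "55101"],
    })
    print(profile(customers))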

SAS® Data Loader is HDP (Hortonworks Data Platform) certified and YARN ready. The collaboration between Hortonworks and SAS is critical to building a Modern Data Architecture (MDA), allowing organizations to collect, store, analyze and manipulate massive quantities of data on their own terms, regardless of the source of that data, how old it is, where it is stored, or what format it is in. By providing sophisticated data management capabilities to both the Decision Design and Decision Engineering cultures, SAS and Hortonworks allow users to spend more time developing innovative models and less time working on data, all inside the Hortonworks Data Platform.

To learn more, please visit the SAS® Data Loader for Hadoop web page.
