
Thinking Beyond Traditional Data Life Cycle Management

It should go without saying that there is no “big data” without data—lots and lots of it. In fact, depending on the application, you might have striking amounts of data. But this data has different needs and obligations, based on where it was created, where it is moving, and how old it is, thus ushering in the discussion of data life cycle management.

What are the typical phases of the data life cycle? Where does big data complicate and change that life cycle, and how can a data platform keep you organized and compliant?

The Traditional Data Life Cycle

When discussing data life cycle management, most people consider the five major stages that take data from cradle to grave.

  • Creation: The creation process describes when pieces of data are born into the organization (the cradle). The data might come from sensors, events, applications, website traffic, or other sources.
  • Transmission: Once created, the data is transmitted to another location based on its intended use case. An important concern is making sure the data stays intact as it moves and keeping track of any other data it might be merged with along the way.
  • Active Storage and Regular Use: Data storage may be in a traditional data warehouse environment, in a data lake, or in several potential storage locations. Traditionally, this phase is where active analytics, testing of the data, and visualization and reporting work are done, because the data is kept warm and accessible.
  • Archival: Once data has grown colder—after active work has concluded, more recent data is available and preferred, or some part of the data corpus has gone stale—it’s typically moved to archives. It’s not as easily accessible, and in some cases it’s trimmed to representative elements or otherwise rolled up into summary views to save space and increase archive capacity.
  • Destruction: After a certain time period, forced either by law or archival space constraints, data will need to be deleted (the grave).
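The age-based transitions in the later stages above can be sketched as a simple classification function. This is a minimal illustration, not a real retention engine; the 90-day archive window and 7-year destruction deadline are assumptions chosen for the example, and a production policy would come from legal and governance requirements.

```python
from datetime import datetime, timedelta

# Illustrative thresholds -- real values are set by business and legal policy.
ARCHIVE_AFTER = timedelta(days=90)      # data goes cold after 90 days
DESTROY_AFTER = timedelta(days=365 * 7)  # e.g. a 7-year legal retention limit

def lifecycle_stage(created_at: datetime, now: datetime) -> str:
    """Classify a record into a lifecycle stage based solely on its age."""
    age = now - created_at
    if age >= DESTROY_AFTER:
        return "destroy"   # the grave: delete per policy or space constraints
    if age >= ARCHIVE_AFTER:
        return "archive"   # trim or roll up into summary views
    return "active"        # warm, accessible, used for analytics and reporting
```

A real platform would also account for data that must stay warm regardless of age (for example, records under legal hold).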

Data-in-Motion Plays an Important Role

Another less classical part of data life cycle management is growing in importance: the life of data-in-motion. Consider sensors on a vehicle that measure several aspects of the vehicle’s status and performance, like speed, temperature, tire pressure, GPS location coordinates, and more: all of that data has to go somewhere. Traditionally, it went into a data warehouse and analytics were run. But over time, your organization will need to move the resulting analytical decision-making to earlier in the data’s life, closer to the point of acquisition.

Can we make decisions and optimize performance (or prevent failures) by looking at data-in-motion closer to where it’s generated? As the big data landscape changes, key insights are uncovered by examining data-in-motion, especially from analyzing and understanding actions and occurrences before an event.

For example, when collecting data on vehicles, we might want to capture specifics like speed combined with outside temperature in order to instruct the driver to slow down when approaching certain speed thresholds. Managing that sensor data just after it’s been generated—and understanding how to keep it alive (whether it’s from sensors, websites, applications, or wherever else) in a connected data architecture—and connecting it with data-at-rest is a key offering of data platforms.
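The speed-plus-temperature check described above can be expressed as a small in-motion rule evaluated as each sensor reading arrives. The thresholds here are invented for illustration; a real system would tune them per road, vehicle, and conditions, and would run the rule in a stream-processing layer close to the point of acquisition.

```python
from typing import Optional

def speed_alert(speed_mph: float, outside_temp_f: float) -> Optional[str]:
    """Return a driver warning when speed exceeds a temperature-dependent limit.

    Assumption for the sketch: a stricter limit applies at or below freezing.
    """
    limit = 45.0 if outside_temp_f <= 32.0 else 70.0
    if speed_mph > limit:
        return f"Slow down: {speed_mph:.0f} mph exceeds the {limit:.0f} mph limit"
    return None  # no action needed for this reading
```

Because the rule needs only the current reading, it can run at the edge before the data ever reaches the warehouse, which is exactly the shift from after-the-fact analytics to in-motion decision-making the article describes.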

Governance and Compliance Are Important Considerations

Part of the overarching story of the big data life cycle is the idea that governance and compliance play a huge part at each stage of the data’s journey—a fact only complicated by the kind of big data footprint you’re leaving. Is it varied and well-defined? Are you collecting data globally in a variety of countries? Do those countries have data privacy laws and regulatory compliance schemes you need to be aware of? What, precisely, is your company allowed to do with that data, and where can it live in its full, most complete state?

Your data platform choice should natively understand these regulations and restrictions, and still be able to present you with the tools you need to work with the data while remaining in compliance with local law. That may mean masking personally identifiable information when displaying data for users, encrypting data in transit, or some other combination of techniques. The right platform will still allow you to capture, clarify, and control copious amounts of data while staying compliant.
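Masking personally identifiable information before display, as mentioned above, can be sketched as a small transform applied to each record. The field names and masking style here are assumptions for illustration; in practice the list of PII fields and the masking rules would be driven by governance policy, not hard-coded.

```python
def mask_pii(record: dict, pii_fields=("name", "email", "ssn")) -> dict:
    """Return a copy of the record with PII fields masked for display.

    Keeps the first character so users can still spot-check values
    without seeing the full sensitive content.
    """
    masked = dict(record)  # never mutate the stored record in place
    for field in pii_fields:
        if field in masked and masked[field]:
            value = str(masked[field])
            masked[field] = value[0] + "*" * (len(value) - 1)
    return masked
```

Display-time masking like this complements, rather than replaces, encryption in transit and at rest: the underlying data stays complete for authorized processing while restricted views see only the masked form.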

Take your exploration of the data life cycle and data-in-motion further: with all the technologies available today, companies can now collect data at the edge, process it, and secure it in motion. Once you have the right tools at your disposal, you’ll be able to manage your data effectively and get the insights you need.

Discover how to protect your data life cycle with a resilient joint solution that enables you to build a fast, secure data engine across multiple Hadoop clusters.
