It should go without saying that there is no “big data” without data, and lots of it. Depending on the application, you might be handling enormous volumes. But this data has different needs and obligations based on where it was created, where it is moving, and how old it is, which is where data life cycle management comes in.
What are the typical phases of the data life cycle? Where does big data complicate and change that life cycle, and how can a data platform keep you organized and compliant?
When discussing data life cycle management, most people consider the four major stages that take data from cradle to grave.
Another less classical part of data life cycle management is growing in importance: the life of data-in-motion. Consider sensors on a vehicle that measure several aspects of the vehicle’s status and performance, like speed, temperature, tire pressure, GPS location coordinates, and more: all of that data has to go somewhere. Traditionally, it went into a data warehouse, where analytics were then run. But over time, your organization will need to move that analytical decision-making earlier in the data’s life, closer to the point of acquisition.
Can we make decisions and optimize performance (or prevent failures) by looking at data-in-motion closer to where it’s generated? As the big data landscape changes, key insights are uncovered by examining data-in-motion, especially from analyzing and understanding actions and occurrences before an event.
For example, when collecting data on vehicles, we might want to combine specifics like speed and outside temperature in order to instruct the driver to slow down when approaching certain speed thresholds. Managing that sensor data just after it’s been generated, understanding how to keep it alive in a connected data architecture (whether it comes from sensors, websites, applications, or elsewhere), and connecting it with data-at-rest are key offerings of data platforms.
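To make the vehicle example concrete, here is a minimal Python sketch of a rule evaluated on each reading as it arrives, rather than after it lands in a warehouse. Everything in it is an assumption made for illustration: the `Reading` fields, the threshold values, and the idea that the speed limit drops at or below freezing are hypothetical and do not come from any particular platform.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Reading:
    """One sensor sample from a vehicle (hypothetical schema)."""
    vehicle_id: str
    speed_kmh: float
    outside_temp_c: float


# Hypothetical thresholds, chosen only for illustration.
FREEZING_TEMP_C = 0.0
DRY_SPEED_LIMIT_KMH = 120.0
ICY_SPEED_LIMIT_KMH = 80.0
WARNING_MARGIN_KMH = 10.0


def speed_limit_for(temp_c: float) -> float:
    """Use the lower limit when the outside temperature is at or below freezing."""
    return ICY_SPEED_LIMIT_KMH if temp_c <= FREEZING_TEMP_C else DRY_SPEED_LIMIT_KMH


def advise(reading: Reading) -> Optional[str]:
    """Return a warning when speed is within the margin of the limit, else None."""
    limit = speed_limit_for(reading.outside_temp_c)
    if reading.speed_kmh >= limit - WARNING_MARGIN_KMH:
        return (
            f"{reading.vehicle_id}: slow down "
            f"({reading.speed_kmh:.0f} km/h, limit {limit:.0f} km/h)"
        )
    return None


def alerts(stream: Iterable[Reading]) -> Iterator[str]:
    """Evaluate each reading as it arrives and yield only the warnings."""
    for reading in stream:
        message = advise(reading)
        if message is not None:
            yield message
```

Because `alerts` is a generator, it can sit on top of any iterable source of readings, so the same rule could run near the sensor or further downstream without changes.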
Part of the overarching story of the big data life cycle is the idea that governance and compliance play a huge part at each stage of the data’s journey—a fact only complicated by the kind of big data footprint you’re leaving. Is it varied and well-defined? Are you collecting data globally in a variety of countries? Do those countries have data privacy laws and regulatory compliance schemes you need to be aware of? What, precisely, is your company allowed to do with that data, and where can it live in its full, most complete state?
The data platform you choose should natively understand these regulations and restrictions, and still present you with the tools you need to work with the data while remaining in compliance with local law. That may mean masking personally identifiable information when displaying data for users, encrypting data in transit, or some combination of these and other techniques. The right platform will still allow you to capture, clarify, and control copious amounts of data while staying compliant.
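As a rough illustration of masking personally identifiable information at display time, the Python sketch below hides all but the last few characters of selected fields. The field names in `PII_FIELDS` and the masking rule are assumptions made for this example. A real platform would typically drive this from policy and from the regulations that apply in each country.

```python
from typing import Any, Dict

# Hypothetical set of fields treated as personally identifiable.
PII_FIELDS = {"email", "phone", "driver_name"}


def mask_value(value: str, visible: int = 2) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def mask_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record that is safe to display; the input is not modified."""
    return {
        key: mask_value(str(value)) if key in PII_FIELDS else value
        for key, value in record.items()
    }
```

Masking a copy for display leaves the stored record intact, so the complete data can remain wherever it is legally allowed to live.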
Take your exploration of the data life cycle and data-in-motion further: with all the technologies available today, companies can now collect data at the edge, process it, and secure it in motion. Once you have the right tools at your disposal, you’ll be able to manage your data effectively and get the insights you need.
Discover how to protect your data life cycle with a resilient joint solution that enables you to build a fast, secure data engine across multiple Hadoop clusters.