The database grew into a data warehouse. The warehouse evolved into a data lake. The single data lake has multiplied into many. Structured data still exists, but unstructured data has grown in importance. Much of our data resides in data centers, but data is also increasingly created and processed at the edge. Data-at-rest and data-in-motion alike must be governed and secured.
As the velocity, volume, and variety of data grow, data management architecture needs to evolve to support them.
With Gartner predicting the existence of 20.4 billion connected things by 2020, the Internet of Things (IoT) and streaming data are playing a large role in pushing organizations to modernize their approach to data management. To compete, businesses require the power to process, analyze, and act on IoT and streaming data in real time. Together, these technologies have transformed analytics from a backward-looking process into a prescriptive, predictive, forward-looking function.
At their best, IoT and streaming data set up a virtuous cycle: data is captured, algorithms are refined based on that data, and the updated algorithms are fed back to the point of origination. This cycle demands infrastructure that is elastic, flexible, and scalable. Enabling this functionality is where most businesses focus their attention. It is an essential driver for updating data management architecture, but there are other pressing needs for these updates as well, primarily data governance and data security.
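The capture-refine-redeploy cycle can be sketched in a few lines. This is a toy illustration, not a real edge framework: the functions `read_sensor`, `retrain`, and `push_to_edge` are hypothetical placeholders, and the "model" is just a running average standing in for a real algorithm.

```python
def read_sensor():
    """Stand-in for capturing a reading from an edge device."""
    return 21.5  # e.g., a temperature reading

def retrain(model, reading):
    """Stand-in for refining the algorithm with the new data point."""
    model["samples"] += 1
    # running average as a toy 'algorithm update'
    model["estimate"] += (reading - model["estimate"]) / model["samples"]
    return model

def push_to_edge(model):
    """Stand-in for deploying the updated model back to the point of origin."""
    return dict(model)  # in practice: serialize and ship over the network

model = {"estimate": 0.0, "samples": 0}
for _ in range(3):  # capture -> refine -> redeploy, repeated
    reading = read_sensor()
    model = retrain(model, reading)
    deployed = push_to_edge(model)
```

The point of the sketch is the loop shape, not the math: each pass through it is one turn of the virtuous cycle, and the infrastructure demands come from running that loop elastically across many devices at once.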
If you mention governance and security, many people think only of how they apply to data-at-rest. Applying those same policies to data-in-motion is equally important. Where did the data originate? Where did it travel? Who touched it in transit? How was it changed? These questions are as applicable to data-at-rest in a Hadoop cluster as they are to data-in-motion being emitted from a sensor. To ensure governance and security, we must also be able to answer them in the context of streaming and IoT data.
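A minimal lineage record is enough to answer those four questions for a single piece of data, whether at rest or in motion. The field names below are illustrative only and are not taken from any particular governance tool.

```python
from dataclasses import dataclass, field

@dataclass
class LineageEvent:
    actor: str     # who touched the data
    location: str  # where it was at the time
    change: str    # how it was changed ("none" for pure transit)

@dataclass
class DataLineage:
    origin: str  # where the data originated
    events: list = field(default_factory=list)

    def record(self, actor, location, change="none"):
        self.events.append(LineageEvent(actor, location, change))

    def journey(self):
        """Where did the data travel?"""
        return [self.origin] + [e.location for e in self.events]

# A sensor reading that moves from the edge to a data lake:
lineage = DataLineage(origin="sensor-42 (factory edge)")
lineage.record("gateway-service", "edge gateway", change="none")
lineage.record("etl-job-7", "on-prem data lake", change="unit conversion")

print(lineage.journey())
# ['sensor-42 (factory edge)', 'edge gateway', 'on-prem data lake']
```

The same structure works whether the events describe hops through a streaming pipeline or transformations inside a data lake, which is exactly why lineage can be governed uniformly across both.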
Security and governance concerns become paramount because of how data has evolved. If you’re managing data across multiple lakes, in multiple locations, and in various states of transit, you want consistent security policies applied in every setting. If you began by building a data lake in an on-premises data center, and later choose to store that data in a cloud setting, you want to be sure that your security policies carry over to the cloud and can move from cloud to cloud. You don’t want the burden of recreating each policy for each cloud setting. For the same reason, you need visibility into your data lineage. No matter the source of the data and no matter where that data ends up, you need to know its origination, its journey, who touched it, and how it changed along the way.
In modern-day architecture, data comes from multiple sources: partner ecosystems, IoT devices, social streams, and any number of other sources. One essential element of a successful architecture is the ability to ensure you’re capturing data from all those sources.
The destination for your data is also no longer a single data lake. In reality, data ends up stored in multiple data lakes, both on premises and in the cloud. Any update of your systems must account for control of, as well as awareness of, all available data sources. Modern data architecture should provide a simple mechanism to extract the data you need, no matter its source or provenance.
Finally, in the age of the General Data Protection Regulation (GDPR), compliance and governance matter even more. With the rise of IoT and streaming data, governing how data makes its journey from the edge to the enterprise takes on more urgency. That edge-to-enterprise data journey demands a new view into modern-day data management architecture.
As IoT devices and streaming data extend their reach, having big data infrastructure in place is vital. In today's data environment, gaining data control, security, and insight requires an abstraction layer, or dataplane, that spans all the data centers and sources bound by your architecture. Even as data evolves, this data fabric allows constant awareness of, and access to, your data, whether it's at rest or in motion. It also ensures consistent protection and the ability to enforce security policies throughout your data's journey.

It's not a matter of if you need to update your data architecture; it's a matter of when. Ready or not, the flow of big data will not stop. Your ability to harness, use, and protect that flow of information is likely to shape your success for years to come.
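The value of such an abstraction layer is that a policy is defined once and enforced everywhere. The sketch below illustrates that idea only; the store names and the masking rule are hypothetical, not the API of any real dataplane product.

```python
MASK_FIELDS = {"ssn", "email"}  # one policy, defined once

def enforce_policy(record):
    """Apply the same masking rule regardless of where the record lives."""
    return {k: ("***" if k in MASK_FIELDS else v) for k, v in record.items()}

class Dataplane:
    """Routes every read through the shared policy layer."""
    def __init__(self, stores):
        self.stores = stores  # store name -> dict of records

    def read(self, store, key):
        return enforce_policy(self.stores[store][key])

plane = Dataplane({
    "on_prem_lake": {"u1": {"name": "Ada", "ssn": "123-45-6789"}},
    "cloud_lake":   {"u1": {"name": "Ada", "email": "ada@example.com"}},
})

print(plane.read("on_prem_lake", "u1"))  # ssn masked
print(plane.read("cloud_lake", "u1"))    # email masked
```

Because every access path runs through `enforce_policy`, moving a data set from the on-premises lake to a cloud lake requires no policy rework: the rule travels with the layer, not with the store.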
If you’re looking to update your data management architecture, learn more about the latest technology here.