
Uncovering the Future of Data Infrastructure and Intelligence

Digital data collection is changing, and companies will need the right data infrastructure to survive. Here’s what’s coming, and what you must do to be ready.

An Abbreviated History of Digital Data

Over the last few years, we’ve seen data-driven recommendation systems that deliver choices based on what online services know about our previous choices. Companies such as Netflix and Amazon have spent years refining these rules-based data processing methods.
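To make the idea concrete, here is a minimal sketch of a rules-based recommender of the kind described above: it recommends items that most often co-occur with what a user has already chosen. The function names and the catalogue data are illustrative, not anything from Netflix or Amazon.

```python
# A toy co-occurrence recommender: suggest items that frequently
# appear alongside the user's previous choices. All data below is
# made up for illustration.
from collections import Counter
from itertools import combinations

def build_cooccurrence(histories):
    """Count how often each ordered pair of items shares a history."""
    pairs = Counter()
    for history in histories:
        for a, b in combinations(sorted(set(history)), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs

def recommend(user_items, pairs, top_n=2):
    """Score unseen items by how often they co-occur with the user's items."""
    scores = Counter()
    for seen in user_items:
        for (a, b), count in pairs.items():
            if a == seen and b not in user_items:
                scores[b] += count
    return [item for item, _ in scores.most_common(top_n)]

histories = [
    ["Stranger Things", "Dark", "Black Mirror"],
    ["Dark", "Black Mirror"],
    ["Stranger Things", "Dark"],
]
pairs = build_cooccurrence(histories)
print(recommend(["Stranger Things"], pairs))  # ['Dark', 'Black Mirror']
```

Real systems layer many more signals on top (ratings, recency, context), but the core "people who chose X also chose Y" rule is exactly this shape.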

More recently, mobile computing has created frictionless services, available when and where we need them, at the swipe of a thumb. Uber is a good example: when a user requests a car, the service’s backend data engine draws on real-time information to find the right driver at the right price, then shows the rider where that car is as it approaches.

The third wave of online services will close the gap entirely, because consumers will no longer need to request services at all. Tomorrow, embedded intelligence will deliver services before the user knows to ask for them.

What will this look like? Self-driving cars will automatically re-route their way around congestion. Digital assistants will know who our contacts are when we visit a city, and ask us automatically if we’d like a dinner booked with them. As these services evolve, companies will need to focus on data science.

More Data, More Science

A decade ago, graphics processing units (GPUs) expanded from graphics into more general-purpose computing applications. They enabled developers to parallelize data analytics tasks, unlocking inexpensive data processing at scale, and revolutionized data infrastructure almost overnight.
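The pattern that made GPUs so effective for analytics is data parallelism: the same operation applied independently to many partitions of the data. As a hedged illustration of that pattern only, here is a CPU sketch using Python’s standard thread pool; real GPU work would use something like CUDA, and the function names here are invented for the example.

```python
# Data parallelism in miniature: split the input into chunks and run
# the same "kernel" over each chunk concurrently, then combine the
# partial results. GPUs apply this pattern at massive scale; a thread
# pool stands in for it here purely to show the shape of the work.
from concurrent.futures import ThreadPoolExecutor

def chunk_sum_of_squares(chunk):
    """The per-chunk kernel: identical work on each partition."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, n_chunks=4):
    size = max(1, len(data) // n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(chunk_sum_of_squares, chunks))

data = list(range(1000))
print(parallel_sum_of_squares(data))  # matches the serial sum of squares
```

The key property, and the reason the approach scales so well on GPU hardware, is that no chunk depends on any other: throwing more processing units at the problem speeds it up almost linearly.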

Now that GPUs have unclogged the computing bottleneck, companies should be thirsty for something else to fuel their consumer and business services: the data itself. The more high-quality data a company has to process, the more powerful and accurate its results can be.

The Internet of Things (IoT) will increase the velocity and the volume of structured data, while other sources like social media are generating more unstructured data than ever before. The only challenge is how to mine that data for the knowledge we need. That’s where the true value of the data science revolution comes in: the scientists.

As companies rely increasingly on data to make critical decisions and gain competitive advantage, they will need the talent to use that data to its full potential. Data scientists are in short supply as demand for this rarefied skill set takes off. Inmarsat’s survey of 500 global decision-makers showed that nearly half (47 percent) lacked the skills to make the most of IoT at a delivery level, and another third could benefit from more skills in areas including data science.

Open, Accessible Data Infrastructure

Making data science more accessible will attract more people to the profession and entice those already in the field to fill the available roles. We need to make the discipline open and accessible, and the industry can do this by making the tools behind big data infrastructure easy to obtain, use, and collaborate on. The magic component here is open source.

Unlike proprietary solutions, the code behind open source is transparent. This allows more people to scrutinize it for security flaws, and because it’s so accessible, open source is also a fantastic platform for collaboration. Data scientists across the world can use the same tools and platforms, such as Apache Hadoop, when working together on big data projects.
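The shared vocabulary on Hadoop is MapReduce, and the canonical example is word count. As a minimal sketch only, the map and reduce phases can be expressed as plain Python functions; on a real cluster these phases run across many nodes (for instance via Hadoop Streaming), and the names below are chosen for the example rather than taken from any Hadoop API.

```python
# The canonical MapReduce word count, the "hello world" of Hadoop.
# On a cluster, map tasks and reduce tasks run on different nodes and
# Hadoop shuffles the (word, 1) pairs by key between the two phases;
# here both phases are plain functions to show the shape of the job.
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big tools", "open data"]
print(reduce_phase(map_phase(lines)))
# {'big': 2, 'data': 2, 'tools': 1, 'open': 1}
```

Because every data scientist on a project can read and run the same open tooling, this kind of job is a lingua franca: the logic above ports directly to any Hadoop-compatible platform.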

The level of innovation in the open source community is also intriguing. Because the code is transparent and adaptable, an army of contributors is constantly working to enhance open source products with more features.

IBM has long recognized the power of open source, which is why it has formed partnerships through the Open Data Platform Initiative to promote an official Hadoop product, while its partners resell IBM’s Data Science Experience, a platform built on open source tools.

Silos have no place in a world that thrives on the open exchange of information. As online services become more embedded and intelligent, the data—and the architecture that supports it—must become more open and more fluid.

To read more about why a connected data strategy is critical to the future of your data, download this white paper.

