The same principles that drove Linux to dominate the enterprise server market now promise to revolutionize your data analysis. Open source analytics tools are supercharging the way we generate business insights, and enterprise customers are embracing them.
In contrast to expensive, proprietary software, open source software allows fast and efficient innovation by letting the entire community make improvements to the code. It’s a simple idea, but it’s taken the world by storm.
The ideas behind open source date back to 1983, when Richard Stallman launched the GNU project, but it took several touchstone projects to push them into the public eye.
Linux changed the world as it gained traction in the late 1990s. It now sits at the heart of devices ranging from mainframes to wearable tech. Apache, the open source web server released in 1995, now powers close to 43 percent of the world’s web servers, and open source software runs a large percentage of the world’s smartphones.
With open source engines now powering several of the world’s most commonly used web browsers, the concept of free, transparent software stretches from server to desktop, and even into your pocket.
Now, open source is delivering the same rich benefits to the next big area in enterprise computing: data analytics. As their data grows, companies are hungry for technology that they can use to mine it for new insights. More innovation in data analytics translates directly to competitive advantage.
“The next five years in the world of data and analytics will see some dramatic and important shifts,” says Scott Gnau, the chief technology officer of Hortonworks.
Open source is a mainstay of cloud computing, which thrives on this development model. Many of the tools that cloud-based developers and administrators favor are open source. The cloud is also a driving platform for data analytics. Its elastic properties make it perfect for tasks that need access to dynamically scalable computing and storage resources. In short, open source, cloud computing, and data analytics go together like toast, butter, and jam.
“As the GitHub generation becomes the dominant force in IT and business executive ranks, open source will become the first choice for many technology tasks,” says Gnau. “Innovation via collaboration is the only natural way to keep up with accelerating requirements.”
Open source analytics tools like the Apache Hadoop software library have revolutionized the processing of distributed data sets. Enterprises have embraced open source tools like this to help keep them at the front of the curve in data analytics.
Department store Macy’s has used Hadoop for more than five years to understand online purchasing habits on its e-commerce site, and is now expanding that analysis to track customer journeys between its online and physical stores. Progressive Insurance used Hadoop’s open source analytics tools to build its Snapshot usage-based auto-insurance offering, which crunches 15 billion miles of driving data from diagnostic devices. Snapshot uses the results of that analysis to offer discounts to drivers who consistently show safe, responsible driving patterns.
Open source also brings direct financial benefits. Centrica, which delivers residential gas throughout the UK, chose open source to help it better understand customer data after analyzing the cost of proprietary solutions. Instead of spending £5 million for a 12-node solution from a proprietary vendor, it got 250 Hadoop nodes for just £750,000.
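To put those Centrica figures in perspective, here is a rough per-node comparison. It is only a sketch: the four input numbers come from the example above, while the function and variable names are illustrative, and it assumes a node is a comparable unit of capacity in both offerings, which may not hold in practice.

```python
# Figures quoted in the Centrica example; names below are illustrative only.
PROPRIETARY_COST_GBP = 5_000_000
PROPRIETARY_NODES = 12
HADOOP_COST_GBP = 750_000
HADOOP_NODES = 250


def cost_per_node(total_cost_gbp: int, nodes: int) -> float:
    """Return the average cost of a single node in pounds."""
    return total_cost_gbp / nodes


proprietary_per_node = cost_per_node(PROPRIETARY_COST_GBP, PROPRIETARY_NODES)
hadoop_per_node = cost_per_node(HADOOP_COST_GBP, HADOOP_NODES)

# How many times more expensive each proprietary node is,
# assuming (loosely) that nodes are comparable across the two options.
ratio = proprietary_per_node / hadoop_per_node

print(f"Proprietary: about £{proprietary_per_node:,.0f} per node")
print(f"Hadoop:      about £{hadoop_per_node:,.0f} per node")
print(f"Ratio:       roughly {ratio:.0f}x")
```

On these numbers, the proprietary option works out to roughly £417,000 per node against £3,000 per Hadoop node. That comparison covers the quoted purchase price only and says nothing about staffing, support, or operating costs.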
As technology develops at breakneck speed, data analytics software must keep up.
“Driven by the wave of new data generated by machines, sensors, and devices, a new landscape for analytics is emerging,” Gnau says. The rush toward the Internet of Things (IoT) is a key driver here, promising to swell today’s data lakes into vast oceans of unstructured and structured information.
Open source analytics engines, including those behind breakthrough deep learning and AI applications, are helping to mine value from all of this IoT data. The open source software model is enabling data analytics software to keep pace with these developments by speeding up the cycle of innovation. Projects in the Apache family, such as the Storm real-time computation engine and the Kafka data-streaming platform, will be instrumental in helping us grapple with growing data velocity and volume.
As the success of open source continues to grow, expect to see a rise in database and analytics software that will transform the way we house and process our data. This trend chart from DB-Engines shows the open source engine MySQL giving proprietary alternatives from other vendors a run for their money. More significantly, you can see marked growth in the use of open source, NoSQL-style databases such as Elasticsearch and MongoDB that are optimized for large amounts of unstructured information.
Analytics can shine a light on hidden trends in our data, but we don’t need to crunch too many numbers to see what’s coming in this market. Open source and data analytics have a bright future together, and enterprise IT departments can capitalize on it.