Hadoop Insights

News about Hadoop in the wild; how Hadoop is being used; how Hadoop can be used.

Opportunity abounds

According to the enterprise data usage experts at Appfluent, the typical Enterprise Data Warehouse (EDW) dedicates 70% of its storage volume to unused data and 55% of its processing capacity to low-value ETL workloads. This represents a waste of what could otherwise be a high-performance, finely tuned analytics and reporting environment that supports enterprise priorities. Even worse, EDW environments often cannot deal with the varied structures of new data sources that offer so much untapped value.…

Hortonworks subscribers across all major industries use Hortonworks Data Platform (HDP) to power advanced analytics applications for data discovery and predictive analytics. The insurance industry uses Hadoop to drive this type of innovation for usage-based insurance (UBI).

Cindy Maike is the GM for Insurance Solutions at Hortonworks, and later this month she will present on big data for UBI at Insurance Telematics Canada 2015. The conference begins on April 23rd in Toronto, and Cindy will present with Tammy Chen from Towers Watson and Gerry Lee from QA Consultants on “Data Makes the (UBI) World Go Round.”

Here’s a preview of what Cindy and the panel will discuss.…

As we are finalizing our preparations for what will surely be another successful Hadoop Summit Europe event, one thing has become unequivocally clear: the Hadoop challenge is no longer about acceptance. It’s no longer about adoption. It’s about Hadoop being pervasive. Hadoop is everywhere.

As Mike Gualtieri of Forrester wrote in a recent report:

Hadoop is a must-have for large enterprises

I couldn’t agree more with Mike’s assessment, and I encourage you to read the report: “Predictions 2015: Hadoop Will Become a Cornerstone of Your Business Technology Agenda”.…

The factory of the future will merge the virtual world with the real world.

Domhall Carroll, industry sector head of Siemens Ltd, made this point as he addressed the Engineers Ireland annual conference in May 2014. Carroll described prior industrial revolutions, leading up to today’s transformations that are creating a fourth industrial revolution, based on the use of cyber-physical systems. His remarks aligned with three ways that we see our manufacturing customers solve data challenges in Industry 4.0.…

Changes in technology and consumer expectations create new challenges for how retailers create a single view of their customers, predict consumer preferences and discover fine-grained detail to manage their supply chains. Of course, these have always been important goals for retailers, but it used to be easier to understand the market signals (because there were fewer of them) and to choose the right actions to take (because there were fewer possibilities there too).…

Changes in technology and customer expectations create new challenges for how insurers engage their customers, manage risk information and control the rising frequency and severity of claims.

Carriers need to rethink traditional models for customer engagement. Advances in technology and the adoption of retail engagement models drive fundamental changes in how customers shop for and purchase insurance coverage. To engage with their customers, our insurance customers seek “omni-channel” insight and the ability to confidently recommend the next best action (NBA) to their customers.…

Hortonworks provides enterprise Hadoop for telecommunications service providers. Hortonworks Data Platform (HDP) is built from the ground up on a centralized YARN-based architecture, with core enterprise services for data governance, security and cluster operations that can revolutionize your telecommunications business.

As the originators of Hadoop, leaders in the developer community, and partners for your success, we are better positioned than anyone to help you become a data-centric telecommunications enterprise.

Hortonworks supports most of the largest North American carriers.…

This is a unique moment in time. Fueled by open source, Apache Hadoop has become an essential part of the modern enterprise data architecture and the Hadoop market is accelerating at an amazing rate.

The impressive thing about successful open source projects is the pace of the “release early, release often” development cycle, also known as upstream innovation. The process moves through major and minor releases at a regular clip and the downstream users get to pick the releases and versions they want to consume for their specific needs.…

Since our founding in 2011, Hortonworks has held a fundamental belief: the only way to deliver infrastructure platform technology is completely in open source. Moreover, we believe that collaborative open source software development under the governance model of an entity like the Apache Software Foundation (ASF) is the best way to accelerate innovation that targets enterprise end users. It brings the largest number of developers together, enabling innovation to happen far faster than any single vendor could achieve and in a way that is free of friction for the enterprise.…

The Beginning of our Oil and Gas Journey

Hortonworks began working with the Oil & Gas industry in November of 2013, and our involvement accelerated during a very busy 2014 campaign. Our momentum was set against a backdrop early in the year of milestones in drilling and production across unconventional shale plays in North America, along with a number of acquisitions, mergers, and divestitures that continued to shape the industry landscape.…

The public sector is charged with protecting citizens, responding to constituents, providing services and maintaining infrastructure. In many instances, the demands of these responsibilities increase while government resources simultaneously shrink under budget pressures.

How can Intelligence, Defense and Civilian agencies do more with less?

Apache Hadoop is part of the answer. Within the public sector, Hadoop delivers data-driven actions in support of IT efficiency and good government.

In one example, the United States Internal Revenue Service had to reduce its auditor headcount due to budget cuts.…

Modern retailers collect data from a multitude of consumer engagement channels, including point of sale systems, the web, mobile applications, social media, and more. They hope to use this data to derive greater customer insights, promote increased brand engagement and loyalty, optimize pricing and promotions, streamline the supply chain, and enhance their business models.

Data from the retailer’s transactional systems has historically been stored in an enterprise data warehouse (EDW) or other database, but these traditional data repositories are not well suited for the newer, unstructured data types like log files, social media updates and information from in-store sensors.…

Merv Adrian couldn’t have said it better. In his blog post from the weekend, he continued in his quest to define Hadoop. And it is no easy quest, as Hadoop and its components are evolving at a pace that is, frankly, astounding.

The continuous evolution of Hadoop has even given rise to sentiments such as ‘Is Hadoop dead?’ The answer to that question is YES. And NO.…

We certainly live in interesting times. About 20 months ago, in an effort to find proprietary differentiation that could be used to monetize and lock in customers to their model, Cloudera unveiled Impala and at that time Mike Olson stated “Our view is that, long-term, this will supplant Hive”. Only 6 months ago in his Impala v Hive post, Olson defended his “decision to develop Impala from the ground up as a new project, rather than improving the existing Apache Hive project” stating “Put bluntly: We chose to build Impala because Hive is the wrong architecture for real-time distributed SQL processing.”

So, 20 months after abandoning Hive, and after repeated marketing attempts to throw Hive and many other SQL alternatives under the bus in favor of their “better” approach, I’m certainly puzzled as Cloudera unveils their plan to enable Apache Hive to run on Apache Spark; please see HIVE-7292 for details.…

We’re finally catching our breath after a phenomenal Hadoop Summit event last week in San Jose. Thank you to everyone who came to celebrate Hadoop advances and adoption: the many organizations that shared how their Hadoop journey fundamentally transformed their businesses, those just getting started, and the huge ecosystem of vendors. It is amazing to be part of such a broad and deep community that is contributing to making the market for everyone.…