The Hortonworks Blog

More from Justin Sears

Leading enterprise organizations have concluded that YARN-enabled Hadoop is foundational to their modern data architectures. These companies subscribe with Hortonworks (and implement Hortonworks Data Platform) to bring additional types of data under management, merge those with legacy datasets, and unlock new business insight.

But don’t take our word for it.

Watch these brief videos and hear our customers describe how a data-first approach is transforming their businesses.


Luminar is the leading big data analytics and modeling provider uniquely focused on delivering actionable insights on U.S.…

The public sector is charged with protecting citizens, responding to constituents, providing services and maintaining infrastructure. In many instances, the demands of these responsibilities increase while government resources simultaneously shrink under budget pressures.

How can Intelligence, Defense and Civilian agencies do more with less?

Apache Hadoop is part of the answer. Within the public sector, Hadoop delivers data-driven actions in support of IT efficiency and good government.

Download the White Paper

In one example, the United States Internal Revenue Service had to reduce its auditor headcount due to budget cuts.…

Last week Hortonworks presented the second of our eight Discover HDP 2.2 webinars. Alan Gates and Raj Bains discussed the initiative and new Apache Hive features for speed, scale and SQL that are included in Hortonworks Data Platform 2.2.

After an overview of HDP 2.2, Alan discussed what the Apache community accomplished with the original Stinger initiative and how that momentum continues.

Alan and Raj then discussed details on three areas of innovation currently underway in the Apache Hive project:

  • For SQL – transactions with ACID semantics
  • For Speed – the cost based optimizer
  • For Scale – dynamic query optimization

Here is the complete recording of the webinar

Here is the presentation deck.…

Last week Hortonworks presented the first of eight Discover HDP 2.2 webinars: Comprehensive Hadoop Security with Apache Ranger and Apache Knox. Vinay Shukla and Balaji Ganesan hosted this first webinar in the series.

Balaji discussed how to use Apache Ranger for centralized security administration, to set up authorization policies, and to monitor user activity with auditing. He also covered Ranger innovations now included in HDP 2.2:

  • Support for Apache Knox and Apache Storm, for centralized authorization and auditing
  • Deeper integration of Ranger with the Apache Hadoop stack with support for local grant/revoke in HDFS and HBase
  • Ranger’s enterprise readiness, with the introduction of REST APIs for policy management and scalable storage of audit data in HDFS

Vinay presented Apache Knox and API security for Apache Hadoop.…

Last week’s release of Hortonworks Data Platform 2.2 is packed with new features for Enterprise Hadoop. These include the results of Hortonworks’ investment in VERTICAL integration with YARN and HDFS, as well as HORIZONTAL innovation to ensure that the key enterprise services of governance, security, and operations can be applied consistently and reliably across all the components within the Apache Hadoop platform.

To guide you through these capabilities, Hortonworks is hosting a new series of eight Thursday webinars beginning on October 23 and running to December 18.…

Last week’s Hortonworks webinar “What’s Possible with a Modern Data Architecture?” featured Greg Girard, program director for omni-channel analytics strategies at IDC Retail Insights, and Mark Ledbetter, vice president for industry solutions at Hortonworks. Greg provides targeted, fact-based guidance to retailers for the application of analytics across the enterprise. Mark has more than twenty-five years of experience in the software industry with a focus on retail and supply chains.

Many of Greg and Mark’s thoughts from the webinar echo topics also covered in the recent Hortonworks white paper “The Retail Sector Boosts Sales with Hadoop.”

Download White Paper

Greg discussed the most significant drivers of big data initiatives in the retail industry, including customer acquisition, pricing strategies and competitive intelligence.…

Concurrent Inc. is a Hortonworks Technology Partner and recently announced that Cascading 3.0 now supports Apache Tez as an application runtime. Cascading is a powerful development framework for building enterprise data applications on Hadoop and is one of the most widely deployed technologies for data applications, with more than 175,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in data application development on Hadoop.…

Modern retailers collect data from a multitude of consumer engagement channels, including point of sale systems, the web, mobile applications, social media, and more. They hope to use this data to derive greater customer insights, promote increased brand engagement and loyalty, optimize pricing and promotions, streamline the supply chain, and enhance their business models.

Data from the retailer’s transactional systems has historically been stored in an enterprise data warehouse (EDW) or other database, but these traditional data repositories are not well suited for the newer, unstructured data types like log files, social media updates and information from in-store sensors.…

This summer, Hortonworks presented the Discover HDP 2.1 Webinar series. Our developers and product managers highlighted the latest innovations in Apache Hadoop and related Apache projects.

We’re grateful to the more than 1,000 attendees whose questions added rich interaction to the pre-planned presentations and demos.

For those of you who missed one of the 30-minute webinars (or who want to review one you joined live), you can find recordings of all sessions on our What’s New in 2.1 page.…

Few industries depend as heavily on data as financial services. Insurance companies, retail and investment banks aggregate, price and distribute capital with the aim of increasing their return on assets with an acceptable level of risk.

To do that, financial decision-makers need data. Apache Hadoop helps them store new data sources, then process the larger combined dataset for batch, interactive and real-time analysis. More data and better analysis improve bottom-line results.…

The world’s top telecommunications firms adopt Hadoop to gain competitive advantage and to respond to technology-driven changes like increases in both network traffic and the telemetry data captured by network sensors.

The majority of North America’s and Europe’s telcos have chosen Hortonworks Data Platform (HDP) to meet these challenges. Read the new Hortonworks white paper for a detailed discussion of twenty-one common telco and cable company use cases.

Download the White Paper

With their Modern Data Architectures based on HDP, these firms improve efficiency and capture opportunities in some of these ways:

  • Analyze call detail records (CDRs).

SequenceIQ provides an API and platform to build predictive applications and turn data into tangible assets. In this guest blog, SequenceIQ Co-founder and CTO Janos Matyas (@sequenceiq) explains why his team chose Apache Ambari for provisioning Hadoop clusters and how they contributed to the Ambari project.

At SequenceIQ, we frequently provision Hadoop clusters in different environments. For a long time, we searched for the right provisioning and management tool.…

Oscar Padilla, Vice President of Strategy at Luminar, is our guest blogger. He shares his thoughts and insights about Apache Hadoop, Hortonworks Data Platform, and Luminar’s journey to the Data Lake.

Luminar is the first big data analytics provider focused specifically on U.S. Latino consumers. Our company offers analysis based on empirical insights rather than a sample-based approach. Apache Hadoop and Hortonworks Data Platform (HDP) make this empirical approach work at scale.…

Customers’ Hadoop Journey

We’ve all had two weeks to reflect on Hadoop Summit 2014. One of the biggest differences that stood out in this year’s Summit (as compared to Summit 2013) was the presence of large enterprise customers that are using Apache Hadoop as an important part of their modern data architectures.

Hadoop has gone beyond its original Yahoo use case (indexing the web via a nightly batch MapReduce process) and into the mainstream of daily data processing and analytics with real-time, online, interactive, and batch applications at many notable companies.…

Big Data In Healthcare

Electronic data is the heartbeat in a healthcare provider’s office. ZirMed is a Hortonworks customer and a leading provider of healthcare information management solutions. Healthcare providers, including physicians, hospitals and large health systems, use the company’s cloud-based revenue cycle management offerings to manage the complex process of billing and collecting revenue from patients and payers.

ZirMed’s Analytics solution aggregates healthcare data and makes it available to its customers, so they get a clearer view of their financial and operational performance.…