cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button

The Hortonworks Blog

This April, Hortonworks launched a multi-phase initiative to streamline Apache Hadoop operations, and the 1.3 release of SmartSense marks the delivery of the second phase of that initiative, and that is to provide Consolidated Cluster Activity Reporting. Hortonworks launched SmartSense in 2015 to help customers quickly collect cluster configuration, metrics, and logs to proactively detect […]

 

Hadoop All Grown Up

It’s amazing the growth Apache Hadoop and the extended ecosystem has had in the last 10 years. I read through Owen’s “Ten Years of Herding Elephants” blog and downloaded the early docker image of his first patch.  It reminded me of the days it took me to do my first Hadoop install and the effort […]

Yahoo! JAPAN needed a data platform that could scale to generate 100,000 reports per day as well as having the ability to process large amounts of data. It needed to keep the last 13 months’ worth of data, which is approximately 500 billion rows, organized and easily accessible. Relational Database Management Systems (RDBMS) cannot scale […]

Sumeet Kumar Agrawal, principal product manager for Big Data Edition product at Informatica, is our guest blogger. In this blog, explains how Informatica’s Big Data Edition integrates with Tez and allow for significant performance gains. Informatica Big Data Edition’s codeless visual development environment accelerates the ability of enterprises to take advantage of amazing innovations in big […]

Kristen Hardwick, Vice President of Big Data Solutions at Spry, Inc is our guest blogger. In this blog, Kristen shares performance analysis during Spryinc’s evaluation of Apache Hive with Tez as a fast query engine. In early 2014, Spry developed a solution that heavily utilized Hive for data transformations. When the project was complete, three […]

Historically, the strength of a platform lies in the abilities of developers to learn, try, and build against the platform APIs and capabilities. As Apache Hadoop matures as a platform, it’s the creativity and efforts of the developer community that is driving the innovation that makes Hadoop a vibrant and impactful foundation of a modern […]

Apache Ambari 2.0 User Views introduce two functional tools to help you understand and optimize your cluster resources to get the best performance in a multitenant Hadoop environment. Tez View: Understand and Optimize Jobs in your Cluster The Tez View gives you visibility into all the jobs on your cluster, allowing you to quickly identify […]

Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop. Recent innovations of Apache Ambari have focused on opening Apache Ambari into a pluggable management platform that can automate cluster provisioning, deploy 3rd party software and provide custom operational and developers’ views to the end user. Join us Thursday March […]

This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting rolling upgrades and downgrades of a Hadoop YARN cluster. HDP 2.2 offers substantial innovations in Apache™ Hadoop YARN, enabling Hadoop users to […]

As a data scientist working with Hadoop, I often use Apache Hive to explore data, make ad-hoc queries or build data pipelines. Until recently, optimizing Hive queries focused mostly on data layout techniques such as partitioning and bucketing or using custom file formats. In the last couple of years, driven largely by the innovation of […]

This is a unique moment in time. Fueled by open source, Apache Hadoop has become an essential part of the modern enterprise data architecture and the Hadoop market is accelerating at an amazing rate. The impressive thing about successful open source projects is the pace of the “release early, release often” development cycle, also known […]

This is the second post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce the theme of running service-workloads in YARN to set context for deeper discussion in subsequent blogs. HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of […]

Hortonworks Data Platform (HDP) provides Hadoop for the Enterprise, with a centralized architecture of core enterprise services, for any application and any data. HDP is uniquely built around native YARN services to enable a centralized architecture through which multiple data access applications interact with a shared data set. Apache Hive is one of the most […]

This is the first post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce themes to set context for deeper discussion in subsequent blogs. HDP 2.2 represents another major step forward for Enterprise Hadoop. With thousands of enhancements across all elements of […]

We take pride in producing valuable technical blogs and sharing it with a wider audience. Of all the blogs published in 2014 on our website, the following were most popular: Improving Spark for Data Pipelines with Native YARN Integration. Gopal Vijayaraghavan and Oleg Zhurakousky demonstrate improved Apache Spark, which with the help of the pluggable […]