July 24, 2017 | Matt Spillar | Hortonworks Case Study

Don’t Leave Your Customers out in the Cold

July 21, 2017 | Tom Hastain

Join the Big Data Revolution! (Apply Inside)

July 20, 2017 | Anna Yong | Announcements

What Does Hortonworks SmartSense Mean To You?

Viewing posts: From the Dev Team


In his blog, Tim Hall wrote, “Enterprises are embracing Apache Hadoop to enable their modern data architectures and power new analytic applications. The freedom to choose the on-premises or cloud environments for Hadoop that best meets the business needs is a critical requirement.” One of the choices in deploying Hadoop in the cloud environment is with Microsoft Azure using […]

Mayank Bansal, of eBay, is a guest contributing author of this collaborative blog. This is the fourth post in a series that explores the theme of enabling diverse workloads in YARN. See the introductory post to understand the context around all the new features for diverse workloads as part of Apache Hadoop YARN in HDP. Background: In Hadoop YARN’s […]

Introduction: Multihoming is the practice of connecting a host to more than a single network. This is frequently used to provide network-level fault tolerance: if hosts are able to communicate on more than one network, the failure of one network will not render the hosts inaccessible. There are other use cases for multihoming as […]

The Apache community released Apache Pig 0.15.0 last week. Although there are many new features in Apache Pig 0.15.0, we would like to highlight two major improvements: Pig on Tez enhancements, and using Hive UDFs inside Pig. Below are some details about these important features. For the complete list of features, improvements, and bug fixes, please […]

The components in a modern data architecture vary from one enterprise to the next and the mix changes over time. Many of our Hortonworks subscribers need support ensuring that their Hortonworks Data Platform (HDP) clusters are optimally configured. This means that they need proactive, intelligent cluster analysis. As businesses onboard new workloads to the platform, […]

Apache Hadoop has emerged as a critical data platform to deliver business insights hidden in big data. Because Hadoop is a relatively new technology, system administrators hold it to higher security standards. There are several reasons for this scrutiny: The external ecosystem, comprising data repositories and operational systems that feed Hadoop deployments, is highly dynamic and […]

Last week, the Apache Slider community released Apache Slider 0.80.0. Although there are many new features in Slider 0.80.0, a few innovations are particularly notable: containerized application onboarding; seamless zero-downtime application upgrade; adding co-processors to app packages without reinstallation; and simplified application onboarding without any packaging requirement. Below are some details about these important features. For the […]

Not a day passes without someone tweeting or re-tweeting a blog on the virtues of Apache Spark. At a Memorial Day BBQ, an old friend proclaimed: “Spark is the new rub, just as Java was two decades ago. It’s a developers’ delight.” Spark as a distributed data processing and computing platform offers much of what […]

Apache Spark provides a lot of valuable tools for data science. With our release of Apache Spark 1.3.1 Technical Preview, the powerful DataFrame API is available on HDP. Data scientists use data exploration and visualization to help frame the question and fine-tune the learning. Apache Zeppelin helps with this. Based on the concept […]

The Apache Accumulo community has announced its 1.7.0 release. As the community’s first major release of 2015, the release represents the culmination of a year of effort from many Accumulo committers and contributors. Apart from many notable changes enumerated below, Accumulo is now well integrated with Apache Ambari. In this release, 43 different individuals fixed 691 […]

SQL is the most popular use case for the Hadoop user community, and Apache Hive is still the de facto standard. Early this week, the Apache Hive community released Apache Hive 1.2.0. Already the third release this year, the Hive developer community continues to improve the release and grow its team, with 11 Hive contributors promoted […]

This is the third post in a series that explores the theme of enabling diverse workloads in YARN. See our introductory post to understand the context around all the new features for diverse workloads as part of YARN in HDP 2.2, and a related post on CPU scheduling. Introduction: One of the core responsibilities of YARN is monitoring and […]

Kristen Hardwick, Vice President of Big Data Solutions at Spry, Inc., is our guest blogger. In this blog, Kristen shares performance analysis from Spry’s evaluation of Apache Hive with Tez as a fast query engine. In early 2014, Spry developed a solution that heavily utilized Hive for data transformations. When the project was complete, three […]

With YARN and HDFS at the architectural center, Hadoop has emerged as a key component of any modern data architecture. Today, enterprises utilize Hadoop to store critical datasets and power many of their critical workloads. With this in mind, the services and data within a Hadoop cluster needed to be highly available in the face of failures […]

This is the fourth post in a series that explores the theme of enabling diverse workloads in YARN. See the introductory post to understand the context around all the new features for diverse workloads as part of YARN in HDP 2.2. Introduction: When it comes to managing resources in YARN, there are two aspects that we, […]