The Hortonworks Blog

Today we are proud to announce the delivery of Apache Ambari 1.4.1. Ambari 1.4.1 combines many months of work in the community advancing the Ambari codebase. Over 760 JIRAs have been resolved since the Ambari 1.2.5 release. We would like to thank the nearly 40 engineers who contributed to help make this release possible.

Hello Hadoop 2, Meet Apache Ambari
The most important addition to Ambari 1.4.1 is support for installing, managing and monitoring a cluster based on the Hadoop 2 stack.…

The Hortonworks HBase team is excited to see HBase 96 released. It represents a broad community effort and a massive amount of work that has been building for more than a year.

HBase 96 closes out over 2000 issues (2134 Jira tickets to be exact) and represents the collective work of a VERY active community. Kudos to everyone involved! As the authors of a recent Apache blog alluded to, the HBase community is very healthy and includes developers from many companies including Hortonworks, Yahoo!, Cloudera, Salesforce, eBay, Intel, and Facebook, to name just a few.…

This post is authored by Omkar Vinit Joshi with Vinod Kumar Vavilapalli and is the 8th post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series: 

Introduction

In YARN, applications perform their work by running containers, which today map to processes on the underlying operating system.…

Today we announced the Analytics Advantage with Hadoop offering from SAS, Teradata and Hortonworks. The new offering leverages the capabilities for in-database data preparation, analytic model building and deployment and combines Teradata’s Appliance for SAS® High-Performance Analytics offering with the Teradata Appliance for Hadoop built on Hortonworks Data Platform. Using Teradata’s Unified Data Architecture (UDA), this high-speed integrated offering allows customers to discover, build and deploy analytic models across data stored in Teradata and Hadoop, promoting businesses’ ability to act upon analytic insights from any type of data across a seamless environment and faster than ever before.…

You did it! Last Sunday we challenged you to “Learn Hadoop in 7 days”. We hope that you have risen to the test and kept up with the tutorials we’ve posted each day through Twitter and Facebook. These tutorials should have helped you delve into:

By now, you should feel comfortable with Hadoop clickstream analysis, Hortonworks ODBC driver configuration, and many other important components of Hadoop.…

This post’s Principal Author: Ming Ma, Software Development Manager, eBay.  With contribution from Mayank Bansal (eBay), Devaraj Das (Hortonworks), Nicolas Liochon (Scaled Risk), Michael Weng (eBay), Ted Yu (Hortonworks), John Zhao (eBay)

eBay runs Apache Hadoop at extreme scale, with tens of petabytes of data. Hadoop was created for computing challenges like ours, and eBay runs some of the largest Hadoop clusters in existence.

Our business uses Apache HBase to deliver value to our customers in real-time and we are sensitive to any failures because prolonged recovery times significantly degrade site performance and result in material loss of revenue. …

Stinger is not a product. Stinger is a broad community-based initiative to bring interactive query at petabyte scale to Hadoop. And today, as representatives of this open, community-led effort, we are very proud to announce delivery of Apache Hive 0.12, which represents the critical second phase of this project!

Only five months in the making, Apache Hive 0.12 comprises over 420 closed JIRA tickets contributed by ten companies, with nearly 150 thousand lines of code! …

Today, we are pleased to announce a strategic alliance between Hortonworks and SAS. Through this alliance we are committing to expand the integration between the SAS business analytics and data management capabilities and the Hortonworks Data Platform (HDP).

By better integrating SAS Business Analytics and HDP, SAS users can easily incorporate Hadoop as a component of their data architecture to capture, process and analyze data of any type and scale. This allows businesses to leverage powerful SAS analytic and data management capabilities across massive data sets, including new data sources that previously could not be captured and analyzed.…

Designed for senior IT executives, IT architects, technology planners, and business technologists, Knowledgent’s three-day facilitated Big Data Immersion workshop, recently held in New York City, provided participants with an intensive deep dive answering the big data questions:

  • Why Big Data? What are the issues that brought it all about?
  • Demystifying Big Data: How can Hadoop help with big data issues?
  • Implementation: How do I operationalize big data? How is big data analytics different?

An important tool in the Hadoop developer toolkit is the ability to look at key metrics for a MapReduce job – to understand the performance of each job and to optimize future job runs.

In this blog article, we’ll explore how HDP 2.0 stores and provides insight into the performance of a MapReduce job on YARN.

Change from MapReduce v1 and HDP 1.x

In MapReduce-v2 on YARN in HDP 2.0, the JobTracker no longer exists.…

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week – as Hadoop 2 goes GA – Arun Murthy discusses his journey with Hadoop. The journey has taken Arun from developing Hadoop, to founding Hortonworks, to this week’s release of Hadoop 2, with its YARN-based architecture.

Arun describes the difference between MapReduce and YARN, and how they are related in Hadoop 2 (and by extension in Hortonworks Data Platform v2).…

I’m thrilled to note that the Apache Hadoop community has declared Apache Hadoop 2.x as Generally Available with the release of hadoop-2.2.0!

This represents the realization of a massive effort by the entire Apache Hadoop community which started nearly 4 years ago, and we’re sure you’ll agree it’s cause for a big celebration. Equally, it’s a great credit to the Apache Software Foundation, which provides an environment where contributors from various places and organizations can collaborate to achieve a goal as significant as Apache Hadoop v2.…

I’ve been sitting on this post for a while as Apache Hadoop 2 GA work was keeping me extremely busy. As they say, better late than never, so here we go – the slides are at the end of the post.

Three weeks ago, we had an Apache Hadoop YARN meetup at LinkedIn. The kind folks at LinkedIn offered to host us, in addition to talking about exciting projects like the usage of YARN at LinkedIn and applications on YARN like Apache Samza, Apache Giraph and Apache Helix.…

Apache Storm and YARN extend Hadoop to handle real-time processing of data and provide the ability to process and respond to events as they happen. Our customers have told us about many use cases for this technology combination, and below we present a demo example complete with code so you can try it yourself.

For the demo below, we used our Sandbox VM which is a full implementation of the Hortonworks Data Platform.…

Hortonworks will be making a preview of Apache Storm integration available in Q4 of this year and will be including Apache Storm in the Hortonworks Data Platform in the first half of 2014.

Any time now, the Apache Hadoop community will declare the General Availability of Hadoop 2.0 which includes the much anticipated Apache Hadoop YARN.  The YARN-based architecture of Hadoop 2 is the most significant change to Hadoop introduced in the past six years and enables Hadoop to expand from a single-purpose, batch-oriented data platform based on MapReduce into a truly multi-purpose platform supporting a wide range of data processing approaches.…
