The Hortonworks Blog

Another important milestone for Apache Pig was reached this week with the release of Pig 0.10. The purpose of this blog is to summarize the new features in Pig 0.10.

Boolean Data Type

Pig 0.10 introduces the boolean data type as a first-class Pig data type. Users can use the keyword “boolean” anywhere a data type is expected, such as in a load-as clause or a type cast.

Here are some sample use cases:

a = load 'input' as (a0:boolean, a1:tuple(a10:boolean, a11:int), a2);

b = foreach a generate a0, a1, (boolean)a2;

c = group b by a2; -- group by a boolean field

When loading boolean data using PigStorage, Pig expects the text “true” (case-insensitive) for a true value and “false” (case-insensitive) for a false value; any other value maps to null.…
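The loading rule above can be illustrated with a small Python sketch. This is not Pig's implementation; the function name parse_boolean_field is hypothetical, None stands in for a Pig null, and the sketch assumes an exact match after lowercasing (the excerpt does not say how surrounding whitespace is treated).

```python
from typing import Optional


def parse_boolean_field(text: Optional[str]) -> Optional[bool]:
    """Map a text field to True, False, or None (standing in for a Pig null).

    Hypothetical helper that mirrors the rule described in the post:
    "true"/"false" in any letter case map to booleans, anything else to null.
    """
    if text is None:
        return None
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    # Any other text (including the empty string) maps to null.
    return None


if __name__ == "__main__":
    for raw in ["true", "TRUE", "False", "yes", ""]:
        print(repr(raw), "->", parse_boolean_field(raw))
```

Under this reading, a field such as "yes" or "1" would load as null rather than as a true value.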

This blog covers our ongoing work on Snapshots in Apache Hadoop HDFS. In this blog, I will cover the motivations for the work, a high-level design and some of the design choices we made. Having seen snapshots in use with various filesystems, I believe that adding snapshots to Apache Hadoop will be hugely valuable to the Hadoop community. With luck this work will be available to Hadoop users in late 2012 or 2013.…

We just released the second video in the Hortonworks Executive Series. This one features Matt Foley, Test and Release Engineering Manager for Hortonworks.

In this video, Matt provides an overview of Hortonworks Data Platform (HDP), including a summary of the Apache Hadoop components included in the distribution and the testing involved in the release process. Matt also provides an overview of Apache Ambari, an open source project that is adding monitoring and management capabilities to Apache Hadoop.…

We are pleased to support today’s announcement from Citrix that they have contributed CloudStack to the Apache community. For those new to CloudStack, it is open source cloud computing software that helps organizations build and manage cloud infrastructures. It is similar to the Amazon Web Services EC2 environment, except that it enables organizations to build public, private or hybrid cloud environments using their own pooled computing resources.

Citrix announced today that they were reaffirming their commitment to open source by working with the Apache Software Foundation to make CloudStack 3 an Apache project, released under Apache Software License 2.0.…

I’m pleased to announce the first in a series of videos featuring Hortonworks founders and executives sharing their thoughts on how Apache Hadoop is being extended to become the next generation enterprise data platform. Over the coming weeks and months, you will be hearing from folks such as Matt Foley, Arun Murthy, Sanjay Radia and Alan Gates, just to name a few.

The first video features Shaun Connolly, Hortonworks VP of Corporate Strategy, talking about the Hortonworks vision for Apache Hadoop.…

Thank you to the community members who cast over 8,000 votes during the Hadoop Summit Community Choice voting process. The turnout far exceeded our expectations and is further evidence that the momentum behind Apache Hadoop has never been stronger.

As we announced, the sessions with the most votes in each track are automatically accepted into the Hadoop Summit agenda. As such, I am pleased to announce the winners of the Hadoop Summit Community Choice vote and the first confirmed sessions in the Hadoop Summit program:

Future of Apache Hadoop track: Dynamic Namespace Partitioning with Giraffa File System, Konstantin Shvachko (eBay)

Deployment and Operations track: Dynamic Reconfiguration of Apache Zookeeper, Alexander Shraer and Benjamin Reed (Yahoo!)

Enterprise Data Architecture track: iMStor: Hadoop Storage-based Tiering Platform, Vishal Malik (Cognizant Technology Solutions)

Applications and Data Science track: Hadoop & Cloud @Netflix: Taming the Social Data Firehose, Mohammad Sabah (Netflix)

Analytics and Business Intelligence track: Mapping and Reducing Passenger Turbulence using Big Data, Farhan Hussain and Saad Patel (Open Source Architect)

Hadoop in Action track: The Merchant Lookup Service at Intuit, Vrushali Channapattan (Intuit)…

As I first mentioned when we announced Hadoop Summit 2012, we are focused on making Hadoop Summit the preeminent conference for the Apache Hadoop community. Today I’m pleased to tell you about Community Choice, a public online voting system that enables the entire Apache Hadoop community to have a say in the sessions chosen for Hadoop Summit. Anybody can vote and the top vote getters in each track will automatically be included in the Hadoop Summit agenda.…

We reached a significant milestone in HDFS: the Namenode HA branch was merged into the trunk. With this merge, HDFS trunk now supports hot failover.

Significant enhancements were completed to make hot failover work:

  • Configuration changes for HA
  • The notion of active and standby states was added to the Namenode
  • Client-side redirection
  • Standby processes the journal from the Active
  • Dual block reports to Active and Standby

We have extensively tested manual hot failover in our labs over the last few months.…

Today we announced an important strategic partnership with Talend, provider of the world’s most popular open source data integration platform. This is another win for both Hortonworks customers and the larger Apache Hadoop community. There were two key aspects of the announcement that I wanted to highlight:

Talend releases Talend Open Studio for Big Data

Based upon Talend’s very popular open source data integration platform, Talend Open Studio for Big Data adds connectors for HDFS, HBase, Pig, Sqoop and Hive.…

Today we announced that we are delivering on our earlier promise to help Microsoft bring Apache Hadoop to Windows. I’m pleased to share that Microsoft, with our collaboration and guidance, has now submitted a series of patches to Apache aimed at overcoming the challenges of running Apache Hadoop in Windows Server environments.

These patches, once vetted and approved by the community, will become part of the core Hadoop code base. They will also become available in the two major Apache Hadoop branches: hadoop-1.0 (the current stable branch, which is available as part of Hortonworks Data Platform v1.0) and hadoop-0.23 (the next generation of Apache Hadoop, which will be available as part of Hortonworks Data Platform v2.0).…

A very short while ago, Vinod blogged about some of the significant improvements in Hadoop.Next (a.k.a. hadoop-0.23.1).

To recap, the Hortonworks and Yahoo! teams have done a huge amount of work to test, validate and benchmark Hadoop.Next, the next generation of Apache Hadoop that includes HDFS Federation, NextGen MapReduce (a.k.a. YARN) and many other significant features and performance improvements.

Today, I’m very excited to announce that the Apache Hadoop community voted to release hadoop-0.23.1 and it’s now available for all to use!…

Hortonworks and Teradata announced a strategic relationship today that includes joint go-to-market and development work to more closely integrate Hortonworks Data Platform with the Teradata Analytical Ecosystem. I wanted to take the opportunity to highlight this important partnership and share my thoughts on why this is an important milestone for Hortonworks and the larger Apache Hadoop community.

As somebody who has been heavily involved in the development of Apache Hadoop for six years and counting, it’s personally exciting to see Hadoop entering a new phase of adoption.…

I’ve been surprised by a couple of recent articles highlighting our recent leadership change. These articles imply that our business model may be changing. Let me be clear: WE ARE NOT CHANGING OUR BUSINESS MODEL. We are committed to providing training and support for a 100% open source distribution of Apache Hadoop and related projects.

What has changed?

Rob Bearden has agreed to take on the role of CEO. I am moving from CEO to the role of CTO.…

One of the common themes that we hear from customers, partners, industry analysts and others in the community is that there is a massive need for Apache Hadoop education. The demand for trained and certified Hadoop professionals far exceeds the current supply, and this knowledge gap is threatening to slow the rapid adoption of Hadoop. To address this challenge, Hortonworks is pleased to announce Hortonworks University.

Hortonworks University consists of public, private on-site and live online courses for both developers and administrators.…

Hadoop RPC is the primary communication mechanism between the nodes in an Apache Hadoop cluster. Maintaining wire compatibility, as new features are added to Apache Hadoop, has been a significant challenge with the current RPC architecture. In this blog, I highlight the architectural improvement in Hadoop RPC and how it enables wire compatibility and rolling upgrades.

Challenges for Wire Compatibility

Earlier, Hadoop RPC used Writable serialization, which made it difficult to evolve the protocols while maintaining wire compatibility.…
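To see why protocol evolution is hard with hand-written, position-based serialization, consider the following Python sketch. It is only an illustration under stated assumptions: it does not reproduce Hadoop's Writable classes or its actual RPC wire format, and the message fields (block_id, generation, length) and the tag numbers are hypothetical. It contrasts a positional encoding, where an old reader silently misreads a message that gained a field, with a tagged encoding, where an old reader can skip fields it does not know.

```python
import struct
from typing import Dict


def encode_positional_v2(block_id: int, generation: int, length: int) -> bytes:
    """Newer writer: a 'generation' field was inserted between the old two."""
    return struct.pack(">qqq", block_id, generation, length)


def decode_positional_v1(data: bytes) -> Dict[str, int]:
    """Older reader: expects exactly (block_id, length), in that order."""
    block_id, length = struct.unpack_from(">qq", data)
    return {"block_id": block_id, "length": length}


def encode_tagged(fields: Dict[int, int]) -> bytes:
    """Each field is written as a 1-byte tag followed by an 8-byte value."""
    return b"".join(struct.pack(">Bq", tag, value) for tag, value in fields.items())


def decode_tagged(data: bytes, known_tags: Dict[int, str]) -> Dict[str, int]:
    """Read (tag, value) pairs and ignore tags this reader does not know."""
    record_size = struct.calcsize(">Bq")  # 9 bytes, no padding with ">"
    result: Dict[str, int] = {}
    for offset in range(0, len(data), record_size):
        tag, value = struct.unpack_from(">Bq", data, offset)
        if tag in known_tags:
            result[known_tags[tag]] = value
    return result


if __name__ == "__main__":
    # Old positional reader takes the new 'generation' value for 'length'.
    print(decode_positional_v1(encode_positional_v2(7, 3, 1024)))

    # Old tagged reader knows only tags 1 and 2 and skips the new tag 3.
    old_reader_tags = {1: "block_id", 2: "length"}
    print(decode_tagged(encode_tagged({1: 7, 2: 1024, 3: 3}), old_reader_tags))
```

In the positional case the old reader returns a wrong length without raising any error, which is the kind of incompatibility that makes rolling upgrades risky.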
