The Hortonworks Blog

In this blog, I’ll cover how we tested Full Stack HA with NameNode HA in Hadooop 1 with Hadoop and HBase as components of the stack.

Yes, NameNode HA is finally available in the Hadoop 1 line. The test was done with Hadoop branch-1 and HBase-0.92.x on a cluster of roughly ten nodes. The aim was to try to keep a really busy HBase cluster up in the face of the cluster’s NameNode repeatedly going up and down.…

Visit Hortonworks at Strata New York!

We are so excited to attend O’Reilly Strata Conference in New York next week! If you are going to be there,  please come by booth 16 meet the members of the Hortonworks team who will be happy to discuss any questions you have about Hortonworks Data Platform, business benefits, see a nice demo and walk away with cool swags!

Hortonworks will also be participating in an array of sessions and meet-ups at this conference.…

This will be the first and the largest European conference focused exclusively on accelerating the enterprise adoption of Apache Hadoop. The event will be a gathering for the vibrant Apache Hadoop community of developers, data scientists, data professionals and solution providers and will be held at the historic Beurs van Berlage in Amsterdam on March 20-21, 2013.

Call for papers now open!

Apache Hadoop practitioners, enthusiasts and solution providers with an idea for a talk at the event, can submit your ideas now on the call for papers page.…

Introduction

The Apache Hadoop YARN meetup at Hortonworks on October 12, 2012 we previously announced was a resounding success. We had a very good turnout of around seventy people from the community.

Meetup sessions Deployments at Yahoo!

The meetup kicked off with YARN committers from Yahoo presenting on current Hadoop 2.0 deployments at Yahoo. As part of the presentation, the following were covered.

  • described scenarios where YARN positively advanced the state of the art like scalability, its current stability, the power of the YARN web-services, and its superlative performance compared to the previous versions.

Today our partner, Teradata, announced availability of the Teradata Aster Big Analytics Appliance, which packages our Hortonworks Data Platform (HDP) with Teradata Aster on machine that is ready to plug-in and bring big data value in hours.

There is more to this appliance than meets the eye…  it is not just a simple packaging of software on hardware. Teradata and Hortonworks engineers have been working together for months tying our solutions together and optimizing them for an appliance.…

Hortonworks sponsored the O’Reilly Strata conference in earlier this month at the Hilton Metropole in London. It was great meeting big data enthusiasts at the conference. We had fun giving away our little green mascot and came away pleasantly surprised at the state of interest in Big Data in the UK and Europe. There were over 500 attendees, which for a first time conference is a very good result. Conversations ranged from introductory “What is Apache Hadoop?” to deep discussions regarding how Hadoop was being used in production today.…

It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.0.2-alpha.

This is the second (alpha) release of the next generation release of Apache Hadoop 2.x and comes with significant enhancements to both the major components of Hadoop:

  • HDFS HA has undergone significant enhancements since the previous release for NameNode High Availability
  • YARN has undergone significant testing and stabilization and validation as is been heavily battle-tested since the previous release.

We are very excited to announce the Alpha release of the Hortonworks Data Platform 2.0 (HDP 2.0 Alpha).

HDP 2.0 Alpha is built around Apache Hadoop 2.0, which improves availability of HDFS with High Availability for the NameNode along with several performance and reliability enhancements. Apache Hadoop 2.0 also significantly advances data processing in the Hadoop ecosystem with the introduction of YARN, a generic resource-management and application framework to support MapReduce and other paradigms such as real-time processing and graph processing.…

Back in June we joined Teradata Aster in a webcast “Back to the Future – MapReduce, Hadoop and the Data Scientist” to highlight the benefits of Apache Hadoop and the role that data scientists are playing in big data. You can check out the replay here. The discussion focused around how big data architectures could bring more value to businesses using relational DBMS technology and Hadoop, and how the two can coexist.…

Series Introduction

Packetloop CTO Michael Baker (@cloudjunky) made a big splash when he presented ‘Finding Needles in Haystacks (the Size of Countries)‘ at Blackhat Europe earlier this year. The paper outlines a toolkit based on Apache Pig, Packetpig @packetpig (available on github), for doing network security monitoring and intrusion detection analysis on full packet captures using Hadoop.

In this series of posts, we’re going to introduce Big Data Security and explore using Packetpig on real full packet captures to understand and analyze networks.…

We had another amazing turn out on our Ambari webinar with Matt Foley a couple of weeks back. This series was meant to educate Hadoop enthusiasts and help them gain better understanding of the value of Hadoop and I think we’re on the right track. If you missed or would like a refresher from our last two webinars (Pig and Ambari) you can find the recording here: https://hortonworks.com/webinars/

We’re starting the third installment of the “Future of Apache Hadoop” series next Wednesday on “Scaling Apache Zookeeper to the Next Generation Applications” with Mahadev Konar (@mahadevkonar) Hortonworks co-founder and core contributor and PMC member of the Apache Zookeeper.…

I spent some time at the first ever DataWeek in San Francisco last week.  It is a brand new show and it was very well-run, spread across a few cool spaces with an interesting mix of novice to experienced data professionals.  They had a good blend of labs, speakers, panels and great networking opportunities.  In all, it was great and a big thanks and kudos to the organizers.

I took part in a panel and also presented a three-hour overview of Hadoop. …

There will be a Pig meetup at Strata NYC/Hadoop World, at 6:30PM on Wed, Oct 24th in the Bryant Room of the Hilton New York. This will also be the inaugural meeting of the NYC Pig User Group, which Doug Daniels of Pig contributor Mortar Data was good enough to organize. We look forward to future Pig meetups in NYC!

Hortonworks’ own Daniel Dai @daijy, VP of Apache Pig, will present on new features in Pig 0.11.…

Hortonworks is hosting an Apache YARN Meetup on Friday, Oct 12, to solicit feedback on the YARN APIs. We’ve talked about YARN before in a four-part series on YARN, parts one, two, three and four.

YARN, or “Apache Hadoop NextGen MapReduce,” has come a long way this year. It is now a full-fledged sub-project of Apache Hadoop and has already been deployed on a massive 2,000 node cluster at Yahoo.…

In a recent post we used Pig to summarize documents via the Term-Frequency, Inverse Document Frequency (TF-IDF) algorithm.

In this post, we’re going to turn that code into a Pig macro that can be called in one line of code:

1 2 import 'tfidf.macro'; my_tf_idf_scores = tf_idf(id_body, 'message_id', 'body');

Our macro, in filename tfidf.macro looks just like our pig script, with a couple of new lines. Note the use of macro variables for input and output preceded with the ‘$’ character: $in_relation, $out_relation, $id_field and $text_field.…

Go to page:« First...1020...3132333435...40...Last »