The Hortonworks Blog

Posts categorized by : Apache Hadoop

Introduction

The Apache Hadoop YARN meetup at Hortonworks on October 12, 2012 we previously announced was a resounding success. We had a very good turnout of around seventy people from the community.

Meetup sessions
Deployments at Yahoo!

The meetup kicked off with YARN committers from Yahoo presenting on current Hadoop 2.0 deployments at Yahoo. As part of the presentation, the following were covered.

  • described scenarios where YARN positively advanced the state of the art like scalability, its current stability, the power of the YARN web-services, and its superlative performance compared to the previous versions.

Today our partner, Teradata, announced availability of the Teradata Aster Big Analytics Appliance, which packages our Hortonworks Data Platform (HDP) with Teradata Aster on machine that is ready to plug-in and bring big data value in hours.

There is more to this appliance than meets the eye…  it is not just a simple packaging of software on hardware. Teradata and Hortonworks engineers have been working together for months tying our solutions together and optimizing them for an appliance.…

It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.0.2-alpha.

This is the second (alpha) release of the next generation release of Apache Hadoop 2.x and comes with significant enhancements to both the major components of Hadoop:

  • HDFS HA has undergone significant enhancements since the previous release for NameNode High Availability
  • YARN has undergone significant testing and stabilization and validation as is been heavily battle-tested since the previous release.

We are very excited to announce the Alpha release of the Hortonworks Data Platform 2.0 (HDP 2.0 Alpha).

HDP 2.0 Alpha is built around Apache Hadoop 2.0, which improves availability of HDFS with High Availability for the NameNode along with several performance and reliability enhancements. Apache Hadoop 2.0 also significantly advances data processing in the Hadoop ecosystem with the introduction of YARN, a generic resource-management and application framework to support MapReduce and other paradigms such as real-time processing and graph processing.…

Series Introduction

Packetloop CTO Michael Baker (@cloudjunky) made a big splash when he presented ‘Finding Needles in Haystacks (the Size of Countries)‘ at Blackhat Europe earlier this year. The paper outlines a toolkit based on Apache Pig, Packetpig @packetpig (available on github), for doing network security monitoring and intrusion detection analysis on full packet captures using Hadoop.

In this series of posts, we’re going to introduce Big Data Security and explore using Packetpig on real full packet captures to understand and analyze networks.…

We had another amazing turn out on our Ambari webinar with Matt Foley a couple of weeks back. This series was meant to educate Hadoop enthusiasts and help them gain better understanding of the value of Hadoop and I think we’re on the right track. If you missed or would like a refresher from our last two webinars (Pig and Ambari) you can find the recording here: https://hortonworks.com/webinars/

We’re starting the third installment of the “Future of Apache Hadoop” series next Wednesday on “Scaling Apache Zookeeper to the Next Generation Applications” with Mahadev Konar (@mahadevkonar) Hortonworks co-founder and core contributor and PMC member of the Apache Zookeeper.…

I spent some time at the first ever DataWeek in San Francisco last week.  It is a brand new show and it was very well-run, spread across a few cool spaces with an interesting mix of novice to experienced data professionals.  They had a good blend of labs, speakers, panels and great networking opportunities.  In all, it was great and a big thanks and kudos to the organizers.

I took part in a panel and also presented a three-hour overview of Hadoop. …

There will be a Pig meetup at Strata NYC/Hadoop World, at 6:30PM on Wed, Oct 24th in the Bryant Room of the Hilton New York. This will also be the inaugural meeting of the NYC Pig User Group, which Doug Daniels of Pig contributor Mortar Data was good enough to organize. We look forward to future Pig meetups in NYC!

Hortonworks’ own Daniel Dai @daijy, VP of Apache Pig, will present on new features in Pig 0.11.…

Hortonworks is hosting an Apache YARN Meetup on Friday, Oct 12, to solicit feedback on the YARN APIs. We’ve talked about YARN before in a four-part series on YARN, parts one, two, three and four.

YARN, or “Apache Hadoop NextGen MapReduce,” has come a long way this year. It is now a full-fledged sub-project of Apache Hadoop and has already been deployed on a massive 2,000 node cluster at Yahoo.…

In a recent post we used Pig to summarize documents via the Term-Frequency, Inverse Document Frequency (TF-IDF) algorithm.

In this post, we’re going to turn that code into a Pig macro that can be called in one line of code:

1
2
import ‘tfidf.macro’;
my_tf_idf_scores = tf_idf(id_body, ‘message_id’, ‘body’);

Our macro, in filename tfidf.macro looks just like our pig script, with a couple of new lines. Note the use of macro variables for input and output preceded with the ‘$’ character: $in_relation, $out_relation, $id_field and $text_field.…

Alan Gates presented HCatalog to the Chicago Hadoop User Group (CHUG) on 9/17/12. There was a great
turnout, and the strength of CHUG is evidence that Chicago is a Hadoop city. Below are some kind words from the host, Mark Slusar.

On 9/17/12, the Chicago Hadoop User Group (CHUG) was delighted to host Hortonworks Co-Founder Alan Gates to give an overview of HCatalog. In addition to downtown Chicago meetups, Allstate Insurance Company in Northbrook, IL hosts regular Chicago Hadoop User Group Meetups.…

The need for a ToJson EvalFunc

When integrating Pig with different NoSQL ‘databases,’ or when publishing data from Hadoop, it can be convenient to JSONize your data. Although Pig has JsonStorage, there hasn’t been a ToJson EvalFunc. This has been inconvenient, as in our post about Pig and ElasticSearch, such that for creating JSON for ElasticSearch to index, tricks like this were necessary:…

1
2
3
4
5
6
store enron_emails into ‘/tmp/enron_emails_elastic’ using JsonStorage();
json_emails = load ‘/tmp/enron_emails_elastic’ AS (json_record:chararray);
 
/* Now we can store our email json data to elasticsearch for indexing with message_id.

As the Hadoop ecosystem has exploded into many projects, searching for the right answers when questions arise can be a challenge. Thats why I was thrilled to hear about search-hadoop.com, from Sematext. It has a sister site called search-lucene where you can… search lucene!

Search-Hadoop.com searches across projects – JIRAs, source code, mailing lists, wikis, etc. so you can see design and API docs, as well as questions, answers and general documentation.…

Apache ZooKeeper release 3.4.4 is now available. This is a bug fix release including 50 bug fixes. Following is a summary of the critical issues fixed in the release.

ZOOKEEPER-1419 Leader Election never settles for a 5 node cluster

ZOOKEEPER-1489 Data loss after truncate on transaction log

ZOOKEEPER-1412 java client watches inconsistently triggered on reconnect

ZOOKEEPER-1344 ZooKeeper client multi-update command is not considering the
Chroot request

ZOOKEEPER-1496 Ephemeral node not getting cleared even after client has exited

ZOOKEEPER-1437 Client uses session before SASL authentication complete

Stability of 3.4.4

As you might have noticed we have been marking all the previous 3.4.* releases as Alpha and beta.…

I hope you had fun pigging out to Hadoop with Alan Gates. We had interesting questions during the webinar and as always, your participation in these discussions will help us understand different use cases of Apache Pig and the growing community around this project. The recording is now available on our webinar site.

For the next installation of “Future of Apache Hadoop” webinar series, I would like to introduce to you Matt Foley and Ambari. …

Go to page:« First...10...1920212223...Last »