The Hortonworks Blog

More from Marc Holmes

UPDATE: This cheat sheet was so popular, we’ve created a PDF of the content below so you can print it and use it more easily. Download here.

SQL to Hive Cheat Sheet from Hortonworks

If you’re already familiar with SQL then you may well be thinking about how to add Hadoop skills to your toolbelt as an option for data processing.

From a querying perspective, using Apache Hive provides a familiar interface to data held in a Hadoop cluster and is a great way to get started.…

If you want to understand the thinking in the various projects in the Hadoop ecosystem, then who better to talk to than key members of those projects – the committers.

In this video, we talk with Owen O’Malley, Hortonworks co-founder and Apache Hive committer, about the initiation of Hive, why it matters and future directions for the project.

Learn more about Hive here, or at the Apache Hive project site.…

A busy week at Hortonworks Towers means a quick recap on what’s been happening.

Hadoop on Windows. On Tuesday we announced the GA of HDP 1.3 for Windows. Apart from being the only native Windows distribution for Hadoop, the updates and innovation in this release bring it to parity with our Linux distribution, which means Hadoop Everywhere! Later on, we talked about getting started with HDP 1.3 for Windows, and also pointed at some great resources and tutorials.…

If you’re a Microsoft developer and stepping into Hadoop for the first time with HDP for Windows, then we thought we’d highlight this fantastic resource from Rob Kerr, Chris Campbell and Garrett Edmondson: the MSBIAcademy.

They’ve produced a high-quality, practical series of videos covering everything from essential MapReduce concepts, to using .NET (in this case C#) to submit MapReduce jobs to HDInsight, to using Apache Pig for Web Log Analysis.…

Extracting insight from machine data, customer sentiment data, or any number of other big data scenarios demands integrating Hadoop into your data architecture so that it can efficiently handle those new opportunities alongside existing workloads. Over the next few months, we’re hosting a new webinar series along with partners to get to grips with what it means to integrate Hadoop into your data architecture.

The first three webinars in the series are listed below and ready for registration.…

Thanks to all who joined us for last week’s webinar on Apache Hadoop YARN: Enabling Next Generation Data Applications. You can listen to the full webinar replay here, and the slides are embedded below.

[Embedded SlideShare deck: Developing Applications with YARN]

If you’re already diving into YARN, then we will be hosting the first ‘Office Hours’ sessions at Hortonworks HQ. Join us on August 15th for a Deep Dive on Hoya (HBase on YARN).…

If you’re considering the WHY, the HOW and the WHAT of Hadoop and Big Data in your business, then this collection of papers and ebooks is your friend.

  • WHY does Hadoop matter? Our eBook “Disruptive Possibilities of Big Data” paints a picture of the future of the data-driven business and how it changes everything.
  • HOW does Hadoop work in my data architecture? As part of a modern data architecture, Hadoop sits alongside existing infrastructure and augments its capabilities through Refining and Exploring big datasets and ultimately enriching the application and customer experiences for your business.

BAM! What a week for Hadoop as we all spent time with around 2500 of our closest friends to spin some YARNs (I saw it over here first). Like me, you’re probably still digesting everything you heard but in the meantime here are some highlights from us.

Modern Data Architecture. Integrating Hadoop into existing data center investments is a hot topic for any enterprise thinking about Big Data. In support of that need there were some announcements with key data center partners:

The Hadoop goodness just keeps on flowing as we’ve delivered new releases and new content in the past 10 days. Let’s recap.

HDP 1.3 Release. This milestone release takes advantage of improved performance in Hive 0.11 along with delivery on a series of enterprise requirements including NFS access to HDFS, improved MTTR for HBase, business continuity through HDFS and HBase snapshots, optimized connectors to Oracle and Netezza and the latest release of Ambari for management and operations.…

Or as it’s more commonly being called: Week-ish in Review. Let’s recap on the latest – there’s some juicy technology goodness here.

Delivering on Stinger: Phase 1. Hive 0.11 was released just this week. Owen (@owen_omalley) brought us the news that 55 – yes, fifty-five – developers from across the community have addressed 386 JIRA tickets and delivered significant improvements to Hive, along with an awesome demonstration of the power of community open-source development.…

Microsoft has begun to treat its developer community to a number of Hadoop-y releases related to its HDInsight (Hadoop in the cloud) service, and it’s worth rounding up the material. It’s all Alpha and Preview, so YMMV, but it looks like fun:

  • Microsoft .NET SDK for Hadoop. This kit provides .NET API access to aspects of HDInsight including HDFS, HCatalog, Oozie and Ambari, and also some PowerShell scripts for cluster management. There are also libraries for MapReduce and LINQ to Hive.

UPDATED 6/12: To include the Falcon meetup.

UPDATED: To include the Oozie meetup.

The main Hadoop Summit agenda is looking awesome – go take a look here, and register here – but there’s also a series of meetups planned for the day before the general sessions. If you want to get up close and personal on topics of interest to you with other like-minded folk then take a look at these options.…