Posts by Cheryle Custer:


Hortonworks Sandbox: Stinger, Visualizations and Virtualization

A couple of weeks ago, we released several new Hadoop tutorials showcasing real-life use cases, and you can read about them here. Today, we’re delighted to bring you the newest release of the Hortonworks Sandbox, version 1.3. The Hortonworks Sandbox allows you to go from Zero to Big Data in 15 Minutes through step-by-step, hands-on Hadoop tutorials. The Sandbox is a fully functional, single-node personal Hadoop environment where you can add your own data sets, validate your Hadoop use cases and build a small proof-of-concept.

The new release, posted today, contains a number of enhancements:

Hortonworks Data Platform 1.3

A new release of the Hortonworks Sandbox will always follow a new release of the Hortonworks Data Platform (HDP). A few weeks ago, we released the Hortonworks Data Platform 1.3, and you can read about it here. That release reflects our belief in the relentless march of community-driven open source as the fastest path to innovation. It also includes our contributions to speeding up Hadoop queries through improvements to Hive 0.11 made as part of the Stinger Initiative (SQL-in-Hadoop), which offers a 50x improvement in performance for queries.

Visualizations

New Visualization Functionality in Sandbox

We continue to improve the Sandbox user experience.

  1. With this release, we provide some basic visualizations for your Hive queries, built into the Sandbox. You can access this functionality from the Hive interface after you have run your query: you’ll see a new tab called “Visualizations”. This new feature will help ensure that your basic queries are correct before you surface your Sandbox data in other tools.
  2. When importing data into the Sandbox, the column type (string, float, etc.) and delimiter type are auto-detected. You can always override the default values.
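To give a feel for what auto-detection with overrides might look like, here is a minimal Python sketch. It is not the Sandbox’s actual implementation, which isn’t described here; the function names, the candidate delimiters and the type labels are all assumptions made for illustration.

```python
def detect_delimiter(lines, candidates=(",", "\t", "|", ";")):
    """Pick the candidate that appears the same non-zero number of times on every line."""
    best, best_count = None, 0
    for delim in candidates:
        counts = {line.count(delim) for line in lines}
        if len(counts) == 1:  # consistent count across all lines
            count = counts.pop()
            if count > best_count:
                best, best_count = delim, count
    return best


def infer_type(values):
    """Return 'int', 'float' or 'string' depending on what every value parses as."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "string"


def detect_schema(text, overrides=None):
    """Guess the delimiter and column types of delimited text that has a header row.

    `overrides` maps column names to type labels and wins over the guesses,
    mirroring the idea that detected defaults can always be changed.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    delim = detect_delimiter(lines)
    if delim is None:
        raise ValueError("could not detect a consistent delimiter")
    header, *rows = [line.split(delim) for line in lines]
    columns = list(zip(*rows))
    schema = {name: infer_type(col) for name, col in zip(header, columns)}
    schema.update(overrides or {})
    return delim, schema
```

Calling `detect_schema(text)` returns the guessed delimiter and a column-to-type mapping; passing something like `overrides={"user_id": "string"}` replaces a guess you disagree with.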

New Virtualization Platforms

We’ve listened to your survey responses, tweets and emails and we’ve made some changes to make your experience better:

  1. Hyper-V Support. Running Windows 8 or Windows Server 2012 on a system with virtualization support enabled? We have a Sandbox for you!
  2. 32-bit Operating System Support. Have a Windows machine with a 32-bit OS? We’ve enabled the VirtualBox image to run on a 32-bit OS, including Windows 7, Windows 8 and Windows XP. The Sandbox still requires 4 GB of RAM and virtualization enabled in the BIOS, but you should be able to run the Sandbox in these environments.
  3. Improved VirtualBox implementation. The VirtualBox implementation has been modified so that setup and installation are much easier. You no longer have to configure two network adapters. Simply accept the default settings when you import the appliance.

What do you need to do to get these new features? Download the Sandbox!

Looking for interesting datasets to play with? Check out these datasets:

As always, we’re eager to hear your feedback and uses of the Sandbox.

Hadoop Tutorials: Real Life Use Cases in the Sandbox

One of the goals of the Hortonworks Sandbox is to showcase end-to-end use cases for Hadoop. In the most current release of Hadoop tutorials, you’ll find two specific use cases highlighted, both built around clickstream data. There are 6 new tutorials for you to walk through: Tutorials 6 through 11.

(Update: if your version of Sandbox does not have “Enable Ambari” on the introductory page, you will need to download the latest version of the Sandbox in order to have access to these tutorials.)

Clickstream Analysis – Website User Behavior

Hadoop Tutorials in Hortonworks Sandbox

Tutorials 6-10 are extensive, step-by-step lessons that walk you through connecting the Sandbox to Excel 2013 via the Hortonworks ODBC driver to access and analyze semi-structured data (like Omniture logs). Here are some highlights of the new tutorials:

Tutorial 6 – Loading Data into the Hortonworks Sandbox

This covers the basics of bringing data into the Sandbox. In this example, we’ve provided access to anonymized Omniture logs, but you can also bring your own data into the Sandbox: your own log data, Twitter feeds, etc. The Sandbox is a fully functional personal Hadoop environment where you can add your own datasets to validate the Hadoop use cases in your environment.

Tutorials 7 & 11 – Installing the ODBC Driver in the Hortonworks Sandbox (Windows and Mac)

Did you know that you can download the Hortonworks ODBC driver, connect it to the Sandbox and then use that connection with your favorite visualization or business intelligence tool? This tutorial will help you with the setup and connection. Once it’s set up, connect to Excel, Tableau, Alteryx, or any other business intelligence tool that supports ODBC.
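If you’d rather script against that connection than use a BI tool, most ODBC client libraries accept a connection string. The sketch below only assembles such a string from the standard ODBC keywords DSN, UID and PWD. The helper name is an assumption, and the data source name and user you pass in are whatever you configured in your own ODBC setup, not values taken from the tutorials.

```python
def build_odbc_connection_string(dsn, user, password=""):
    """Assemble a DSN-based ODBC connection string from its parts.

    DSN, UID and PWD are standard ODBC connection-string keywords.
    This simple sketch does not escape values, so it rejects any value
    containing ';', which would otherwise break the key=value list.
    """
    parts = {"DSN": dsn, "UID": user}
    if password:
        parts["PWD"] = password
    for key, value in parts.items():
        if ";" in value:
            raise ValueError(f"{key} must not contain ';'")
    return ";".join(f"{key}={value}" for key, value in parts.items())
```

The resulting string can be handed to an ODBC client library of your choice; the DSN itself must already exist in your system’s ODBC configuration and point at the Sandbox.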

Tutorials 8 & 9 – Accessing and Analyzing Data in Excel

Imagine being able to take that semi-structured data from Tutorial 6 and surface it in Excel. You’ll be able to do that on your own laptop when you follow the step-by-step lessons in Tutorials 8 & 9.

Data visualization in Excel

Tutorial 10 – Visualizing Clickstream Data

Combining CRM and weblog data

Here you will see another end-to-end example of visualizing clickstream data – but in this case weblog data is combined with CRM data to visualize actual customer behavior. This tutorial assumes that you’ve got the ODBC driver and Excel 2013 installed. Even if you don’t have Excel 2013, you can use your favorite visualization tool to play with the dataset.
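The core idea of combining weblog data with CRM data is a join on a shared customer key, followed by an aggregation you can chart. The tutorial does this with the Sandbox and Excel; the plain-Python sketch below only illustrates the concept, and the field names (`customer_id`, `url`, `segment`) are hypothetical rather than taken from the tutorial’s dataset.

```python
from collections import defaultdict


def join_clicks_with_crm(clicks, customers):
    """Inner-join click records to CRM records on customer_id.

    Clicks from visitors with no CRM record are dropped.
    """
    by_id = {customer["customer_id"]: customer for customer in customers}
    joined = []
    for click in clicks:
        customer = by_id.get(click["customer_id"])
        if customer is not None:
            joined.append({**click, **customer})
    return joined


def page_views_by_segment(joined):
    """Count page views per (customer segment, url) pair, ready for charting."""
    counts = defaultdict(int)
    for row in joined:
        counts[(row["segment"], row["url"])] += 1
    return dict(counts)
```

Feeding the output of `join_clicks_with_crm` into `page_views_by_segment` gives a small table of counts that any visualization tool can plot.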

Datasets

With these new tutorials, you can easily work with your own data within the Sandbox and start seeing where the Hortonworks Data Platform can help your organization find insights into your own business. If you are looking for publicly available data to use with these Hadoop tutorials in the Sandbox, here are some suggestions:

Ready to work on your own real-life example? Download the Sandbox now.

Hortonworks Sandbox: Dreaming Up New Tutorials For You

We’re cooking up some new tutorials for you to play with in your Hortonworks Sandbox to help you learn more about the Hortonworks Data Platform, Apache Hadoop, Hive, Pig and HCatalog, with maybe a smattering of Mahout in there as well.

While you’re anxiously awaiting them, we thought we’d point you to some resources so that you can experiment and play. After all, that’s what a Sandbox is all about, right?

Language Manuals

First, if you’re looking to expand your skills, take a look at the Hive Language Manual, the Pig Tutorial on the Apache Foundation website, and the Command Line Interface information on the HCatalog project’s incubator site.

Use Hive to SQLize

Feeling a bit more advanced? Take a look at Russell Jurney’s blog posts, HOWTO use Hive to SQLize your own Tweets Part 1, and HOWTO use Hive to SQLize your own Tweets Part 2.

Pull In Your Own Data

You have datasets. We know you want to put them in Hadoop. It’s easy. You can import your own data into the Sandbox the same way you imported data sets in the tutorials.

Looking for other interesting data sets? There are many interesting sets for you:

In the meantime, we’re working hard to bring you new and interesting tutorials. We’d love to see what you’ve done. Show us your demos and tutorials. Who knows, there might be one of the coveted stuffed elephants in your future!

Hadoop Training on Windows, in Europe and Palo Alto

Over the last several weeks, Hortonworks has made a number of announcements regarding the Hortonworks Data Platform (HDP), including the upcoming release of HDP on Windows, the only Apache Hadoop distribution available on Microsoft Windows. We’ve been busy expanding our Hadoop training offerings: we now offer classes for HDP on Windows, you can find training in Europe through our global training partners, and you can join us for Apache Hadoop courses in our new corporate headquarters, where you can have lunch with one of the committers. We continue to believe that community-driven open source is the fastest way to innovation, and the classes we offer on Apache Hadoop provide the best way to build the skills that are in demand: not skills on proprietary implementations, but skills on open source Apache Hadoop.

Hadoop Training for HDP on Windows

Hortonworks is the only Hadoop distribution to run on Windows, and we’ve released the Understanding Apache Hadoop on Microsoft Windows course. This 4-day course provides students with an overview of the Hortonworks Data Platform. It requires existing knowledge of the Visual Studio platform with C# and of SQL Server, and about half of the class time is spent on hands-on labs. The course covers Hadoop installation on Windows, how data is ingested into Hadoop, and how data can be queried and visualized. Training classes start in April.

European Hadoop Training Options

We’ve expanded our global footprint with training partners.

The Elephant House

Finally, we’re very excited to be in our new corporate headquarters, where we have a great training room. If Northern California in the early spring sounds good to you (can you say daytime temperatures in the high 60s?), then be sure to register for an Apache Hadoop training class in our headquarters and enjoy our new home! Not only will you enjoy a stay in the elephant house, but during the training class we’ll also invite one of our committers to have lunch with the group. Nowhere else can you have such a small-group session with one of the Apache community experts.

We’re curious: where would you like to see us deliver public courses on Apache Hadoop?

Sandbox – Your Personal Hadoop Environment Gets Better!

We are excited to tell you about the newest release of the Hortonworks Sandbox.

The Hortonworks Sandbox provides the fastest onramp to Apache Hadoop with an easy-to-use, integrated learning environment and a functional personal Hadoop environment. The Sandbox takes the complexity out of Hadoop installation and setup by providing a fully functional virtual image. If you are evaluating Apache Hadoop or need an easy way to prove out use cases, then the Sandbox is for you. With the Sandbox, you don’t have to go through the work required to set up a Hadoop cluster or to configure Hadoop. Simply download the virtual machine. Zero to Big Data in 15 minutes!

Here are the key enhancements available now:

Apache Hadoop Essentials Classroom Material

Are you new to Hadoop and need the answer to “What is Apache Hadoop?” Then the Hadoop Essentials material is for you. It’s designed to help a non-technical user understand what all the hype is around Hadoop and Big Data. And, we give you some easy hands-on labs to use.

We have taken our one-day Hadoop Essentials course and included the lecture content and hands-on labs in the Sandbox. The newly released lecture material goes into depth on the use cases of Apache Hive, Apache Pig, and Apache HCatalog. When you download the newest version of the Sandbox, the lecture material will be included in the download. You’ll have everything you need to learn Hadoop on a train, plane or automobile, even if you don’t have an Internet connection. If you use the existing Sandbox, simply update the tutorials, and then you will be able to link out to the video content.

Apache Ambari

We have integrated Apache Ambari into this new release of the Sandbox. Apache Ambari provides a 100-percent open source and intuitive set of tools to monitor, manage and efficiently provision your Apache Hadoop cluster, bringing the operation of Hadoop clusters together into a single, cohesive data platform. Running Ambari increases the Sandbox’s memory requirements, so enabling it is optional. Some simple setup is required to enable it; you’ll find those instructions in the Sandbox.

Sandbox Ambari Instructions

Easy navigation into the Hortonworks Sandbox. Start with tutorials or jump straight into your personal Hadoop environment.

Sandbox Users Doing Cool Things

The Sandbox Forums are where you can find assistance with the Sandbox. You will also find users who are doing some interesting things with the Sandbox.

Ready to dive into Ambari and the new videos? Download now! We will continue to deliver interesting tutorials for you and Sandbox enhancements.

Doing More with the Hortonworks Sandbox

The Hortonworks Sandbox was recently introduced, garnering an incredibly positive response and feedback. We are as excited as you are, and gratified that our goal of providing the fastest onramp to Apache Hadoop has come to fruition. By providing a free, integrated learning environment along with a personal Hadoop environment, we are helping you gain those big data skills faster. Because of your feedback and demand for new tutorials, we are accelerating the release schedule for upcoming tutorials. We will continue to announce new tutorials via the Hortonworks blog, opt-in email and Twitter (@hortonworks).

When the new tutorials are ready, the update process is simple and takes one click of a button. Go to the “About Hortonworks Sandbox” icon and press the Update button. Your initial Sandbox virtual machine installation will remain, and only the tutorials will be updated.

Another request you had was for access to more interesting datasets so that you can experiment more with the Sandbox. First, we designed the Sandbox so that you can add your own data to it. Since the Sandbox runs on your own system, you control who has access to that data. Second, if you want to play with external data sets, here are a few resources where you can find publicly available data:

In the meantime, if you haven’t yet downloaded or installed the Sandbox, we encourage you to take part in the excitement. Should you need assistance, please go to the Hortonworks Sandbox Forums. Please join us for the Sandbox Webinar on Tuesday, February 5 at 10 am PST. And finally, check back to learn more about the release of new tutorials.

Hortonworks Sandbox — the Fastest On Ramp to Apache Hadoop

Go from Zero to Big Data in 15 Minutes!

Today Hortonworks announced the availability of the Hortonworks Sandbox, an easy-to-use, flexible and comprehensive learning environment that will provide you with the fastest on-ramp to learning and exploring enterprise Apache Hadoop.

The Hortonworks Sandbox is:

  • A free download
  • A complete, self-contained virtual machine with Apache Hadoop pre-configured
  • A personal, portable and standalone Hadoop environment
  • A set of hands-on, step-by-step tutorials that allow you to learn and explore Hadoop on your own

The Hortonworks Sandbox is designed to help close the gap between people wanting to learn and evaluate Hadoop and the complexities of spinning up an evaluation cluster of Hadoop. The Hortonworks Sandbox provides a powerful combination of hands-on, step-by-step tutorials paired with an easy-to-use web interface designed to lower the learning curve for people who just want to explore and evaluate Hadoop as quickly as possible.

One of our key focus areas is enabling Hadoop as an enterprise-viable platform that is easy to use and consume by our customers and the broader ecosystem. Over the past year or so, we have seen the complex and disjointed experience people face when trying to learn Hadoop, and the Sandbox gives you the fastest onramp to Apache Hadoop. We want the Sandbox to deliver an integrated, easy-to-use, easily updateable learning environment. Ongoing updates to the tutorials are planned, delivering new, interesting hands-on exercises that explore different features and use cases.

These tutorials are built based on the experience gained training thousands of people in our Hortonworks University Training classes. As we continue to build out the Sandbox, we will provide additional levels of sophistication – think of it as the Hadoop 101, 201 and 301 levels of learning. And, the process of updating the tutorials is easy through the click of the “Update” button, initiating a lightweight download of just the tutorial content.

The Sandbox is a single-node implementation of the Hortonworks Data Platform (HDP) 1.2 that behaves just like a normal Hadoop environment, which allows you to add your own datasets in an isolated, protected environment to evaluate the use of Hadoop in your own data architectures.

Use the Sandbox to:

  • Explore Hadoop on your own
  • Plan out the integration points of your proof of concept project
  • Prepare for a more complex pilot deployment

When you are ready, you can download and deploy the Hortonworks Data Platform with the confidence that you have thought through exactly how and where Hadoop can help.

What can you expect from us in the coming months with the Hortonworks Sandbox?

  1. Join us for a special launch webinar on February 5, “Go from Zero to Big Data in 15 Minutes”. I will be hosting this webinar with one of our awesome Solution Engineers, who will give you a sneak peek at some cool use cases for the Sandbox.
  2. New tutorials released on roughly a monthly basis.
  3. Demos and exercises showing integration with tools and applications from our ecosystem partners like Teradata, Alteryx, Datameer, and Microsoft. How cool would it be to run Excel on top of a personal Hadoop environment? Well, that’s coming, so check back often.

I’m excited that you will be able to go from Zero to Big Data in 15 Minutes in a simple, easy-to-use fashion. And, I’m eager to hear your feedback – please let me know what you think of the Sandbox, what kinds of tutorials you would like to see and I would love to hear about your creative uses of the Sandbox. Leave your comments on this blog, Tweet out using #hwsandbox, comment in the Sandbox Forum, or email. The Hortonworks Sandbox is free and available for download here.