Getting Started with Hadoop on Windows with HDP 1.3

Today we released the Hortonworks Data Platform (HDP) 1.3 for Windows, supporting Windows Server 2008 R2 and Windows Server 2012. This is an exciting major update to the only Enterprise Hadoop distribution on Windows. In this blog post, I will discuss what’s new and how to get started.

Enabling new data applications

This release brings component parity to the HDP Stack across all operating systems by adding the following components:

  • Apache HBase (0.94.6.1) is a non-relational (NoSQL) database that runs on top of the Hadoop® Distributed File System (HDFS). The addition of HBase enables online applications to persist and read large-scale data in real time (a short client sketch follows this list).
  • Apache Mahout (0.7.0) is a scalable machine learning library. Mahout enables data scientists to create and execute machine learning algorithms that run on Hadoop.
  • Apache Flume (1.3.1) is a service for streaming logs into Hadoop. Flume enables collecting, aggregating, and moving large amounts of streaming data into HDFS.
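To illustrate the kind of online access HBase adds, here is a minimal Java sketch that writes one row and reads it back using the HBase 0.94 client API. The table name "events" and column family "d" are illustrative and assumed to already exist (for example, created from the HBase shell with create 'events', 'd'); the cluster’s hbase-site.xml should be on the classpath so the client can find ZooKeeper.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseQuickCheck {
        public static void main(String[] args) throws Exception {
            // Reads hbase-site.xml from the classpath to locate the cluster.
            Configuration conf = HBaseConfiguration.create();

            // Assumes the "events" table with column family "d" already exists.
            HTable table = new HTable(conf, "events");

            // Write one row...
            Put put = new Put(Bytes.toBytes("row-1"));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("msg"), Bytes.toBytes("hello hdp"));
            table.put(put);

            // ...and read it back in real time.
            Get get = new Get(Bytes.toBytes("row-1"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("d"), Bytes.toBytes("msg"))));

            table.close();
        }
    }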

Also new in HDP 1.3 for Windows is Apache Hive 0.11. As part of the Stinger Initiative, Hive 0.11 delivers a 50x improvement in query performance and broadens the range of SQL semantics supported in Hadoop. You can connect Excel to Hive through our ODBC connector and see these performance improvements directly in your Excel queries.
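The ODBC connector is the route described above for Excel; if you would rather exercise Hive 0.11 programmatically, a minimal sketch using the HiveServer2 JDBC driver that ships with Hive 0.11 looks like the following. The host, port, credentials, and the web_logs table are illustrative placeholders, not part of the product documentation.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC driver, available starting with Hive 0.11.
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Default HiveServer2 port is 10000; host and credentials are placeholders.
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "hadoop", "");
            Statement stmt = conn.createStatement();

            // A simple aggregate query against a hypothetical "web_logs" table.
            ResultSet rs = stmt.executeQuery(
                    "SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
            conn.close();
        }
    }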

If you want to see these capabilities in action, you can watch one of the five videos we have exploring the use of Hadoop across common types of big data: sensor data, sentiment data, web clickstream data, geolocation data, and server logs.

Getting Started

The best way to get started and evaluate HDP 1.3 for Windows is to set up a single-node cluster. We’ve written a quick start guide that walks you through all the prerequisites and installation steps needed to get going. With a single-node cluster, you can experience the full functionality of the product and develop against the new features added in this release.
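Once the single-node cluster is running, a quick way to confirm HDFS is working is a short write/read round trip with the Hadoop FileSystem API. This is a minimal sketch: the path is arbitrary, and it assumes the cluster’s core-site.xml and hdfs-site.xml are on the classpath.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsSmokeTest {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml/hdfs-site.xml from the classpath of the single-node install.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // The path is illustrative.
            Path path = new Path("/tmp/hdp-smoke-test.txt");

            // Write a small file into HDFS...
            FSDataOutputStream out = fs.create(path, true);
            out.writeBytes("HDP 1.3 for Windows single-node cluster is up\n");
            out.close();

            // ...then read it back to confirm the round trip.
            BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(path)));
            System.out.println(in.readLine());
            in.close();
            fs.close();
        }
    }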

HDP enables seamless integration with the Microsoft BI tool ecosystem. You can explore data in HDFS through Microsoft Power Query for Excel. You can query and analyze Hive data in Excel by using the ODBC driver add-on to connect to HiveServer2. You can import and export data between SQL Server and Hadoop through Apache Sqoop.

Depending on your role, you might also find these resources useful:

  • Developers and Data Analysts. You can get hands-on with HDP 1.3 in the Hortonworks Sandbox, a single-node cluster running in a virtual machine. The Sandbox includes a number of tutorials designed to help you get started with Hadoop.
  • System Administrators. You’ll be pleased to know that HDP for Windows will integrate with System Center. Find out more about that here.

Learn More. Please take a look at the Hortonworks Documentation to learn more about installing and using HDP 1.3 for Windows.

Tell Us About It. Please visit the HDP for Windows Forum to ask questions, get help, provide feedback and hear what others are doing with HDP.
