Getting Started with Hadoop on Windows with HDP 1.3

Today we released the Hortonworks Data Platform 1.3 for Windows, which supports Windows Server 2008 R2 and Windows Server 2012. This is an exciting major update to the only Enterprise Hadoop distribution on Windows. In this blog post, I will discuss what’s new and how to get started.

Enabling New Data Applications

This release brings component parity to the HDP Stack across all operating systems by adding the following components:

  • Apache HBase (0.94.6.1) is a non-relational (NoSQL) database that runs on top of the Hadoop® Distributed File System (HDFS). The addition of HBase enables online applications to persist and read large-scale data in real time.
  • Apache Mahout (0.7.0) is a scalable machine learning library. Mahout enables data scientists to create and execute machine learning algorithms that run on Hadoop.
  • Apache Flume (1.3.1) is a service for streaming logs into Hadoop. Flume enables collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS).

Also new in HDP 1.3 for Windows is Apache Hive 0.11. Hive 0.11 delivers a 50x improvement in performance for queries and broadens the range of SQL semantics supported in Hadoop as part of the Stinger Initiative. Connect Excel to Hive through our ODBC connector to take advantage of these performance improvements in your Excel queries.

If you want to see these capabilities in action, you can watch one of our five videos exploring the use of Hadoop across types of big data such as sensor data, sentiment data, web clickstream data, geolocation data and server logs.

Getting Started

The best way to get started and evaluate HDP 1.3 for Windows is to set up a single-node cluster. We’ve written a quick start guide that walks you through all the prerequisites and install steps needed to get going. With a single-node cluster, you can experience the full functionality of the product and develop against the new features added in this release.

HDP enables seamless integration with the Microsoft BI tool ecosystem. You can explore data in HDFS through Microsoft Power Query for Excel. You can query and analyze Hive data in Excel by using the ODBC driver add-on to connect to Hive Server 2. You can import/export data from and to SQL Server through Apache Sqoop.
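As a rough illustration of the Sqoop path mentioned above, here is a minimal Python sketch that only assembles a `sqoop import` command line for pulling a SQL Server table into HDFS. It does not run anything. The host, database, table, user and target directory are hypothetical placeholders, the helper name `build_sqoop_import` is ours, and we assume the Microsoft SQL Server JDBC driver is already available to Sqoop and that the launcher is reachable as `sqoop` (on a Windows install it may be a `.cmd` script instead).

```python
def build_sqoop_import(host, database, table, target_dir, username,
                       port=1433, mappers=1, sqoop_bin="sqoop"):
    """Assemble (but do not run) a `sqoop import` argument list.

    All values passed in are illustrative placeholders, not settings
    taken from the HDP documentation.
    """
    # SQL Server JDBC connection string; 1433 is the default SQL Server port.
    jdbc_url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    return [
        sqoop_bin, "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                      # prompt for the password at run time
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


if __name__ == "__main__":
    # Hypothetical values for a single-node evaluation cluster.
    cmd = build_sqoop_import(
        host="sqlhost",
        database="sales",
        table="orders",
        target_dir="/user/demo/orders",
        username="etl_user",
    )
    print(" ".join(cmd))
```

Building the command as a list rather than one string keeps each argument separate, which makes it easier to pass to a process launcher later without quoting problems. An export back to SQL Server would follow the same shape with `sqoop export` and `--export-dir`.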

Depending on your role, you might also find these resources useful:

  • Developers and Data Analysts. You can get hands-on with HDP 1.3 in the Hortonworks Sandbox, which is a single-node cluster residing in a VM. There are a number of tutorials in the sandbox designed to help you get started with Hadoop.
  • System Administrators. You’ll be pleased to know that HDP for Windows will integrate with System Center. Find out more about that here.

Learn More. Please take a look at the Hortonworks Documentation to learn more about installing and using HDP 1.3 for Windows.

Tell Us About It. Please visit the HDP for Windows Forum to ask questions, get help, provide feedback and hear what others are doing with HDP.
