The Hortonworks Blog

More from Rohit Bakhshi

On October 15 we announced that we would support Apache Hadoop as an Infrastructure as a Service (IaaS) on Microsoft Azure. This made us the first Hadoop vendor to give customers and prospects access to that flexible and scalable cloud infrastructure for their big data deployments.

This guide walks you through using the Azure Gallery to quickly deploy Hortonworks Data Platform (HDP) clusters on Microsoft Azure IaaS.

What you need is:

  • A Microsoft Azure account
  • That’s it!

Installing the Hortonworks Data Platform 2.0 for Windows is straightforward. Lets take a look at how to install a one node cluster on your Windows Server 2012 R2 machine.

To start, download the HDP 2.0 for Windows package. The package is under 1 GB, and will take a few moments to download depending on your internet speed. Documentation for installing a single node instance is located here. This blog post will guide you through that instruction set to get you going with HDP 2.0 for Windows!…

In just a few years, interest in Hadoop has enjoyed a meteoric rise. It is everywhere… and it should be available everywhere.

Here at Hortonworks we have worked to provide the widest range of deployment options for Hadoop… from on-premises to the cloud, Linux and Windows, and from commodity server clusters to high-end appliances. Deployment options are critical to the adoption of Hadoop and a key factor to adoption.

Today, we add Ubuntu to the list of options we support for HDP 2.0.…

The Hadoop Distributed File System is the reliable and scalable data core of the Hortonworks Data Platform. In HDP 2.0, YARN + HDFS combine to form the distributed operating system for your Data Platform, providing resource management and scalable data storage to the next generation of analytical applications.

Over the past six months, HDFS has introduced a slew of major features to HDFS covering Enterprise Multi-tenancy, Business Continuity Processing and Enterprise Integration:

  • Enabled automated failover with a hot standby and full stack resiliency for the NameNode master service
  • Added enterprise standard NFS read/write access to HDFS
  • Enabled point in time recovery with Snapshots in HDFS
  • Wire Encryption for HDFS Data Transfer Protocol

Looking forward, there are evolving patterns in Data Center infrastructure and Analytical applications that are driving the evolution of HDFS.…

An important tool in the Hadoop developer toolkit is the ability to look at key metrics for a MapReduce job – to understand the performance of each job and to optimize future job runs.

In this blog article, we’ll explore how HDP 2.0 stores and provides insight into the performance of a MapReduce job on YARN.

Change from MapReduce v1 and HDP 1.x

In MapReduce-v2 on YARN in HDP 2.0, the JobTracker no longer exists.…

As part of a modern data architecture, Hadoop needs to be a good citizen and trusted as part of the heart of the business. This means it must provide for all the platform services and features that are expected of an enterprise data platform.

The Hadoop Distributed File System is the rock at the core of HDP and provides reliable, scalable access to data for all analytical processing needs. With HDP 2.0, built into the platform itself, HDFS now has automated failover with a hot standby, with full stack resiliency.…

YARN and the Hortonworks Data Platform 2.0 enables one Hadoop cluster to share data and analytical processing capabilities across the Enterprise organization. Organizations can use the Hortonworks Data Platform 2.0 to:

  • Pool all enterprise data into one scalable and reliable storage platform
  • Enable all analytical processing IN the data platform
  • Provide access to this data and processing across all business units

The Capacity Scheduler (CS) ensures that groups of users and applications will get a guaranteed share of the cluster, while maximizing overall utilization of the cluster.…

With HDP 1.3 and HDP 2.0 Beta, we introduced the ability to create snapshots to protect important enterprise data sets from user or application errors.

HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system and are:

  • Performant and Reliable: Snapshot creation is atomic and instantaneous, no matter the size or depth of the directory subtree
  • Scalable: Snapshots do not create extra copies of blocks on the file system.

As part of HDP 2.0 Beta, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines.  This also streamlines MapReduce to do what it does best, process data.  With YARN, you can now run multiple applications in Hadoop, all sharing a common resource management.

In this blog post we’ll walk through how to plan for and configure processing capacity in your enterprise HDP 2.0 cluster deployment.…

Today we released the Hortonworks Data Platform 1.3 for Windows for Windows Server 2008 R2 and 2012. This is an exciting major update to the only Enterprise Hadoop distribution on Windows. In this blog post, I will discuss what’s new and how to get started.

 Enabling new data applications

This release brings component parity to the HDP Stack across all operating systems by adding the following components:

  • Apache HBase (0.94.6.1) is a non-relational (NoSQL) database that runs on top of the Hadoop® Distributed File System (HDFS).

We are excited to announce today that Hortonworks is bringing Windows-based Hadoop Operational Management functionality via Management Packs for System Center. These management packs will enable users to deploy, manage and monitor Hortonworks Data Platform (HDP) for both Windows and Linux deployments. The new management packs for System Center will provide management and monitoring of Hadoop from a single System Center Operations Manager console, enabling customers to streamline operations and ensure quality of service levels.…

We are excited to release the Hortonworks Data Platform 1.1 for Windows as a Generally Available product. In this blog post, I’m going to outline how to get started with HDP 1.1 for Windows.

With HDP for Windows, you can deploy Apache Hadoop and the HDP stack of components natively on a Windows Server cluster. The HDP for Windows download includes an MSI and remote installation scripts. With these artifacts, you can setup a multi-node Hadoop cluster in either a Workgroup or Active Directory Domain networking configuration.…