From the Dev Team

Follow the latest developments from our technical team

Enterprise Apache Hadoop provides the fundamental data services required to deploy into existing architectures. These include security, governance and operations services, in addition to Hadoop’s original core capabilities for data management and data access. This post focuses on recent work completed in the open source community to enhance the Hadoop security component, with encryption and SSL certificates.

Last year I wrote a blog summarizing wire encryption options in Hortonworks Data Platform (HDP).…

Introduction

Hortonworks University announces a new operationally focused course for Apache Hadoop administrators. This two-day training course is designed for Hadoop administrators who are familiar with administering other Hadoop distributions and are migrating to the Hortonworks Data Platform (HDP). Through a combination of lecture and hands-on exercises you will learn how to install, configure, maintain and scale an HDP cluster

Target Audience

This course is designed for experienced Hadoop administrators and operators who will be responsible for installing, configuring and supporting the Hortonworks Data Platform.…

Since its first deployment at Yahoo in 2006, HDFS has established itself as the defacto scalable, reliable and robust file system for Big Data. It has addressed several fundamental problems of distributed storage at unparalleled scales and with enterprise grade robustness.

As more and more enterprises adopt Apache Hadoop, it is becoming a unified central storage aka Data Lake for all kinds of enterprise data. Many of these storage use cases are for file storage for classic big data applications, where HDFS is the perfect fit.…

Computers are getting smarter and we are not.

–Tim Berners Lee, Web Developer

Google, Amazon and Netflix have conditioned us. As consumers, we expect intelligent applications that predict, suggest and anticipate our every move. We want them to sift through the millions of possibilities and suggest just a few that suit our needs. We want applications that take us on a personalized journey through a world of endless possibilities.

These personalized journeys require systems to store and make sense of huge data volumes in an acceptable amount of time.…

A panel of reviewers made up of InfoWorld Test Center editors and industry experts selected Apache Storm as a winner for 2014’s InfoWorld Bossie award. The “Bossies” identify the Best of Open Source Software every year. These Bossie awards celebrate game-changing open source software projects in different domains, and the panel selected Apache Storm in the Big Data Tools category.

This is the first year that a streaming computation framework has been selected in the Big Data category, which is a tribute to Apache Storm’s broad industry adoption and versatility. …

Apache Tez has been selected as a winner for 2014’s InfoWorld Bossie award. The “Bossies” identify the Best of Open Source software every year and are awarded by a panel of InfoWorld Test Center editors and industry expert reviewers. The Bossie awards celebrate game-changing open source software projects in different domains, and Apache Tez was selected in the Big Data Tools category.

Last year, Apache Hadoop with YARN as its architectural center was awarded a Bossie.…

Internet of Things (IoT) Potential and Process

It may seem obvious (or inevitable), but many companies are embracing the Internet of Things (IoT)—and for good reasons, notes Forbes’ Mike Kavis. For one, McKinsey Global Institute reports that IoT business will reach $6.2 trillion in revenue by 2025. And second, more and more objects are becoming embedded with sensors that communicate real-time data to data centers’ networks for processing, explain McKinsey’s Chui, Loffler, and Roberts.…

On September 17, the Apache Software Foundation (ASF) voted to graduate Apache Storm to a top-level project (TLP). This represents a major step forward for the project and represents the momentum built by a broad community of developers from not only Hortonworks, but also Yahoo!, Alibaba, Twitter, Microsoft and many other companies.

What is Apache Storm and why is it useful?

Apache Storm is a distributed, fault tolerant, and highly scalable platform for processing streaming data.…

The Apache Tez community is thrilled to announce the release of version 0.5 of the project. We’re referring to this as “the developer release” because it’s all about developers. The community focused on meeting the key needs of developers using Tez to create their applications and engines. Tez 0.5 includes clean and intuitive developer APIs, easy debugging, extensive documentation and deployment with rolling upgrades.

Apache Hadoop YARN paved the way for Apache Tez.…

Summary

This blog covers how recent developments have made it easy to use ORCFile from Cascading or Apache Crunch and that doing so can accelerate data processing more than 5x. Code samples are provided so that you can start integrating ORCFile into your Cascading or Crunch projects today.

What are Cascading and Apache Crunch?

Cascading and Apache Crunch are high-level frameworks that make it easy to process large amounts of data in distributed clusters.…

Hortonworks is committed to collaborate with ISVs and partners to onboard their applications to YARN and Hadoop. As part of the YARN Webinar Series, we have introduced different methods to help you integrate your applications to YARN: Native YARN integration, Slider and Tez. As part of this series, we now offer the opportunity to learn Scalding, with guest speaker from Twitter, who will talk about simplifying application development on Apache Hadoop and YARN.…

StackIQ, a Hortonworks technology partner, offers a comprehensive software suite that automates the deployment, provisioning, and management of Big Infrastructure. In his second guest blog, Anoop Rajendra (@anoop_r), a Senior Software Developer at StackIQ, gives instructions for using StackIQ Comand Line Interface (CLI) to deploy a Hortonworks Data Platform (HDP) cluster.

In a previous blog post, we discussed how StackIQ’s Cluster Manager automates the installation and configuration of an Apache Ambari server.…

Speed, Scale, and SQL Semantics

Since its inception and graduation as a Top Level Project (TPL) from Apache Foundation Project (ASF) in September 2010, Apache Hive has been steadily improving—in speed, scale, and SQL semantics—to meet enterprise requirements for both interactive and batch queries at Hadoop scale.

It has become a defacto standard for SQL queries over petabytes of data stored in Hadoop. It is a compliant SQL engine that offers familiarity to developers over a comprehensive and familiar set of SQL semantics for Apache Hadoop.…

Apache Ambari is an open operational framework to provision, manage and monitor Hadoop clusters. As Hadoop has grown from a single purpose (MapReduce) framework to an extensible multi-purpose compute platform, with Apache Hadoop YARN as its architectural center, Apache Ambari has marched hand-in-hand to meet the evolving operational needs of Enterprise Hadoop.

Enabling ecosystem integration has been a key thrust of recent innovations within the Apache Ambari community. Key developments including Stack Extensibility and Ambari Views allow Ambari to deploy and manage YARN enabled applications.…

In April of this year, Hortonworks, along with the broad Hadoop community delivered the final phase of the Stinger Initiative on schedule, completing the work to bring interactive SQL query to Apache Hive.  The original directive of Stinger was about advancing SQL capabilities at petabyte scale in pure open source. And over 13 months, 145 developers from 44 companies delivered exactly that, contributing over 390,000 lines of code to the Hive project alone.…

Go to page:12345...10...Last »