The Hortonworks Blog

Posts categorized by: Hadoop
Introduction

Hortonworks University announces a new operationally focused course for Apache Hadoop administrators. This two-day training course is designed for Hadoop administrators who are familiar with administering other Hadoop distributions and are migrating to the Hortonworks Data Platform (HDP). Through a combination of lecture and hands-on exercises you will learn how to install, configure, maintain and scale an HDP cluster.

Target Audience

This course is designed for experienced Hadoop administrators and operators who will be responsible for installing, configuring and supporting the Hortonworks Data Platform.…

Hortonworks Data Platform Version 2.2 represents yet another major step forward for Hadoop as the foundation of a Modern Data Architecture. This release incorporates the last six months of innovation, includes more than a hundred new features, and closes thousands of issues across Apache Hadoop and its related projects.

Our approach at Hortonworks is to enable a Modern Data Architecture with YARN as the architectural center, supported by key capabilities required of an enterprise data platform — spanning Governance, Security and Operations.…

More and more enterprises are looking to the cloud as a place to handle a variety of their data processing and backup needs. Apache Hadoop lends itself to running in cloud environments because both are built around scalable, flexible compute and storage. Today, we are excited to announce that the Hortonworks Data Platform (HDP) is the first platform to be certified to run on Azure Infrastructure as a Service.…

Apache Hadoop has taken a mission-critical role in the Modern Data Architecture (MDA) with the advent of Apache Hadoop YARN. YARN has enabled enterprises to store and process data across many execution engines at a scale that was not previously possible. This in turn has made security a crucial component of enterprise Hadoop. At Hortonworks we have broken the problem of enterprise security into four key areas of focus: authentication, authorization, auditing and data protection.…

Since its first deployment at Yahoo in 2006, HDFS has established itself as the de facto scalable, reliable and robust file system for Big Data. It has addressed several fundamental problems of distributed storage at unparalleled scales and with enterprise-grade robustness.

As more and more enterprises adopt Apache Hadoop, it is becoming a unified central store, or Data Lake, for all kinds of enterprise data. Many of these use cases involve file storage for classic big data applications, where HDFS is the perfect fit.…

Computers are getting smarter and we are not.

–Tim Berners-Lee, Web Developer

Google, Amazon and Netflix have conditioned us. As consumers, we expect intelligent applications that predict, suggest and anticipate our every move. We want them to sift through the millions of possibilities and suggest just a few that suit our needs. We want applications that take us on a personalized journey through a world of endless possibilities.

These personalized journeys require systems to store and make sense of huge data volumes in an acceptable amount of time.…

Hortonworks and VMware have been working jointly for more than two years. We worked with VMware on the initial launch of Serengeti, on Apache Hadoop High Availability and on projects to validate and performance-test the Hortonworks Data Platform (HDP) software on the VMware vSphere platform. One result of this work is that HDP is a fully certified product on VMware vSphere version 5.1 and later.…

SequenceIQ is a new Hortonworks Technology Partner and recently achieved HDP and YARN Ready certification for Cloudbreak, SequenceIQ’s Hadoop as a Service API. In this guest blog, SequenceIQ Co-founder and CTO Janos Matyas (@sequenceiq) describes provisioning and autoscaling an HDP cluster with Cloudbreak.

During our daily work at SequenceIQ, we provision HDP clusters in different environments. Whether on an arbitrary cloud provider or on bare metal, we were looking for a common solution to automate and speed up the process.…

Heading to Strata next week? Interested in learning more about Apache Hadoop and how to integrate your existing infrastructure and applications with your Big Data solution? Join our partner presentations at the Hortonworks booth #117, on Thursday, October 16 and Friday, October 17. Many of the sessions will feature demonstrations.

You will hear directly from partners who are embracing 100% open source Apache Hadoop, including: Actian, CISCO, CSC, HP, Informatica, Microsoft, Red Hat, Revolution Analytics, SAP, SAS, and Teradata.…

Syncsort is a certified Hortonworks Technology and YARN Ready Partner and our guest blogger. Here, Tendu Yogurtcu, vice president of engineering at Syncsort, expands on Syncsort’s recent news about their integration of DMX-h and Ambari.

As Apache Hadoop YARN has transformed Hadoop from being a data processing solution to being a true data processing platform, requirements for provisioning, managing, and securing the platform have changed dramatically.

Stability, security, easy deployment, performance, management and monitoring are among the key attributes that make a data management platform enterprise-grade.…

A panel of reviewers made up of InfoWorld Test Center editors and industry experts selected Apache Storm as a winner for 2014’s InfoWorld Bossie award. The “Bossies” identify the Best of Open Source Software every year. These Bossie awards celebrate game-changing open source software projects in different domains, and the panel selected Apache Storm in the Big Data Tools category.

This is the first year that a streaming computation framework has been selected in the Big Data category, which is a tribute to Apache Storm’s broad industry adoption and versatility.…

Since our founding over three years ago, a core part of our strategy has been enabling the enterprise to use Hadoop in the context of their existing technologies via a Modern Data Architecture. From the earliest days of the company when we hired Mitch Ferguson to head our business development efforts, we’ve been working closely with data center ecosystem leaders, large and small, to integrate Hadoop so that it can take its place in the next generation data architecture.…

Apache Tez has been selected as a winner for 2014’s InfoWorld Bossie award. The “Bossies” identify the Best of Open Source Software every year and are awarded by a panel of InfoWorld Test Center editors and industry expert reviewers. The Bossie awards celebrate game-changing open source software projects in different domains, and Apache Tez was selected in the Big Data Tools category.

Last year, Apache Hadoop with YARN as its architectural center was awarded a Bossie.…

Last week’s Hortonworks webinar “What’s Possible with a Modern Data Architecture?” featured Greg Girard, program director for omni-channel analytics strategies at IDC Retail Insights, and Mark Ledbetter, vice president for industry solutions at Hortonworks. Greg provides targeted, fact-based guidance to retailers for the application of analytics across the enterprise. Mark has more than twenty-five years of experience in the software industry with a focus on retail and supply chains.

Many of Greg and Mark’s thoughts from the webinar echo topics also covered in the recent Hortonworks white paper “The Retail Sector Boosts Sales with Hadoop.”

Greg discussed the most significant drivers of big data initiatives in the retail industry, including customer acquisition, pricing strategies and competitive intelligence.…

At Hortonworks, we are always watching emerging trends in the datacenter to find opportunities for deeper ecosystem integration with Apache Hadoop in simple and intuitive ways. We first partnered with OpenShift by Red Hat earlier this year when we made it possible to call out to Hadoop services from OpenShift via cartridges. You can read more about that solution here. As Enterprise Cloud (e.g. PaaS) offerings have matured to support a broad set of workloads, we’ve had a number of our customers ask about how Hadoop-centered Big Data and PaaS initiatives could work together – particularly in light of Apache Hadoop YARN being the multi-workload resource manager for batch, interactive and real-time workloads on Hadoop.…