The Hortonworks Blog

Posts categorized by: Apache Hadoop

When I first started to understand what YARN is, I wanted to build an application to understand its core. There was already a great example YARN application called Distributed Shell that I could use as the shell (pun intended) for my experiment. Now I just needed an existing application that could provide massive reuse value for other applications. I looked around and decided on MemcacheD.

This brief guide shows how to get MemcacheD up and running on YARN – MOYA if you will…

Prerequisites

You’re going to need a few things to get the sample application operational.…

We had a lot of fun in NYC and hope you did too. Thanks to the hundreds of you who dropped by the booth, attended dinners, parties, meetups and sessions.

As we have known for some time, Hortonworks customers are already building a modern data architecture with Hadoop as the technology of choice for handling the data they have streaming in from all directions. They care that it matches their needs, integrates with their existing infrastructure and solves real problems with flexibility.…

One of the great things about working in open source development is working with other experts around the world on big projects – and then having the results of that work in the hands of users within a short period of time.

This is why I’m really excited about the Rackspace announcement of their HDP-based Big Data offerings, both “on-prem” and in the cloud. Not just because it’s a partner of ours offering a service based on Hadoop, but because it shows how Hadoop integration with OpenStack has reached a point where it’s ready for production use.…

The Apache Knox community announced the release of the Apache Knox Gateway (Incubator) 0.3.0. We, at Hortonworks, are excited about this announcement.

The Apache Knox Gateway is a REST API Gateway for Hadoop with a focus on enterprise security integration. It provides a simple and extensible model for securing access to Hadoop core and ecosystem REST APIs.

Apache Knox provides pluggable authentication to LDAP and trusted identity providers, as well as service-level authorization and more.…

With the attention of the Hadoop community on Strata/Hadoop World in New York this week, it seems an appropriate time to give everyone an early update on the continued community development of Apache Hive. This progress well and truly cements Hive as the standard open-source SQL solution for the Apache Hadoop ecosystem, not just for extremely large-scale batch queries but also for low-latency, human-interactive queries.

You can catch me at our session ‘Apache Hive & Stinger: Petabyte Scale SQL, IN Hadoop’ along with Owen and Alan where we’ll be happy to dive into more of the details.…

Last week we announced the availability of the Hortonworks Data Platform 2.0. Today, we’re delighted to announce the availability of the Hortonworks Sandbox 2.0.

New Features

  • Based on HDP 2.0
  • Easy enablement of Ambari and HBase
  • Updated tutorial navigation

HDP 2.0

This version of the Sandbox provides you with a complete HDP 2.0 environment: your own personal single-node Hadoop cluster where you can explore the new features and enhancements of HDP 2.0, including YARN, the improvements to Hive that were delivered by the Stinger initiative, and the updates to HBase, Pig, and Ambari. In fact, our Sandbox has all of the most current releases of the various Apache projects, like Hive 0.12, HBase 0.96, and Hadoop 2.2.…

You’re a Java developer, you use Spring and you’re just itching to get your arms around some big data. Well, now you can do that even more easily than before, as we announced this morning that Spring is now certified for Hortonworks Data Platform.

To celebrate this development, we have a community tutorial for Sandbox (1.3 currently) that shows you how to use Spring XD to collect data streamed from Twitter, load it into HDFS and then run simple sentiment analysis with Apache Hive.…

Today our partner Rackspace announced their Big Data solution for dedicated and cloud environments, powered by Hortonworks Data Platform. This collaboration between Hortonworks and Rackspace provides customers a flexible choice of deployment offerings of Apache Hadoop from one of the most trusted vendors in the cloud computing market.

Enterprise adoption of Apache Hadoop

This expanded collaboration is a strong indicator of the ecosystem rallying around Hortonworks Data Platform and our goal at Hortonworks of making Apache Hadoop a core component of the modern data architecture, whether on premise, in a VM, as an appliance, or in the cloud.…

I’d like to take a quick moment to welcome Julian Hyde as the latest addition to the Hortonworks engineering team. Julian has a long history of working on data platforms, including development of SQL engines at Oracle, Broadbase, and SQLstream. He was also the architect and primary developer of the Mondrian OLAP engine, part of the Pentaho BI suite.

Julian’s latest role has been as the author and architect of the Optiq project – an Apache licensed open source framework.…

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week Mahadev Konar discusses Apache Ambari, the open source Apache project to simplify management of a Hadoop cluster.

Mahadev was on the team at Yahoo! in 2006 that started developing what became Apache Hadoop. Since then, he has also held leadership positions in the Apache ZooKeeper and Apache Ambari projects. He is an architect and project management committee member for Apache Ambari, Apache ZooKeeper and Apache Hadoop.…

What a difference a year makes! Last fall, Ambari was a nascent Apache project that had recently shipped an inaugural release in the community. Fast forward a bit: at the beginning of this year, Ambari shipped what has become the foundation for rapid innovation. Now Ambari has become a key member of the Apache Hadoop project ecosystem and a trusted operational platform for many companies.

Let’s take a brief look at the community’s amazing accomplishments over the past year, and then take some time to look forward.…

The Hadoop Distributed File System is the reliable and scalable data core of the Hortonworks Data Platform. In HDP 2.0, YARN + HDFS combine to form the distributed operating system for your Data Platform, providing resource management and scalable data storage to the next generation of analytical applications.

Over the past six months, HDFS has gained a slew of major features covering Enterprise Multi-tenancy, Business Continuity Processing and Enterprise Integration:

  • Enabled automated failover with a hot standby and full stack resiliency for the NameNode master service
  • Added enterprise standard NFS read/write access to HDFS
  • Enabled point in time recovery with Snapshots in HDFS
  • Added wire encryption for the HDFS Data Transfer Protocol

Looking forward, there are evolving patterns in Data Center infrastructure and Analytical applications that are driving the evolution of HDFS.…

This post is the sixth in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

Motivation

Tez follows the traditional Hadoop model of dividing a job into individual tasks, all of which are run as processes via YARN, on the users’ behalf – for isolation, among other reasons.…

Today, with overwhelming partner support, we announced GA of Hortonworks Data Platform 2.0 (HDP 2.0). With 17 certified partners and many more in the works, organizations can confidently get started taking advantage of Hadoop 2.0 and its YARN-based architecture, knowing that the technologies they rely on run on HDP 2.0.

With a YARN-based architecture that serves as the operating system for Hadoop, HDP 2.0 takes Hadoop beyond single-use, batch processing to a fully functional, multi-use platform that enables batch, interactive, online and stream processing.…

Typical delivery of enterprise software involves a very controlled date with a secret roadmap designed to wow prospects, customers, press and analysts…or at least that is the way it usually works. Open source, however, changes this equation.

As described here, the vision for extending Hadoop beyond its batch-only roots in support of interactive and real-time workloads was set by Arun Murthy back in 2008. Work on YARN, the key technology for enabling this vision, started in earnest in 2011. YARN was declared GA by the community in the recent Apache Hadoop 2.2 release, and is now delivered for mainstream enterprises and the broader commercial ecosystem with the release of Hortonworks Data Platform 2.0.…
