The Hortonworks Blog

Posts categorized by: Administrator

Hortonworks will be making a preview of Apache Storm integration available in Q4 of this year and will include Apache Storm in the Hortonworks Data Platform in the first half of 2014.

Any time now, the Apache Hadoop community will declare the General Availability of Hadoop 2.0, which includes the much-anticipated Apache Hadoop YARN. The YARN-based architecture of Hadoop 2 is the most significant change to Hadoop introduced in the past six years and enables Hadoop to expand from a single-purpose, batch-oriented data platform based on MapReduce into a truly multi-purpose platform supporting a wide range of data processing approaches.…

As part of a modern data architecture, Hadoop needs to be a good citizen, trusted at the heart of the business. This means it must provide all the platform services and features expected of an enterprise data platform.

The Hadoop Distributed File System is the rock at the core of HDP and provides reliable, scalable access to data for all analytical processing needs. With HDP 2.0, HDFS now has automated failover with a hot standby built into the platform itself, along with full-stack resiliency.…
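To give a feel for what that looks like under the hood, here is a minimal, illustrative hdfs-site.xml excerpt for NameNode HA backed by the Quorum Journal Manager. The nameservice ID and host names are placeholders, and automatic failover additionally needs a ZooKeeper quorum configured in core-site.xml:

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>              <!-- placeholder nameservice ID -->
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>                <!-- active and hot-standby NameNodes -->
    </property>
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
    </property>
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>                   <!-- fail over without operator intervention -->
    </property>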

Security is one of the biggest topics in Hadoop right now. Historically, Hadoop has been a back-end system accessed only by a few specialists, but the clear trend is for companies to put data from Hadoop clusters in the hands of analysts, marketers, product managers or call center employees, whose numbers could be in the hundreds or thousands. Data security and privacy controls are necessary before this transformation can occur. HDP2, through the next release of Apache Hive, introduces a very important new security feature that allows you to encrypt the traffic that flows between Hadoop and popular analytics tools like Microstrategy, Tableau, Excel and others.…
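As a rough sketch of how wire encryption gets switched on (the exact mechanism and property names depend on your Hive release, and this particular setting assumes a Kerberos-secured HiveServer2), the Thrift transport used by JDBC and ODBC clients can be told to negotiate confidentiality in hive-site.xml:

    <property>
      <name>hive.server2.thrift.sasl.qop</name>
      <value>auth-conf</value>  <!-- auth-conf = authentication, integrity and encryption -->
    </property>

Clients then request the matching quality of protection in their own connection settings, and the encryption is negotiated transparently underneath the drivers the BI tools already use.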

On October 16, we’ve been invited to join our partner SAP to talk Big Data and how the integrated SAP HANA + Hadoop approach can solve your big data challenges. This chat will be a live Google Hangout with:

  • Irfan Khan, SVP & GM SAP Global Big Data at SAP (@i_kHANA)
  • Ari Zilka, CTO at Hortonworks (@ikarzali)
  • Timo Elliott, Innovation Evangelist at SAP (@timoelliott)

When: Wednesday, October 16, 8am PT / 11am ET / 5pm CET…

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week Mahadev Konar discusses Apache ZooKeeper, the open source Apache project that is used to coordinate various processes on a Hadoop cluster (such as electing a leader among a group of processes).

Mahadev was on the team at Yahoo! in 2006 that started developing what became Apache Hadoop. He has been involved with Apache ZooKeeper since 2008, when the project was open sourced.…
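For a flavor of the kind of coordination ZooKeeper provides, here is a minimal, illustrative leader-election sketch built on ephemeral sequential znodes. The connection string and znode paths are made up for the example; production code would also watch its predecessor znode and handle reconnects:

    import org.apache.zookeeper.*;
    import java.util.Collections;
    import java.util.List;

    public class LeaderElectionSketch {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("zk-host:2181", 3000, event -> { });

            // Parent election node; ignore the error if another process created it first.
            try {
                zk.create("/election", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException ignored) { }

            // Each candidate registers an ephemeral, sequential child znode.
            String me = zk.create("/election/n_", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

            // The candidate holding the lowest sequence number is the leader; if it
            // dies, its ephemeral znode disappears and the next candidate takes over.
            List<String> children = zk.getChildren("/election", false);
            Collections.sort(children);
            boolean leader = me.endsWith(children.get(0));
            System.out.println(leader ? "I am the leader" : "Following " + children.get(0));
        }
    }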

A crucial requirement of any enterprise technology is to ensure the simplest possible management and operation. We think that simplicity means two things: 1) integrating with existing infrastructure and tools, and 2) leveraging existing knowledge and skills.

Download the beta release of Ambari SCOM Management Pack here.

Ambari (http://incubator.apache.org/ambari/) was introduced as an Apache incubator project with the aim of developing the best management tool for Hadoop, applying our principles of open source community development: rapid innovation and solving the right problems for enterprises.…

Thanks to all those who joined in person and virtually for the Apache Ambari Meetup at Hortonworks this week. We talked tech, we saw demos, we laughed, we cried, we ate pizza.

The central theme of the night was the newly added support for Hadoop 2. Ambari now has:

  • Hadoop 2 Stack: Ambari adds support for installing, managing and monitoring a Hadoop 2 Stack.
  • NameNode HA: Configure NameNode High Availability based on the QJM support built into HDFS2.
  • YARN: Ambari manages YARN Service lifecycle and automatically deploys the MapReduce2 framework.
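Everything Ambari manages is also exposed over its REST API, which is handy for scripting checks against the new Hadoop 2 services. As an illustrative example (the host, cluster name and credentials below are placeholders):

    # List the services Ambari is managing in a cluster
    curl -u admin:admin http://ambari-host:8080/api/v1/clusters/MyCluster/services

    # Inspect the state of the YARN service specifically
    curl -u admin:admin http://ambari-host:8080/api/v1/clusters/MyCluster/services/YARN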

YARN and the Hortonworks Data Platform 2.0 enable one Hadoop cluster to share data and analytical processing capabilities across the enterprise. Organizations can use the Hortonworks Data Platform 2.0 to:

  • Pool all enterprise data into one scalable and reliable storage platform
  • Enable all analytical processing IN the data platform
  • Provide access to this data and processing across all business units

The Capacity Scheduler (CS) ensures that groups of users and applications will get a guaranteed share of the cluster, while maximizing overall utilization of the cluster.…
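As a small, illustrative example of what those guarantees look like in capacity-scheduler.xml (queue names and percentages are placeholders), two queues could split the cluster 70/30 while still being allowed to borrow idle capacity:

    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>analytics,batch</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.analytics.capacity</name>
      <value>70</value>    <!-- guaranteed share for the analytics queue -->
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.batch.capacity</name>
      <value>30</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.analytics.maximum-capacity</name>
      <value>100</value>   <!-- may borrow idle capacity beyond its guarantee -->
    </property>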

There’s an old proverb you’ve likely heard about blind men trying to identify an elephant. Depending on the version of the proverb you’ve heard, the elephant is misidentified variously as rope, walls, pillars, baskets, brushes and more. Oddly, no one identified it as a next-generation enterprise data platform, but I guess it is an old proverb.

The Hadoop elephant is a platform though, and as such the proverb holds true. Depending on your perspective, it has different capabilities, components and integration points to meet your requirements.…

In this post we’ll cover some new scheduling options available via Apache Oozie in HDP 2. You can try out these capabilities in HDP 2 Beta and HDP 2 Beta Sandbox.

What Is Oozie Again?

Apache Oozie is a workflow engine and scheduler for Hadoop. Oozie allows you to run jobs in Hadoop at pre-defined intervals. The jobs can be simple ones that execute single Hive or Pig commands or can be full directed acyclic graphs representing complex workflows.…
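As a hedged illustration of the "pre-defined intervals" part, a coordinator definition wraps an existing workflow and tells Oozie how often to run it. The application path, dates and schema version below are placeholders that depend on your setup:

    <coordinator-app name="daily-etl" frequency="${coord:days(1)}"
                     start="2013-10-01T08:00Z" end="2014-10-01T08:00Z"
                     timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
      <action>
        <workflow>
          <!-- HDFS directory containing the workflow.xml to run each day -->
          <app-path>hdfs://namenode:8020/apps/etl</app-path>
        </workflow>
      </action>
    </coordinator-app>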

It’s not an easy task to find the right hardware configuration for Hadoop. Thanks to our partner Dell, we’ve detailed a configuration for Hortonworks Data Platform (HDP) on the Dell PowerEdge R720XD. This reference configuration describes a server setup that can run HDP and is intended for organizations looking to configure Apache Hadoop clusters within their information technology environment for big data analytics.

Download the reference here.

How big is big anyway? What sort of size and shape does a Hadoop cluster take?

These are great questions as you begin to plan a Hadoop implementation. Designing and sizing a cluster is complex, and something our technical teams spend a lot of time working through with customers: from storage size to growth rates, from compression rates to cooling, there are many factors to take into account.
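To make that concrete with a purely illustrative back-of-the-envelope calculation: 100 TB of raw data with 3x HDFS replication, an assumed 2.5:1 compression ratio and roughly 25% headroom for intermediate output works out to about 100 × 3 ÷ 2.5 × 1.25 ≈ 150 TB of raw disk, or around 13 nodes with twelve 1 TB drives each, before you factor in growth.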

To make that a little more fun, we’ve built a cluster-size-o-tron which performs a more simplistic calculation based on some assumptions on node sizes and data payloads to give an indication of how big your particular big is.…

With HDP 1.3 and HDP 2.0 Beta, we introduced the ability to create snapshots to protect important enterprise data sets from user or application errors.

HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system and are:

  • Performant and Reliable: Snapshot creation is atomic and instantaneous, no matter the size or depth of the directory subtree.
  • Scalable: Snapshots do not create extra copies of blocks on the file system.
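Taking one is a two-step affair from the command line, as in this illustrative example (the paths and snapshot name are placeholders, and allowing snapshots on a directory requires administrator privileges):

    hdfs dfsadmin -allowSnapshot /data/warehouse
    hdfs dfs -createSnapshot /data/warehouse before-migration
    # The read-only copy is then visible at /data/warehouse/.snapshot/before-migration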

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week Venkat Ranganathan discusses using Apache Sqoop for bulk data movement from enterprise data stores into Hadoop. Sqoop can also move data the other way, from Hadoop into an EDW.

Venkat is a Hortonworks engineer and Apache Sqoop committer who wrote the connector between Sqoop and the Netezza data warehousing platform. He also worked with colleagues at Hortonworks and in the Apache community to improve integration between Sqoop and Apache HCatalog, delivered in Sqoop 1.4.4.…
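For a taste of what that looks like in practice, here are two illustrative Sqoop invocations; the JDBC URL, credentials and table names are placeholders:

    # Bulk-import a table from a relational database into HDFS
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username etl --password-file /user/etl/.dbpass \
      --table orders \
      --target-dir /data/raw/orders

    # Push summarized results back out to the EDW
    sqoop export \
      --connect jdbc:mysql://dbhost/sales \
      --username etl --password-file /user/etl/.dbpass \
      --table order_summaries \
      --export-dir /data/marts/order_summaries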

As part of HDP 2.0 Beta, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines. This also streamlines MapReduce to do what it does best: process data. With YARN, you can now run multiple applications in Hadoop, all sharing common resource management.

In this blog post we’ll walk through how to plan for and configure processing capacity in your enterprise HDP 2.0 cluster deployment.…
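As a preview, the planning boils down to a handful of memory settings. The values below are purely illustrative for a worker node with around 48 GB of RAM, leaving room for the OS and other services:

    <!-- yarn-site.xml -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>40960</value>   <!-- memory YARN may hand out on this node -->
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>2048</value>    <!-- smallest container the scheduler will grant -->
    </property>

    <!-- mapred-site.xml -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>4096</value>    <!-- container size for map tasks -->
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx3072m</value>  <!-- JVM heap kept below the container size -->
    </property>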
