The Hortonworks Blog

Posts categorized by: Hadoop

In this guest blog, Kumar Srivastava, senior director of product management at ClearStory Data, shares his thoughts on ClearStory’s integration with Hortonworks Data Platform (HDP).

We are excited to announce ClearStory Data’s integration with Hortonworks Data Platform (HDP) during Strata + Hadoop World 2015. This partnership with Hortonworks is significant because it brings ClearStory’s business-ready, fast-cycle, scalable analysis to Hadoop Data Lakes, and specifically to the Hortonworks Data Platform (HDP).…

This is a unique moment in time. Fueled by open source, Apache Hadoop has become an essential part of the modern enterprise data architecture and the Hadoop market is accelerating at an amazing rate.

The impressive thing about successful open source projects is the pace of the “release early, release often” development cycle, also known as upstream innovation. The process moves through major and minor releases at a regular clip and the downstream users get to pick the releases and versions they want to consume for their specific needs.…

Today we’re excited to be jointly announcing with EMC that the Isilon OneFS file system has been certified to work with the Hortonworks Data Platform (HDP). Now Isilon customers who are looking for a robust, enterprise-ready, stable Apache Hadoop platform can use HDP on their Isilon implementations.

Joint Engineering Delivering Choice

We’re excited to see the results of months of engineering and testing that now give customers even greater deployment choice for their Hadoop projects as they implement a modern data architecture and move toward a data lake.…

Today Microsoft announced two important new updates to their Azure HDInsight Service with Apache Hadoop 2.6, now available on new clusters.

We are excited to continue working alongside Microsoft to extend the deployment options for managed Hadoop-as-a-Service Azure HDInsight clusters to the Linux operating system. The HDInsight on Linux Preview leverages the completely open Apache Ambari framework to deploy, manage and monitor Hadoop clusters on premises or in the cloud.…

There are lots of ways to interact with Hortonworks at this week’s Strata + Hadoop World event.

Exhibitor Booth 1321

While at our booth you can talk with our experts and get the latest on Hortonworks, get an overview of Apache Hadoop or hear more about how we are helping organizations drive success with Hadoop. You can also get one of the popular Hortonworks elephants!

Passport Program

While at our booth you can pick up a Passport Card that you can enter for a chance to win some great prizes from one of the 24!…

Hortonworks has expanded its certification program into an industry-recognized program in which individuals prove their Hadoop knowledge by performing hands-on tasks on a Hortonworks Data Platform (HDP) cluster, as opposed to answering multiple-choice questions. Hortonworks University will be offering three new certification exams:

  • HDP Certified Developer
  • HDP Certified Java Developer
  • HDP Certified Administrator

The HDP Certified Developer (HDPCD) exam is the first of our new hands-on, performance-based exams designed for Hadoop developers working with frameworks like Pig, Hive, Sqoop and Flume.…

Hortonworks Data Platform’s YARN-based architecture enables multiple applications to share a common cluster and data set while ensuring consistent levels of response made possible by a centralized architecture. Hortonworks led the efforts to on-board open source data processing engines, such as Apache Hive, HBase, Accumulo, Spark, Storm and others, on Apache Hadoop YARN.

In this blog, we will focus on one of those data processing engines—Apache Storm—and its relationship with Apache Kafka.…

By now, we have all heard about Big Data. However, approaches to derive value from the phenomenon vary greatly from one organization to another. While companies like Facebook, Google or Yahoo! were birthplaces of game-changing innovations, most corporations are still trying to figure out how to unlock the power of Big Data.

In this video series created in collaboration between Informatica and Hortonworks, two pioneers and leaders in the data space, you will hear about a wide range of topics addressed in simple business terms.…

Since our founding in 2011, Hortonworks has had a fundamental belief: the only way to deliver infrastructure platform technology is completely in open source. Moreover, we believe that collaborative open source software development under the governance model of an entity like the Apache Software Foundation (ASF) is the best way to accelerate innovation that targets enterprise end users since it brings the largest number of developers together in a way that enables innovation to happen far faster than any single vendor could achieve and in a way that is free of friction for the enterprise.…

As a core component of the Modern Data Architecture (MDA), the Hortonworks Data Platform (HDP) is what organizations rely on for their mission-critical functions, which demand high availability and performance. Key to these organizations is simplified and consistent Hadoop operations.

Join us for this workshop where we’ll cover the operational concerns of System Administrators & DevOps Engineers including installation, configuration, maintenance, security and performance topics.

Key Highlights include:

  • Hardware Recommendation and Sizing
  • OS tuning guide
  • Rapid and consistent deployment of clusters using Apache Ambari Blueprints
  • Cluster setup validation
  • Multi-tenancy with YARN
  • Security
  • HA and Business Continuity

The workshop will be a combination of slides plus demonstrations of the code in action.…

Talend is a Hortonworks Certified Technology Partner, and our guest blogger today is Shawn James, director, big data business development, Talend. Shawn and Jim Walker, director of product marketing at Hortonworks, are our guest speakers in an upcoming webinar on Feb. 12th.

If you are a data scientist, MapReduce or Hadoop developer, you are in demand given the massive increase in data science-based projects. These projects are being driven by the private sector of course, but also by a public sector that is looking to tackle a new range of use cases using big data.…

In August 2009, the Facebook Data Infrastructure Team published a white paper that outlined a warehousing solution over Hadoop. They called it Hive. Since that time, this project has not only emerged as the de facto standard for SQL in Hadoop, but with the help of the Stinger initiative it has progressed from a batch-only framework with a limited SQL interface to a near SQL:2011-compliant, fully interactive SQL query engine.…

Informatica users leveraging HDP are now able to see a complete end-to-end visual data lineage map of everything done through the Informatica platform. In this blog post, Scott Hedrick, director of Big Data Partnerships at Informatica, tells us more about end-to-end visual data lineage.

Hadoop adoption continues to accelerate within mainstream enterprise IT and, as always, organizations need the ability to govern their end-to-end data pipelines for compliance and visibility purposes. Working with Hortonworks, Informatica has extended the metadata management capabilities in Informatica Big Data Governance Edition to include data lineage visibility of data movement, transformation and cleansing beyond traditional systems to cover Apache Hadoop.…

DataTorrent is a Hortonworks Certified Technology Partner and YARN Ready, offering an enterprise-class real-time streaming platform on Hadoop and Hortonworks Data Platform. Thomas Weise, principal architect at DataTorrent, is our guest blogger today.

A while ago, DataTorrent announced a new initiative to integrate Kafka and YARN under the KOYA project. KOYA was proposed as KAFKA-1754 and well received by the community.

Why KOYA?

Kafka is becoming increasingly popular as the data bus to move data in and out of Hadoop clusters.…

Big data and cloud computing are top priorities in enterprise IT today. Organizations are adopting these two disruptive technologies because of the promise of lower cost, flexibility, portability and ease of management.

Today’s blog is another in a series discussing Apache Hadoop in the cloud as a key deployment option. Our guest blogger today is Sean Anderson, Manager of Data Service at Rackspace, the managed cloud company.

In 2012, Rackspace and Hortonworks partnered to expand the capabilities of Enterprise Hadoop™ to both public cloud utility services and private clouds utilizing the popular open-source cloud platform OpenStack.…