From the Dev Team

Follow the latest developments from our technical team

HDFS is core part of any Hadoop deployment and in order to ensure that data is protected in Hadoop platform, security needs to be baked into the HDFS layer. HDFS is protected using Kerberos authentication, and authorization using POSIX style permissions/HDFS ACLs or using Apache Ranger.

Apache Ranger (http://hortonworks.com/hadoop/ranger/) is a centralized security administration solution  for Hadoop that enables administrators to create and enforce security policies for HDFS and other Hadoop platform components.…

Hackathons, Hackfest, and Codefests have an initial air of invincibility. They challenge participants, even veterans—not if the attendees work together or if the community collaborates and innovates together. That air of invincibility quickly dissipates.

Last Saturday, because of such camaraderie and collaboration, a harmony of innovative ideas flourished and came to fruition at an Ambari Hackfest.

Open Data Platform Initiative (ODPi) founding partners Hortonworks and Pivotal co-hosted and co-sponsored an Ambari Hackfest at the Pivotal site near the scenic Foothills in Palo Alto.…

Geospatial data is pervasive—in mobile devices, sensors, logs, and wearables. This data’s spatial context is an important variable in many predictive analytics applications.

To benefit from spatial context in a predictive analytics application, we need to be able to parse geospatial datasets at scale, join them with target datasets that contain point in space information, and answer geometrical queries efficiently.

Unfortunately, if you are working with geospatial data and big data sets that need spatial context, there are limited open source tools that make it easy for you to parse and efficiently query spatial datasets at scale.…

In a world that creates 2.5 quintillion bytes of data every year, it is extremely cheap to collect, store and curate all the data you will ever care about. Data is de facto becoming the largest untapped asset. So how can organizations take advantage of unprecedented amounts of data? The answer is new innovations; and new applications. We are clearly entering a new era of modern data application

I would like to take the opportunity to share my Hadoop journey in the past 10 years, and discuss where I see the Hadoop technology going in the next decade.…

We are excited to announce the general availability of Hortonworks Sandbox with HDP 2.3 on Microsoft Azure Gallery. Hortonworks Sandbox is already a very popular environment for developers, data scientists and administrators to learn and experiment with the latest innovations in Hortonworks Data Platform.

The hundreds of innovations span across Apache Hadoop, Kafka, Storm, Spark, Hive, Pig, YARN, Ambari, Falcon, Ranger and other components that make up HDP platform.…

Everyday more and more new devices—smartphones, sensors, wearables, tablets, home appliances—connect together by joining the “Internet of Things.” Cisco predicts that by 2020, there will be 50 billion devices connected to Internet of Things. Naturally, they all will emit streams of data, in short intervals. Obviously, these data streams will have to be stored, will have to be processed, and will have to be analyzed, in real-time.

Apache Storm is the scalable, fault-tolerant realtime distributed processing engine that allows you to the handle massive streams of data in realtime, in parallel, and at scale.…

Last week, on July 22nd, we announced the general availability of HDP 2.3. Of the three part blog series, the first blog summarized the key innovations in the release—ease of use & enterprise readiness and how those are helping deliver transformational outcomes—while the second blog focused on data access innovation. In this final part, we explain cloud provisioning, proactive support, and other general improvements across the platform.

Automated Provisioning with Cloudbreak

Since Hortonworks’ acquisition of SequenceIQ, the integrated team has been working hard to complete the deployment automation for public clouds including Microsoft Azure, Amazon EC2, and Google Cloud.…

On August 4th at 10:00 am PST, Eric Thorsen, General Manager Retail/CP at Hortonworks and Krishnan Parasuraman, VP Business Development at Splice Machine, will be talking about how Hadoop can be leveraged as a scale-out relational database to be the System of Record and power mission critical applications.

In this blog, they provide answers to some of the most frequently asked questions they have heard on the topic.

Register Now

  • Hadoop is primarily known for running batch based, analytic workloads.
  • On July 22nd, we introduced the general availability of HDP 2.3. In part 2 of this blog series, we explore notable improvements and features related to Data Access.

    We are especially excited about what these data access improvements mean for our Hortonworks subscribers.

    Russell Foltz-Smith, Vice President of Data Platform, at TrueCar summed up the data access impact to his business using earlier versions of HDP, and his enthusiasm for the innovation in this latest release:

    TrueCar is in the business of providing truth and transparency to all the parties in the car-buying process,” said Foltz-Smith.…

    We are very pleased to announce that Hortonworks Data Platform (HDP) Version 2.3 is now generally available for download. HDP 2.3 brings numerous enhancements across all elements of the platform spanning data access to security to governance. This version delivers a compelling new user experience, making it easier than ever before to “do Hadoop” and deliver transformational business outcomes with Open Enterprise Hadoop.

    As we announced at Hadoop Summit in San Jose, there are a number of significant innovations as part of this release including:

    HDP 2.3 represents the very latest innovation from across the Hadoop ecosystem.…

    Drink from Elephant’s Well Of Knowledge

    Developer success starts with open and reusable code, and a community that allows for both consumption of code and contribution of updates to the code base. This success engenders a thriving and evolving community.

    To that end, today we are announcing the Hortonworks Gallery for developers. Located on GitHub, the Gallery brings together the Hortonworks’ Apache Hadoop code, Apache Ambari Views and extensions, as well as related resources into a single view for developers to use within the familiar context of Git and open source software.…

    Early this year, ApacheTM FalconTM became a Top Level Project (TLP) in the Apache Software Foundation.

    The project continues to mature as a framework for simplifying and orchestrating data lifecycle management in Hadoop by offering out-of-the-box data management policies. The Apache Falcon 0.6.1 release builds on this foundation by providing simplified mirroring functionality and a new user interface (UI).

    The community worked very diligently to offer more than 150 product enhancements, and over 30 new features and improvements.…

    Hortonworks is always pleased to see new contributions come into the open-source community. We worked with our customer, Hotels.com, to help them develop libraries and utilities around Apache Hive, the Apache ORC format and Cascading. It’s great to see the results released for the community. In this guest blog, Adrian Woodhead, Big Data Engineering Team Lead at Hotels.com, discusses the CORC project.

    Hotels.com is pleased to announce the open source release of Corc, a library for reading and writing files in the Apache ORC file format using Cascading.…

    As YARN drives Hadoop’s emergence as a business-critical data platform, the enterprise requires more stringent data security capabilities. The Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides a platform for centralized security policy administration across the core enterprise security requirements of authorization, audit and data protection.

    On June 10th, the community announced the release of Apache Ranger 0.5.0. With this release, the community took major steps to extend security coverage for Hadoop platform and deepen its existing security capabilities.…

    In his blog, Tim Hall wrote, “Enterprises are embracing Apache Hadoop to enable their modern data architectures and power new analytic applications. The freedom to choose the on-premises or cloud environments for Hadoop that best meets the business needs is a critical requirement.”

    One of the choices in deploying Hadoop in the cloud environment is with Microsoft Azure using Cloudbreak. Other choices include Google Cloud Platform, Openstack, and AWS.

    But in this blog, I’ll show how you can deploy Hadoop in Azure with few clicks by running HDP multimode cluster in Azure’s Linux VM using Cloudbreak.…