Developer Tools

Support & Enable a Vibrant Ecosystem of Hadoop Developers

Developers are responsible for building data-driven apps in, on, and around Apache Hadoop, and they expect a vibrant, powerful set of tools, frameworks, and interfaces to simplify this task. They are focused on delivering the value of an application and do not want to be mired in the mechanical details of integrating with Hadoop.

Recently, purpose-built application development frameworks such as Cascading have been created, and existing platforms such as Java and .NET have been extended to accommodate the Hadoop community. We work to support all of these frameworks so that we can empower a world of Hadoop application development.

Cascading Development Framework

Cascading is an application development framework for building data applications. Acting as an abstraction layer, Cascading does the heavy lifting, converting applications built on it into MapReduce and Tez jobs that run efficiently on top of Hadoop.

The Cascading SDK provides a collection of tools, documentation, libraries, tutorials and example projects from the greater Cascading community and enables the rapid development of batch and interactive data-driven applications.

  • Lingual: simplifies systems integration through ANSI SQL compatibility and a JDBC driver
  • Pattern: enables various machine learning scoring algorithms through PMML compatibility
  • Scalding: enables development with Scala, a powerful language for solving functional problems
  • Cascalog and PigPen: enable development with Clojure, a Lisp dialect
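To illustrate the abstraction Cascading provides, here is a minimal word-count sketch using the classic Cascading 2.x pipe-assembly API; the HDFS paths and class name are illustrative placeholders, not part of any shipped example:

```java
import cascading.flow.FlowDef;
import cascading.flow.hadoop2.Hadoop2MR1FlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class WordCountSketch {
  public static void main(String[] args) {
    // Source and sink taps; these HDFS paths are hypothetical
    Tap docTap = new Hfs(new TextLine(new Fields("line")), "hdfs:///input/docs");
    Tap wcTap  = new Hfs(new TextLine(), "hdfs:///output/wordcount");

    // Pipe assembly: split each line into words, group by word, count
    Pipe wcPipe = new Pipe("wordcount");
    wcPipe = new Each(wcPipe, new Fields("line"),
        new RegexSplitGenerator(new Fields("word"), "\\s+"));
    wcPipe = new GroupBy(wcPipe, new Fields("word"));
    wcPipe = new Every(wcPipe, Fields.ALL, new Count(), Fields.ALL);

    // Cascading plans this assembly into MapReduce (or Tez) jobs at runtime
    FlowDef flowDef = FlowDef.flowDef()
        .addSource(wcPipe, docTap)
        .addTailSink(wcPipe, wcTap);
    new Hadoop2MR1FlowConnector().connect(flowDef).complete();
  }
}
```

The developer describes only the logical data flow; the planner, not the application code, decides how that flow maps onto cluster jobs.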


Integration with HDP allows Cascading to take advantage of advances in interactive applications provided by YARN.  Cascading is certified and supported by Hortonworks and backed by Concurrent.

Additional Resources

Microsoft .NET SDK for Hadoop

The Microsoft .NET SDK for Hadoop provides API access to HDP and Microsoft HDInsight, including HDFS, HCatalog, Oozie, and Ambari, along with PowerShell scripts for cluster management. There are also libraries for MapReduce and LINQ to Hive. The latter is particularly interesting because it extends LINQ, the technology .NET developers already use to access most data sources, to Hive, the de facto standard for querying Hadoop data.

You can access the Microsoft .NET SDK for Hadoop here.


Java and the Spring XD Framework

Spring for Apache Hadoop (SHDP) provides a consistent configuration model and API across a wide range of Hadoop ecosystem projects such as Pig, Hive, and Cascading, in addition to providing extensions to Spring Batch for orchestrating Hadoop-based workflows. It also integrates with other Spring ecosystem projects such as Spring Integration and Spring Batch, enabling you to develop solutions for big data ingest/export and Hadoop workflow orchestration.

SHDP, together with Spring Integration, Spring Batch, and Spring Data, is part of the Spring IO Platform as a foundational library. Building on top of, and extending, this foundation, the Spring IO platform provides a big data runtime named Spring XD (XD = eXtreme Data). Spring XD provides a single platform that addresses common big data use cases without the need to write code, just by using a domain-specific language (DSL). These use cases include data ingestion from external sources, data transformation and real-time analytics, data import/export to/from HDFS, and workflow orchestration.


These foundational parts of Spring IO platform make Hadoop development more accessible to a wider range of Java developers – including the massive Spring developer community – and make the process even faster for Hadoop experts.

Hortonworks has supported application developers since the company's founding. We work with the community to produce a standard set of APIs that developers can use to create their applications. Our most recent release includes both those development APIs and integration with developer tools to speed the creation of new applications.

Hortonworks Data Platform
The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is truly enterprise-grade, having been built, tested, and hardened with enterprise rigor.
Get started with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.
Modern Data Architecture
Tackle the challenges of big data. Hadoop integrates with existing EDW, RDBMS and MPP systems to deliver lower cost, higher capacity infrastructure.