Developers are responsible for building data-driven apps in, on and around Apache Hadoop, and they expect a vibrant and powerful set of tools, frameworks and interfaces to simplify this task. They are focused on delivering the value of an application and do not want to be mired in the mechanical details of integration with Hadoop.
Recently, purpose-built application development frameworks such as Cascading have been created, and existing frameworks such as Java and .NET have been extended to accommodate the Hadoop community. We work to support all of these frameworks, so that we can empower a world of Hadoop application development.
Cascading Development Framework
Cascading is an application development framework for building data applications. Acting as an abstraction layer, Cascading does the heavy lifting and converts the applications you build with it into MapReduce and Tez jobs that run effectively on top of Hadoop.
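To make the abstraction concrete, the following plain-Python sketch spells out the map, shuffle and reduce steps that a word-count application ultimately becomes on Hadoop. It does not use the Cascading API: the function names are hypothetical and everything runs in memory. It only illustrates the kind of mechanical detail that a framework like Cascading plans on your behalf.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as happens between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Chain the three phases into one in-memory job (hypothetical helper)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["Hadoop runs jobs", "Cascading plans jobs"]))
```

With a framework such as Cascading, the developer describes the pipeline at a higher level and leaves the phase-by-phase plumbing to the planner.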
The Cascading SDK provides a collection of tools, documentation, libraries, tutorials and example projects from the greater Cascading community and enables the rapid development of batch and interactive data-driven applications.
- Lingual. Simplifies systems integration through ANSI SQL compatibility and a JDBC driver
- Pattern. Enables various machine learning scoring algorithms through PMML compatibility
- Scalding. Enables development with Scala, a powerful language for solving functional problems
- Cascalog and PigPen. Enable development with Clojure, a Lisp dialect
Integration with HDP allows Cascading to take advantage of advances in interactive applications provided by YARN. Cascading is certified and supported by Hortonworks and backed by Concurrent.
- Cascading SDK: http://cascading.org
- Cascading ETL Tutorials: http://docs.cascading.org/tutorials/etl-log/
- Tutorial: WordCount with Cascading on Hortonworks Sandbox
- Tutorial: Log Parsing with Cascading on Hortonworks Sandbox
- Tutorial: Cascading Pattern in Hortonworks Sandbox
- Partnership: http://hortonworks.com/partner/concurrent/
Microsoft .NET SDK for Hadoop
The Microsoft .NET SDK for Hadoop provides API access to HDP and Microsoft HDInsight, including HDFS, HCatalog, Oozie and Ambari, along with some PowerShell scripts for cluster management. There are also libraries for MapReduce and LINQ to Hive. The latter is particularly interesting because it builds on LINQ, the established technology that .NET developers use to access most data sources, to deliver the capabilities of Hive, the de facto standard for Hadoop data query.
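The idea behind a LINQ-style layer over Hive can be sketched in a few lines: a query is composed through chained method calls and only rendered to HiveQL text at the end. The sketch below is written in Python for illustration and is not the .NET SDK's API. The `Query` class, its methods and the `weblogs` table are all hypothetical.

```python
class Query:
    """Hypothetical fluent query builder that renders a HiveQL string."""

    def __init__(self, table, columns=("*",), conditions=(), limit=None):
        self.table = table
        self.columns = tuple(columns)
        self.conditions = tuple(conditions)
        self.limit = limit

    def select(self, *columns):
        # Return a new Query so partially built queries can be reused.
        return Query(self.table, columns, self.conditions, self.limit)

    def where(self, condition):
        return Query(self.table, self.columns,
                     self.conditions + (condition,), self.limit)

    def take(self, n):
        return Query(self.table, self.columns, self.conditions, n)

    def to_hiveql(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        if self.limit is not None:
            sql += f" LIMIT {self.limit}"
        return sql


if __name__ == "__main__":
    query = Query("weblogs").select("ip", "status").where("status = 404").take(10)
    print(query.to_hiveql())
```

A real LINQ provider works on typed expressions rather than condition strings, so treat this only as a picture of the compose-then-translate pattern.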
You can access the Microsoft .NET SDK for Hadoop here.
- Training: Hadoop on Windows for Developers
- Partnership Overview: Hortonworks and Microsoft
- HDP for Windows: download HDP for Windows
- Access HDInsight on Azure: sign up for HDInsight
- Learn Hadoop: grab our own Hortonworks Sandbox
Java and the Spring XD Framework
Spring for Apache Hadoop (SHDP) provides a consistent configuration and API across a wide range of Hadoop ecosystem projects such as Pig, Hive, and Cascading, and extends Spring Batch for orchestrating Hadoop-based workflows. It also integrates with other Spring ecosystem projects such as Spring Integration and Spring Batch, enabling you to develop solutions for big data ingest/export and Hadoop workflow orchestration.
SHDP, Spring Integration, Spring Batch and Spring Data are part of the Spring IO Platform as foundational libraries. Building on and extending this foundation, the Spring IO Platform provides a big data runtime named Spring XD (XD = eXtreme Data). Spring XD provides a single platform that addresses common use cases in big data solutions without the need to write code, just by using a domain-specific language (DSL). These use cases include data ingestion from external sources, data transformation and real-time analytics, data import/export to/from HDFS, and workflow orchestration.
These foundational parts of the Spring IO Platform make Hadoop development more accessible to a wider range of Java developers, including the massive Spring developer community, and make the process even faster for Hadoop experts.
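Spring XD's DSL describes a stream as modules joined by pipes, for example `http | hdfs` for a source feeding a sink. As a rough illustration of how such a definition breaks down into a source, optional processors and a sink, here is a simplified Python sketch. It is not Spring XD code: `parse_stream` is a hypothetical helper, and it ignores module options and other syntax that the real DSL supports.

```python
def parse_stream(definition):
    """Split a pipe-delimited stream definition into its parts (simplified)."""
    modules = [part.strip() for part in definition.split("|")]
    if len(modules) < 2 or not all(modules):
        raise ValueError("a stream needs at least a source and a sink")
    return {
        "source": modules[0],           # where the data enters
        "processors": modules[1:-1],    # optional transformation steps
        "sink": modules[-1],            # where the data is written
    }


if __name__ == "__main__":
    print(parse_stream("http | filter | hdfs"))
```

The point of the pipe notation is that a complete ingestion flow is declared in one line, leaving the wiring between modules to the runtime.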
- Using Spring XD with Hadoop and Hortonworks Sandbox: http://hortonworks.com/hadoop-tutorial/using-spring-xd-to-stream-tweets-to-hadoop-for-sentiment-analysis/
- Spring IO Platform: https://spring.io/platform
- Spring XD Project: http://projects.spring.io/spring-xd/
- Spring for Apache Hadoop Project: http://projects.spring.io/spring-hadoop/
- How Spring XD works: https://github.com/spring-projects/spring-xd/wiki/Architecture
- Source code: https://github.com/spring-projects/spring-xd
Hortonworks has been supporting application developers since the beginning of the company. We work with the community to produce a standard set of APIs for developers to use to create their applications. Our most recent release includes both those development APIs and integration into developer tools to speed creation of new applications.