Apache Hadoop has come a long way. From its early days as a platform to index the web, it has evolved to its current interactive, real-time, and batch processing capabilities spanning gigabytes to petabytes of content. A key stepping stone in this evolution has been Apache Hadoop YARN. YARN has enabled enterprises to onboard “fit for purpose” processing engines to their Hadoop Data Lake. This has opened the Data Lake to rapid and unbridled innovation by the ISV community and delivered differentiated insight to the enterprise.…
The Hortonworks Blog
Although the Hadoop Summit San Jose 2014 has come and gone, the invaluable content—keynotes, sessions, and tracks—is available here. We’ve selected a few sessions for Hadoop developers, practitioners, and architects, curating them under Apache Hadoop YARN, the architectural center and the data operating system.
In most of the keynotes and tracks three themes resonated:
Last week, Apache Tez graduated to become a top-level project within the Apache Software Foundation (ASF). This is a major step forward for the project and reflects the momentum built by a broad community of developers, not only from Hortonworks but also from Cloudera, Facebook, LinkedIn, Microsoft, NASA JPL, Twitter, and Yahoo.
What is Apache Tez and why is it useful?
As part of our YARN Ready program, we are hosting a series of technical webinars highlighting the technologies and resources available to developers for creating YARN applications. In our first webinar, “Introduction to YARN Ready,” we presented an overview of the YARN Ready program.
To extend your technical knowledge, please join us for our first in-depth YARN Ready technology webinar, “Integrating Applications Natively to YARN” on Thursday July 24 at 9am Pacific Time.…
Hadoop Summit Content Curation
Although the Hadoop Summit San Jose 2014 has come and gone, the invaluable content—keynotes, sessions, and tracks—is available here. I’ve selected a few sessions below for Hadoop system administrators and dev-ops, curating them under a general Hadoop operations theme.
Dev-ops engineers and system administrators know best that ease of operations and deployments can make or break a large Hadoop production cluster, which is why they care about all of the following:
- how rapidly they can create or replicate a cluster;
- how efficiently they can manage or monitor at scale;
- how easily and programmatically they can extend or customize their operational scripts; and
- how accurately they can foresee, forestall, or forecast resource starvation or capacity stipulation.
Merv Adrian couldn’t have said it better. In his blog post from the weekend, he continued his quest to define Hadoop. And it is no easy quest, as the components of Hadoop are evolving at a pace that is, frankly, astounding.
Today, we announce certification of Apache Spark as YARN Ready. This certification ensures memory and CPU intensive Spark-based applications can co-exist within a single Hadoop cluster with all the other workloads you have deployed. Together, they allow you to use a single cluster with a single set of data for multiple purposes rather than silo your Spark workloads into a separate cluster.
Apache YARN Ready Program
With the release of Apache Hadoop YARN in October of last year, organizations are moving from single-application Hadoop clusters to a versatile, integrated Hadoop 2 data platform hosting multiple applications — eliminating silos, maximizing resources and bringing true multi-workload capabilities to Hadoop.
Customers are telling us loud and clear: they want solutions that run on YARN because it enables them to run multiple workloads on the same common data pool.…
We’re finally catching our breath after a phenomenal Hadoop Summit event last week in San Jose. Thank you to everyone who came to participate in the celebration of Hadoop advances and adoption—from the many organizations that shared how their Hadoop journey fundamentally transformed their businesses, to those just getting started, to the huge ecosystem of vendors. It is amazing to be part of such a broad and deep community that is contributing to making the market for everyone.…
Apache YARN, Apache Slider, and Docker
Join us June 19 at 6 pm at the Hilton Fort Worth, Texas for an educational workshop hosted by Hortonworks and Sendero Business Services. The topic is “The Key To Success is Consistently Making Good Decisions & The Key To Good Decisions is Good Information.” The speaker is Don Hilborn, Solutions Engineer at Hortonworks.
Don will introduce the paradigm of
- Efficiency – double processing in Hadoop on the same hardware while providing predictable performance and quality of service; and
- Resource sharing – providing a stable common set of shared resources across multiple, coordinated workloads in Hadoop.
This is the second in a series of blogs exploring how to write data-driven applications in Java using the Cascading SDK. The posts in the series are:
Historically, programming languages and software frameworks have evolved in a singular direction, with a singular purpose: to achieve simplicity, hide complexity, improve developer productivity, and make coding easier. And in the process, foster innovation to the degree we have seen today—and benefited from.
Is anyone among you “young” enough to admit to writing code in microcode and assembly language?…
With the release of Apache Hadoop YARN in October of last year, organizations are moving from single-application Hadoop clusters to a versatile, integrated Hadoop 2 data platform hosting multiple applications — eliminating silos, maximizing resources and bringing true multi-workload capabilities to Hadoop. Many enterprises have adopted YARN as the architectural center of a set of integrated technologies and capabilities that form the blueprint for enterprise Hadoop.
YARN Enabling the Ecosystem Technologies
Hortonworks is making it easier to develop YARN applications through a number of technologies. …
A significant reason for the increased adoption of the Hortonworks Data Platform by customers and partners has been Apache Hadoop YARN. This major advance, introduced last October in HDP2, allows Hadoop to move from many single-purpose clusters to a versatile, integrated data platform that hosts multiple business applications.
YARN has become the architectural center of Hadoop. We intend to make it easier for applications to work in a YARN environment, and benefit from the integrated capabilities and technologies that form the blueprint for enterprise Hadoop.…
More and more solution providers are integrating with Hortonworks Data Platform to provide their customers with enterprise Hadoop.
As part of our HDP 2.1 certification series, I would like to introduce Greg Benson, Chief Scientist at SnapLogic. In this blog, Greg provides some insights about the value of obtaining HDP 2.1 certification and the benefits of integration platform as a service (iPaaS).
SnapLogic provides a cloud-based service for performing a wide range of data and application integration tasks.…
We recently hosted the fourth of our seven Discover HDP 2.1 webinars, entitled Apache Hadoop 2.4.0, HDFS and YARN. It was well attended and very informative. The speakers outlined the new features in YARN and HDFS in HDP 2.1, including:
- HDFS Extended ACLs
- HTTPS support for WebHDFS and for the Hadoop web UIs
- HDFS Coordinated DataNode Caching
- YARN Resource Manager High Availability
- Application Monitoring through the YARN Timeline Server
- Capacity Scheduler Preemption
Many thanks to our presenters, Rohit Bakhshi (Hortonworks’ senior product manager), Vinod Kumar Vavilapalli (co-author of the YARN Book, PMC, Hadoop YARN Project Lead at Apache and Hortonworks), and Justin Sears (Hortonworks’ Product Marketing Manager).…