Apache Hadoop has always been very fussy about Java versions. It’s a big application running tens of thousands of processes across thousands of machines in a single datacenter. This makes it almost inevitable that any race conditions and deadlock bugs in the code will eventually surface, be it in the JVM and its libraries, in Hadoop itself, or in one of the libraries on which it depends.
Hence the phrase “there are no corner cases in a datacenter”. It may be amusing, but it makes a point: over time, whatever bugs there are in the software stack of a datacenter will eventually surface.
Hadoop, the applications on top, and their dependency libraries are the core of what we qualify when our QA team does a release of the official Apache Hadoop binaries, as it has done on the core Hadoop projects for every production-quality release of Hadoop. It is also the core of what we test when making an HDP release, qualifying the stack on top of those Apache releases.
Testing the JVM is an implicit part of this, which is why we always state which Java versions we have tested on and support. Usually these supported versions are behind the latest Sun/Oracle releases. For a long time Hadoop was only recommended “in production” on specific versions of Oracle Java 1.6. Indeed, HDP-1 is still only supported on these. Nowadays getting a supported Java 1.6 version is hard, as it’s hidden away in the Java Archive Download pages. When you do download the JDK, the installation process involves click-through licenses, making automated deployment and maintenance that much harder.
Which is why for HDP-2 we are pleased to announce that not only is it tested and supported on the Oracle 1.7.0_21 JDK alongside the 1.6.0_31 version, we’ve also qualified it against OpenJDK 7.
As a result, HDP-2 offers a new way to install a supported JDK:
yum -y install openjdk-7
Now you can install the OpenJDK JDK and have yum keep it up to date. That is not just for developer and proof-of-concept systems; it applies to production clusters of hundreds to thousands of nodes, which is the same scale at which we test HDP releases.
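Because support is tied to specific JVM builds, it can help to check what a node is actually running. Here is a minimal sketch, assuming the two Oracle versions named in this post are the list to compare against. The function names are hypothetical, and no OpenJDK build string is included because this post does not give one.

```shell
#!/bin/sh
# Oracle JDK versions named in this post as tested with HDP-2.
# Assumption: extend this list from the release notes you actually deploy against.
TESTED_VERSIONS="1.7.0_21 1.6.0_31"

# classify_java_version VERSION  (hypothetical helper)
# Prints "tested" and returns 0 if VERSION is in the list above,
# otherwise prints "untested" and returns 1.
classify_java_version() {
    for v in $TESTED_VERSIONS; do
        if [ "$1" = "$v" ]; then
            echo "tested"
            return 0
        fi
    done
    echo "untested"
    return 1
}

# extract_java_version  (hypothetical helper)
# Reads `java -version` output on stdin and prints the first quoted version string.
extract_java_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on a node (java -version writes to stderr, hence 2>&1):
#   classify_java_version "$(java -version 2>&1 | extract_java_version)"
```

An exact string match is deliberate here: the point of qualifying specific builds is that a nearby version is not automatically a tested one.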
Not only does this simplify deployment and other operations tasks, it also starts to pave the way for closer links between the OpenJDK team and the Hadoop developer community. The functionality and performance of the JVM are critical to Hadoop. If we can get better insight into how the open JVMs work, and if we can get the OpenJDK team to have Hadoop on their list of key applications to care about, we can become more confident that future OpenJDK releases will work even better with Hadoop.
Of course, this is all in the future. But maybe we can view that yum -y install openjdk-7 as the beginning.