As enterprises increasingly adopt Apache Hadoop for critical data, the need for high-quality releases becomes even more crucial. Storage systems in particular require robustness and data integrity, since enterprises cannot tolerate data corruption or loss. Further, Apache Hadoop offers an execution engine for customer applications, which brings its own challenges. Apache Hadoop must handle failures of disks, storage nodes, compute nodes, the network, and applications. Its distributed nature, scale, and rich feature set make testing Apache Hadoop non-trivial.
Testing Apache Hadoop does not just involve writing a test plan based on the design spec. It requires an understanding of the numerous use cases of an API, not merely the API's formal specification. The intriguing part has always been analyzing the impact that every feature, improvement, and bug fix has on the various Hadoop subsystems and user applications. Additionally, one has to go beyond unit, functional, scale, and reliability tests and run tests against live data and live user applications to verify Hadoop's integration with the other products in the ecosystem.
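To make the failure-handling point concrete, here is a minimal sketch of a functional test written against Hadoop's MiniDFSCluster test harness. It assumes JUnit 4 and the Hadoop HDFS test artifacts on the classpath; the class name, file path, and payload are illustrative. The test writes a file to an in-process HDFS cluster, stops one DataNode, and verifies the data is still readable from the surviving replicas.

```java
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

public class TestReadAfterDataNodeFailure {

  @Test
  public void testReadSurvivesDataNodeFailure() throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Spin up an in-process HDFS cluster with three DataNodes.
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(3)
        .build();
    try {
      cluster.waitActive();
      FileSystem fs = cluster.getFileSystem();

      // Write a small file; with three DataNodes it is replicated.
      Path file = new Path("/test/data.txt");  // illustrative path
      byte[] payload = "hello hadoop".getBytes("UTF-8");
      try (FSDataOutputStream out = fs.create(file)) {
        out.write(payload);
      }

      // Simulate a storage-node failure by stopping one DataNode.
      cluster.stopDataNode(0);

      // The file should still be readable from the surviving replicas.
      byte[] readBack = new byte[payload.length];
      try (FSDataInputStream in = fs.open(file)) {
        in.readFully(readBack);
      }
      assertEquals(new String(payload, "UTF-8"),
                   new String(readBack, "UTF-8"));
    } finally {
      cluster.shutdown();
    }
  }
}
```

A test like this sits between a pure unit test and a full cluster deployment: it exercises real HDFS code paths, including replication and failover, without requiring dedicated hardware, but it still cannot substitute for scale, reliability, and live-workload testing on real clusters.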