I’m just going to comment here as a Hadoop committer. If you look on hdfs-dev, the main work going on recently is better availability at scale (where a cold failover takes too long) and snapshotting; the latter is viewed as critical for protecting data.
Replacing disks, a need I first filed as HDFS-664, isn’t as critical as protecting data; it just makes disk failures easier to handle, by decommissioning one disk and swapping in a new one. It matters less on very large clusters, which use “skinless” servers that usually lack hotswap anyway, and where taking a single server down for a disk swap is barely noticeable.
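To make that concrete, here’s a rough sketch of the workflow hotswap is meant to enable, assuming a DataNode that can re-read its dfs.datanode.data.dir list at runtime. The hostname is made up and the reconfig command shows the shape of what’s being discussed, not a shipped, stable interface:

    # drop the failing volume (say /data/3) from dfs.datanode.data.dir
    # in hdfs-site.xml, then tell the running DataNode to reload it
    hdfs dfsadmin -reconfig datanode dn1.example.com:50020 start
    hdfs dfsadmin -reconfig datanode dn1.example.com:50020 status

    # physically swap the disk, mount the new one at /data/3,
    # add it back to dfs.datanode.data.dir, and reconfigure again

The point is that the blocks on the other disks stay online the whole time; only the one volume is decommissioned.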
It does matter on small and mid-size clusters, where the outage of a single server is more visible. Today you just shut the server down, swap the disk, and restart. The new disk will be empty, though, leaving the DataNode with an unbalanced set of disks; there have been some JIRAs on that too.
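On that unbalanced-disks point, one of the JIRAs in that area (HDFS-1804) added a volume-choosing policy that steers new block writes toward the disks with the most free space, so an empty replacement disk fills back up faster. Roughly, in hdfs-site.xml (check that your Hadoop version actually ships it):

    <property>
      <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
      <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
    </property>

It doesn’t rebalance existing blocks across disks, but it stops the imbalance from getting worse.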
For hotswap to get into a stable, production-ready state, it’ll need more review to go into trunk, more tests, and then testing in full-scale clusters holding data the cluster owner is prepared to lose, before finally going into a beta test phase. It’s that protection of data which makes everyone working on HDFS tread cautiously; nobody wants to do anything that risks losing or corrupting data. Which is also why we like using stable Linux releases (RedHat/CentOS 6.x), stable Java versions, and stable Linux filesystems such as ext3 and ext4, with the “aggressive” features turned off. In a large cluster every corner case in the software stack will surface, which is why so much testing is needed.
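As an illustration of what “aggressive features turned off” tends to mean in practice, a conservative fstab entry for an HDFS data disk might look like this (device and mount point are made up; the idea is to keep the default ordered journaling and avoid options like data=writeback):

    # /etc/fstab: conservative ext4 mount for an HDFS data disk
    /dev/sdc1  /data/3  ext4  defaults,noatime  0  0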
When will a version of HDP ship with it? Not for a while; it’s not even checked in yet. Look at how far ahead the Linux kernel source tree is of what RedHat ships, even though a lot of the kernel development is done by RedHat. You can be a bit more bleeding-edge by using Fedora, but you never do that on systems you care about, as you end up being the person who finds the bugs.