HDFS Forum

About HDFS-1362

    Seth Lyubich

    Hi Xiaobo,

    At this point, HDFS-1362 is an unresolved Jira issue filed against an unsupported branch of Hadoop (0.23.0).


    xiaobo gu

    I know, but I’d like to know whether you have a plan to make it production ready. We like it very much.

    Larry Liu

    Hi, Xiaobo,

    I saw you have posted a question in that bug as well:

    Xiaobogu added a comment – 13/Jan/13 02:31
    Hi,I would like to know about when will you merge this feature into the main version,thanks

    I think this is the best place to ask this question.



    xiaobo gu

    I’d like to know whether your experts at Hortonworks agree that this should be a top-priority task, and whether you have a plan to make it available in HDP.


    Hi Xiaobo,

    Yes, we agree that this is a great feature. Looking at the Apache Jira (HDFS-1362), I see that it is targeted for inclusion in Hadoop version 3.0.0, which is not even in alpha state yet, so it is still quite a way off from inclusion in HDP.


    Steve Loughran

    I’m just going to comment here as a Hadoop committer. If you look on hdfs-dev, the main work that’s been going on recently is better availability at scale (where a cold failover takes too long) and snapshotting. The latter is viewed as critical to protect data.

    Replacing disks, a need I first filed in HDFS-664, isn’t as critical as protecting data; it just makes it easier to handle disk failures by decommissioning one disk and swapping in a new one. It’s more important in smaller clusters, as the very large ones use “skinless” servers that usually lack hotswap, and in a big cluster taking a single server down for a disk swap is less noticeable.

    It does matter on small and mid-size clusters, as the outage of a server is more visible. Today you can just shut down the server, swap the disk and restart. The new disk will be empty, leading to an unbalanced set of disks; there have been some JIRAs on that too.
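
    To illustrate why an empty replacement disk stays unbalanced, here is a minimal sketch. It assumes new blocks are spread over the disks in simple rotation; that placement rule, the disk sizes, the block size and the function names are all assumptions made for illustration, not a description of how any particular Hadoop version chooses volumes.

```python
def fill_fractions(used_gb, capacity_gb):
    """Return the used fraction of each disk."""
    return [u / c for u, c in zip(used_gb, capacity_gb)]


def write_round_robin(used_gb, blocks, block_gb=0.125):
    """Place new blocks on the disks in turn (assumed placement rule)."""
    used = list(used_gb)
    for i in range(blocks):
        used[i % len(used)] += block_gb
    return used


# Hypothetical server: four 1000 GB disks, the last one just swapped in empty.
capacity = [1000.0] * 4
used = [700.0, 700.0, 700.0, 0.0]

# Write 800 new blocks of 0.125 GB each (100 GB in total, 25 GB per disk).
after = write_round_robin(used, blocks=800)
fractions = fill_fractions(after, capacity)

# The gap between the fullest and emptiest disk does not shrink,
# because every disk receives the same amount of new data.
spread = max(fractions) - min(fractions)
print(after, round(spread, 3))
```

    Under these assumptions the old disks keep filling at the same rate as the new one, so the gap only closes if existing blocks are moved or if placement favours the emptier disk.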

    For hotswap to get into a stable, production-ready state it’ll need more review to go into trunk, more tests, then testing in full-scale clusters with data that the cluster owner is prepared to lose, before finally going into a beta test phase. It’s that protection of data which makes everyone working on HDFS tread cautiously: nobody wants to do anything that risks losing or corrupting data. That is also why we like using stable Linux releases (RedHat/CentOS 6.x), stable Java versions and Linux filesystems such as ext3 and ext4 with the “aggressive” features turned off. In a large cluster all corner cases in the software stack will surface, which is why so much testing is needed.

    When will a version of HDP ship with it? Not for a while; it’s not even checked in yet. Look at how the Linux kernel source tree is way ahead of what RedHat ships, even though most of the kernel development is done by RedHat. You can be a bit more bleeding edge using Fedora, but you never do that on systems that you care about, as you end up being the person who finds the bugs.

The topic ‘About HDFS-1362’ is closed to new replies.
