About HDFS-1362


This topic contains 6 replies, has 5 voices, and was last updated by  Steve Loughran 2 years, 2 months ago.


The topic ‘About HDFS-1362’ is closed to new replies.


    Steve Loughran

    I’m just going to comment here as a Hadoop committer. If you look on hdfs-dev, the main work that’s been going on recently is better availability at scale (where a cold failover takes too long) and snapshotting; the latter is viewed as critical to protect data.

    Replacing disks, a need I first filed as HDFS-664, isn’t as critical as protecting data; it just makes disk failures easier to handle by decommissioning one disk and swapping in a new one. It’s more important in smaller clusters, as the very large ones use “skinless” servers that usually lack hotswap, and in a big cluster taking a single server down for a disk swap is less noticeable.

    It does matter on small and mid-size clusters, as the outage of a server is more visible. Today you can just shut down the server, swap the disk and restart. The new disk will be empty, leading to an unbalanced set of disks; there have been some JIRAs on that too.
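    To make the workflow concrete: the kind of change hot-swap support is aimed at is removing one volume from the DataNode’s `dfs.datanode.data.dir` list and having the DataNode pick up the change without a full restart. The property is real; the directory paths below are hypothetical, and the reconfiguration command shown in the comment is only a sketch of how such a feature could surface, not a supported procedure in any shipping HDP at the time of this thread.

    ```xml
    <!-- hdfs-site.xml on the affected DataNode (example paths):
         before the swap the value was /grid/0/hdfs,/grid/1/hdfs,/grid/2/hdfs;
         here the failing /grid/1 disk has been dropped from the list. -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/grid/0/hdfs,/grid/2/hdfs</value>
    </property>

    <!-- With hot-swap support, the DataNode would then be told to reload
         this property in place, along the lines of:
           hdfs dfsadmin -reconfig datanode <dn-host:ipc-port> start
         Without it, the procedure is the one described above: stop the
         DataNode, swap the disk, and restart. -->
    ```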

    For hotswap to reach a stable, production-ready state it’ll need more review to get into trunk, more tests, then testing in full-scale clusters with data that the cluster owner is prepared to lose, before finally going into a beta test phase. It’s that protection of data which makes everyone working on HDFS tread cautiously; nobody wants to do anything that risks losing or corrupting data. It’s also why we like using stable Linux releases (RedHat/CentOS 6.x), stable Java versions, and Linux filesystems such as ext3 and ext4 with the “aggressive” features turned off. In a large cluster all the corner cases in the software stack will surface, which is why so much testing is needed.

    When will a version of HDP ship with it? Not for a while; it’s not even checked in yet. Look at how the Linux kernel source tree is way ahead of what RedHat ships, even though most of the kernel development is done by RedHat. You can be a bit more bleeding-edge using Fedora, but you never do that on systems you care about, as you end up being the person who finds the bugs.



    Hi Xiaobo,

    Yes, we agree that it is a great feature. Looking at the Apache JIRA (HDFS-1362), I see that it is targeted for inclusion in Hadoop version 3.0.0 (which is not even in alpha state yet), so that version is still quite a ways off from inclusion in HDP.



    xiaobo gu

    I’d like to know whether your experts at Hortonworks agree that this should be a top-priority task, and whether you have a plan to make it available in HDP.


    Larry Liu

    Hi, Xiaobo,

    I saw that you posted a question on that JIRA as well:

    Xiaobogu added a comment – 13/Jan/13 02:31
    Hi,I would like to know about when will you merge this feature into the main version,thanks

    I think this is the best place to ask this question.




    xiaobo gu

    I know, but I’d like to know whether you have a plan to make it production-ready; we like it very much.


    Seth Lyubich

    Hi Xiaobo,

    At this point HDFS-1362 is an unresolved JIRA against an unsupported branch of Hadoop (0.23.0).

