HDFS File Append


This topic contains 1 reply, has 2 voices, and was last updated by Sasha J 1 year, 7 months ago.


    Hi,

    I am developing a framework around Hadoop that persists RabbitMQ messages in HDFS. The messages stream into the system continuously, since they are stock prices, weather data, and so on. Unfortunately, it looks like I will not be able to append to a file in HDFS 1.x, as per the release notes:

    HADOOP-8230. Major improvement reported by eli2 and fixed by eli
    Enable sync by default and disable append
    Append is not supported in Hadoop 1.x. Please upgrade to 2.x if you need append. If you enabled dfs.support.append for HBase, you’re OK, as durable sync (why HBase required dfs.support.append) is now enabled by default. If you really need the previous functionality, to turn on the append functionality set the flag “dfs.support.broken.append” to true.

    Link: http://hadoop.apache.org/docs/r1.1.1/releasenotes.html
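
    For reference, a minimal sketch of how the flag named in the release note would typically be set. The property name comes from the note above; placing it in hdfs-site.xml is an assumption based on where dfs.* properties normally live, and the note itself labels this functionality "broken", so this is an illustration rather than a recommendation.

```xml
<!-- hdfs-site.xml (assumed location for dfs.* settings) -->
<property>
  <!-- Flag name taken from the HADOOP-8230 release note -->
  <name>dfs.support.broken.append</name>
  <value>true</value>
</property>
```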

    Could anyone please elaborate on this release note message? Why is it possible for HBase to append? Can I create a program somehow that is able to safely and robustly append to files?

    I am running the Hortonworks Windows distribution on a cluster of 3 machines.

    Many thanks for your help in advance.

    Kind Regards,

    Thomas


The topic ‘HDFS File Append’ is closed to new replies.


    Sasha J
    Moderator

    Thomas,

    HBase does not use append. It only uses sync functionality.

    No Hadoop framework component uses append in the 1.x release stream, and stability issues have been observed with it. Because the 2.x releases have a newer append implementation that is deemed stable, no further effort is being made to stabilize the append feature in the 1.x stream.
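
    The difference between sync and append can be illustrated with a local-file analogy. The sketch below uses only the Java standard library, not the HDFS API, and the class name, method names, and sample records are hypothetical. Sync means a writer that keeps one file open forces what it has written so far to durable storage; append means reopening an already closed file and adding to its end.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class SyncVsAppend {

    // "Sync" pattern: a single writer keeps the file open and forces
    // each record to the storage device before writing the next one.
    static void writeWithSync(File file, String... records) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            for (String record : records) {
                out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
                out.getFD().sync(); // durability without ever reopening the file
            }
        }
    }

    // "Append" pattern: reopen a file that was already closed and
    // add a record to its end.
    static void appendRecord(File file, String record) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file, true)) {
            out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sync-vs-append", ".log");
        file.deleteOnExit();

        writeWithSync(file, "msg-1", "msg-2"); // file stays open while syncing
        appendRecord(file, "msg-3");           // file is reopened for append

        System.out.print(new String(Files.readAllBytes(file.toPath()),
                StandardCharsets.UTF_8));
    }
}
```

    Only the first pattern corresponds to what HBase relies on; the second is the operation that is disabled by default in 1.x.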

    Thank you!
    Sasha
