HDFS Forum

HDFS File Append

  • #30882


    I am developing a framework around Hadoop that persists RabbitMQ messages in HDFS. The messages stream into the system continuously (stock prices, weather data, and so on). Unfortunately, it looks like I will not be able to append to a file in HDFS 1.x, as per the release notes:

    HADOOP-8230. Major improvement reported by eli2 and fixed by eli
    Enable sync by default and disable append
    Append is not supported in Hadoop 1.x. Please upgrade to 2.x if you need append. If you enabled dfs.support.append for HBase, you’re OK, as durable sync (why HBase required dfs.support.append) is now enabled by default. If you really need the previous functionality, to turn on the append functionality set the flag “dfs.support.broken.append” to true.

    Link: http://hadoop.apache.org/docs/r1.1.1/releasenotes.html
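
As a sketch of what the last sentence of the release note describes, the flag would be set as a property in the HDFS configuration. The fragment below assumes the usual hdfs-site.xml file for dfs.* properties; the property name and value are taken from the release note. Since the release note itself labels the feature "broken", this is illustrative rather than a recommendation.

```xml
<!-- hdfs-site.xml (assumed location): re-enable the old 1.x append behaviour -->
<property>
  <name>dfs.support.broken.append</name>
  <value>true</value>
</property>
```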

    Could anyone please elaborate on this release note? Why is HBase able to append? Is there a way to write a program that appends to files safely and robustly?

    I am running the Hortonworks distribution for Windows on a cluster of 3 machines.

    Many thanks for your help in advance.

    Kind Regards,



  • Author
  • #30948
    Sasha J


    HBase does not use append. It only uses sync functionality.

    Append in the 1.x release stream is not used by any of the Hadoop framework components, and stability issues have been observed with it. Since the 2.x releases have a newer implementation of append that is deemed stable, no further effort is being made to stabilize the append feature in the 1.x stream.

    Thank you!
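
For the original question about writing robustly on 1.x without append, one possible workaround (an assumption here, not something either poster proposed) is to avoid append altogether: buffer incoming messages and write each batch to a new, uniquely named file. A minimal Java sketch of that pattern follows. It uses java.nio on a local directory so that it runs without a cluster; the class name RollingBatchWriter, the batch-NNNNNN.log naming scheme, and the batch size are all hypothetical, and an HDFS version would go through Hadoop's FileSystem API instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: buffers messages and writes each batch to a new file. */
final class RollingBatchWriter {
    private final Path dir;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int sequence = 0;

    RollingBatchWriter(Path dir, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.dir = dir;
        this.batchSize = batchSize;
    }

    /** Buffers one message and rolls to a new file once the batch is full. */
    void write(String message) throws IOException {
        buffer.add(message);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes buffered messages to a new file; returns its path, or null if the buffer was empty. */
    Path flush() throws IOException {
        if (buffer.isEmpty()) {
            return null;
        }
        String name = String.format("batch-%06d.log", sequence);
        Path tmp = dir.resolve(name + ".tmp");
        Path dst = dir.resolve(name);
        // Write to a temporary name first, then rename, so readers never see a partial batch.
        Files.write(tmp, buffer, StandardCharsets.UTF_8);
        Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
        buffer.clear();
        sequence++; // advance only after the batch is safely in place
        return dst;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rolling-demo");
        RollingBatchWriter writer = new RollingBatchWriter(dir, 2);
        writer.write("AAPL 101.2");
        writer.write("MSFT 55.1"); // second message fills the batch and triggers a roll
        writer.write("GOOG 720.0");
        writer.flush();            // write the final partial batch
        System.out.println("Batches written to " + dir);
    }
}
```

The trade-off in this sketch is many small files in exchange for never reopening a file for writing; a larger batch size or a periodic compaction job would reduce the file count.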
