
HDFS Forum

Small files – Hadoop 2.0 migrate from Hadoop 0.21

  • #46929
    Vitezslav Zak

    Hi there,

    we are facing a big task: migrating from Hadoop 0.21 to Hadoop 2.0. We have 7 servers (1 namenode, 6 datanodes). We don’t dare to simply upgrade Hadoop 0.21 in place to 2.0; instead, we want to migrate data gradually from one Hadoop instance to the other.

    Internally we have some Java applications that connect to Hadoop via the Hadoop 0.21 Java libraries. If we wanted to connect to both Hadoop instances, we would have to use two different library versions in one project, which is not acceptable.

    We considered using WebHDFS for the second Hadoop instance to avoid using two different libraries, but we use Hadoop archives (HAR), which cannot be accessed via WebHDFS.

    1. Is there any way to write a Java application that accesses two different versions of Hadoop?
    1.1. Could we connect to both Hadoop instances with the same Java libraries (0.21)?
    2. Is there any way other than Hadoop archives to avoid the problem of having many small files? (We have roughly 120 million images.)

    Thanks in advance for any replies.

  • #46936
    Robert Molina

    Hi Vitezslav,
    Are you planning on migrating to HDP 2.0?


    Vitezslav Zak

    Yes, we plan to migrate from HDP 0.21 to HDP 2.0.

    Pavel Hladik

    Please, can somebody answer these important questions? And yes, we are going to migrate from 0.21 to 2.0.6, node by node, with a capacity of 300 TB of data.

    Vitezslav Zak

    Yes, we plan to migrate from Hadoop 0.21 to Hadoop 2.0 using your platform, HDP 2.0.6.

    Robert Molina

    Hi Vitezslav,
    1. You could try installing both sets of client files in separate locations. Then, before running your Java application, have a script set HADOOP_HOME and HADOOP_CONF_DIR to point at the specific installation you are running your Java client against.

    1.1 Most likely not; this is not normally tested, since APIs typically change between versions.

    2. Other than HAR, you can also use the open-source File Crusher utility to concatenate small files up to the default block size of 128 MB.
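The script-based switching described in point 1 could be sketched roughly as follows. This is a minimal sketch, not a tested procedure: the install paths `/opt/hadoop-0.21` and `/opt/hadoop-2.0` are hypothetical placeholders for wherever the two client installs actually live, and the launched jar/class names are likewise made up.

```shell
#!/bin/sh
# Hypothetical helper: point the environment at one of two side-by-side
# Hadoop client installs before launching a Java application.
# The paths below are assumptions -- adjust them to your own layout.

select_hadoop() {
  case "$1" in
    old)  # Hadoop 0.21 client install
      HADOOP_HOME=/opt/hadoop-0.21
      HADOOP_CONF_DIR=/opt/hadoop-0.21/conf
      ;;
    new)  # Hadoop 2.0 (HDP 2.0.6) client install
      HADOOP_HOME=/opt/hadoop-2.0
      HADOOP_CONF_DIR=/opt/hadoop-2.0/etc/hadoop
      ;;
    *)
      echo "usage: select_hadoop old|new" >&2
      return 1
      ;;
  esac
  export HADOOP_HOME HADOOP_CONF_DIR
}

# Example: target the new cluster, then launch the client application
# (launch line commented out; jar and class names are placeholders).
select_hadoop new
# exec "$HADOOP_HOME/bin/hadoop" jar my-app.jar com.example.Main
```

Because each invocation of the wrapper script gets its own environment, the same Java application can be pointed at either cluster without mixing the two client library versions in one JVM.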


The forum ‘HDFS’ is closed to new topics and replies.
