HDFS Forum

Small files – migrating from Hadoop 0.21 to Hadoop 2.0

  • #46929
    Vitezslav Zak
    Participant

    Hi there,

    we are facing a big task: migrating from Hadoop 0.21 to Hadoop 2.0. We have 7 servers (1x namenode, 6x datanode). We don't dare to simply upgrade Hadoop 0.21 to 2.0 in place; instead we want to migrate the data gradually from one Hadoop instance to the other.

    Internally we have some Java applications that connect to Hadoop via the Hadoop 0.21 Java libraries. To connect to both Hadoop instances we would have to use two different library versions in one project, which is not acceptable.

    We considered using WebHDFS for the second Hadoop instance to avoid using two different libraries, but we use Hadoop archives, which cannot be accessed via WebHDFS.
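
    For illustration, this is roughly how we would read a single file from the second cluster over WebHDFS with plain Java and no Hadoop client library at all (the namenode host, port and path below are only placeholders; the namenode HTTP port is 50070 by default in Hadoop 2.x, and user.name is just an example):

        import java.io.InputStream;
        import java.net.HttpURLConnection;
        import java.net.URL;
        import java.nio.file.Files;
        import java.nio.file.Paths;
        import java.nio.file.StandardCopyOption;

        public class WebHdfsRead {
            public static void main(String[] args) throws Exception {
                // Placeholder namenode address and HDFS path; adjust for the real cluster.
                String nameNode = "http://namenode.example.com:50070";
                String hdfsPath = "/data/images/sample.jpg";

                // WebHDFS OPEN: the namenode answers with an HTTP redirect to a datanode,
                // which HttpURLConnection follows automatically for plain http.
                URL url = new URL(nameNode + "/webhdfs/v1" + hdfsPath + "?op=OPEN&user.name=hdfs");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setRequestMethod("GET");

                try (InputStream in = conn.getInputStream()) {
                    // Copy the remote file to the local working directory.
                    Files.copy(in, Paths.get("sample.jpg"), StandardCopyOption.REPLACE_EXISTING);
                } finally {
                    conn.disconnect();
                }
            }
        }

    This would avoid the library conflict for ordinary files, but as said above it does not help with the contents of our Hadoop archives.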

    Questions:
    1. Is there any way to write a Java application that accesses two different versions of Hadoop?
    1.1. Could we connect to both Hadoop instances with the same Java libraries (0.21)?
    2. Is there any other way to avoid the problem with a lot of small files than using Hadoop archives? (We have about 120 million images.)

    Thanks for the reply


  • #46936
    Robert Molina
    Moderator

    Hi Vitezslav,
    Are you planning on migrating to HDP 2.0?

    Regards,
    Robert

    #47013
    Vitezslav Zak
    Participant

    Yes, we plan to migrate from Hadoop 0.21 to HDP 2.0.

    #47153
    Pavel Hladik
    Participant

    Please, can somebody answer these important questions? And yes, we are going to migrate from 0.21 to 2.0.6, node by node, with about 300 TB of data.

    #47160
    Vitezslav Zak
    Participant

    Yes, we plan to migrate from Hadoop 0.21 to Hadoop 2.0 using your platform, HDP 2.0.6.

    #47205
    Robert Molina
    Moderator

    Hi Vitezslav,
    1. You could try installing both sets of client files in separate locations; then, before running your Java application, have a script set HADOOP_HOME and HADOOP_CONF_DIR to point at the installation and configuration of the specific cluster you are running your Java client against.

    1.1 Most likely not; this is not normally tested, since APIs typically change between versions.

    2. Other than HAR, you can also use the open source File Crush utility to concatenate small files into files of the default block size of 128 MB.
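
    To illustrate the general idea behind such a tool (this is not File Crush itself, just a minimal sketch with placeholder paths), the snippet below packs many small files into a single SequenceFile keyed by the original file name, so one large HDFS file replaces many tiny ones:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataInputStream;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.BytesWritable;
        import org.apache.hadoop.io.IOUtils;
        import org.apache.hadoop.io.SequenceFile;
        import org.apache.hadoop.io.Text;

        public class PackSmallFiles {
            public static void main(String[] args) throws Exception {
                // Placeholder paths; adjust for the real directory layout.
                Path inputDir = new Path("/data/images/small");
                Path packed = new Path("/data/images/packed.seq");

                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);

                // One SequenceFile holds many images: key = file name, value = raw bytes.
                SequenceFile.Writer writer =
                        SequenceFile.createWriter(fs, conf, packed, Text.class, BytesWritable.class);
                try {
                    for (FileStatus status : fs.listStatus(inputDir)) {
                        if (!status.isFile()) {
                            continue;
                        }
                        // Assumes each small file fits comfortably in memory.
                        byte[] bytes = new byte[(int) status.getLen()];
                        FSDataInputStream in = fs.open(status.getPath());
                        try {
                            IOUtils.readFully(in, bytes, 0, bytes.length);
                        } finally {
                            in.close();
                        }
                        writer.append(new Text(status.getPath().getName()), new BytesWritable(bytes));
                    }
                } finally {
                    writer.close();
                }
            }
        }

    Something along these lines (or File Crush / HAR) keeps the file count down, which matters here because 120 million tiny files would each consume a metadata entry in the namenode's memory.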

    Regards,
    Robert

