How to feed binary input files into a Map/Reduce job?

This topic contains 1 reply, has 2 voices, and was last updated by Seth Lyubich 2 years ago.

  • Creator
    Topic
  • #28372

    Teng Gao
    Member

    Dear expert,

    I want to migrate my legacy C/C++ code to Hadoop on Windows, so I installed HDP 1.0.1 Developer Preview and HDInsight Developer Preview on my Windows 7 machine. I wrote my C++ code as a Map/Reduce job using the streaming feature, so the mapper takes its input records from stdin, one record per line. However, my input files are mostly in binary format. How can I read binary files as input for the mapper?

    I found one possible way: write the mapper’s input file as text, with each line holding the path of one binary file, such as “hdfs:///user/tengao/demojob/Chunk_005.dat”. Then, in the mapper’s C code, when I read a line, I use that path to open the file on HDFS. However, as far as I know, the only way to open a file on HDFS from C code is through “libhdfs”, the C API for HDFS. I found “hdfs.h” under “\hadoop-1.1.0-SNAPSHOT\src\c++\libhdfs”, but I could not compile it successfully in my project. Does the current version of HDP support the “libhdfs” feature?

    Looking forward to your kind help.

Viewing 1 replies (of 1 total)

  • Author
    Replies
  • #28487

    Seth Lyubich
    Keymaster

    Hi Teng,

    Unfortunately, C libhdfs is not supported at this time.

    Thanks,
    Seth
