HDP on Windows – Installation Forum

How to feed binary input files into a Map/Reduce job?

  • #28372
    Teng Gao

    Dear expert,

    I want to migrate my legacy C/C++ code to Hadoop on Windows, so I installed the HDP 1.0.1 Developer Preview and the HDInsight Developer Preview on my Win7 machine. I wrote my C++ code as a Map/Reduce job using the streaming feature, so the mapper takes its input records from stdin, one record per line. However, my input files are mostly in a binary format. How can I read binary-format files as input for the mapper?
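
    To illustrate, my streaming mapper is basically a program that reads lines from stdin and writes tab-separated key/value pairs to stdout, something like the following skeleton (the emitted key/value pair is only a placeholder):

        #include <stdio.h>
        #include <string.h>

        /* Minimal Hadoop Streaming mapper: one input record per line on
         * stdin, tab-separated key/value pairs on stdout. The emitted
         * pair below is purely illustrative. */
        int main(void) {
            char line[4096];
            while (fgets(line, sizeof(line), stdin) != NULL) {
                line[strcspn(line, "\r\n")] = '\0';   /* strip trailing newline */
                printf("%s\t%d\n", line, 1);
            }
            return 0;
        }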

    I found a possible approach: write the mapper’s input file in text format, with each line holding a binary file’s path, such as “hdfs:///user/tengao/demojob/Chunk_005.dat”. Then in the mapper’s C code, when I read a line, I use that path to open the file on HDFS. However, the only way to open an HDFS file from C code is through “libhdfs”, the C API for HDFS. I found “hdfs.h” under “\hadoop-1.1.0-SNAPSHOT\src\c++\libhdfs”, but I can’t compile it successfully in my project. Does the current version of HDP support the “libhdfs” feature?
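
    To show what I mean, here is roughly what the mapper would do with each path (only a sketch, assuming libhdfs could be built and linked on this platform; the “default”/0 arguments tell hdfsConnect to use the NameNode from the client’s Hadoop configuration, and the 64 KB buffer size is arbitrary):

        #include <fcntl.h>
        #include <stdio.h>
        #include "hdfs.h"

        /* Sketch only: read one binary chunk file from HDFS through the
         * libhdfs C API -- assuming libhdfs were actually buildable and
         * linkable here, which is exactly my open question. */
        static int dump_chunk(const char *path) {
            hdfsFS fs = hdfsConnect("default", 0);    /* NameNode from client config */
            if (fs == NULL) return -1;

            hdfsFile in = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
            if (in == NULL) { hdfsDisconnect(fs); return -1; }

            char buf[65536];                          /* arbitrary 64 KB read buffer */
            tSize n;
            while ((n = hdfsRead(fs, in, buf, sizeof(buf))) > 0) {
                fwrite(buf, 1, (size_t)n, stdout);    /* hand n binary bytes onward */
            }

            hdfsCloseFile(fs, in);
            hdfsDisconnect(fs);
            return 0;
        }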

    Looking forward to your kind help~

  • #28487
    Seth Lyubich

    Hi Teng,

    Unfortunately, the libhdfs C API is not supported at this time.

