December 30, 2014

Top Ten Popular Hadoop Blog Posts of 2014

We take pride in producing valuable technical blog posts and sharing them with a wider audience. Of all the posts we published on our website in 2014, these were the most popular:

  1. Improving Spark for Data Pipelines with Native YARN Integration

    Gopal Vijayaraghavan and Oleg Zhurakousky demonstrate how a pluggable execution context improves Apache Spark for data pipelines running natively on YARN.

  2. HDP 2.2: A Major Step Forward for Enterprise Hadoop

    Tim Hall outlines six months of innovation and new features across Apache Hadoop and its related projects.

  3. Evolving Apache Hadoop YARN to Provide Resource and Workload Management for Services

    Arun Murthy explains YARN’s extended capabilities for resource and workload management for long-running services.

  4. Data Science with Apache Hadoop: Predicting Airline Delays (Parts I and II)

    Ofer Mendelevitch and Beau Plath illustrate how to build predictive models using Apache Hadoop and machine learning algorithms.

  5. Docker & Kubernetes on Apache Hadoop YARN

    Sidharta Seethana explains how YARN’s extensibility and its resource management for multiple workloads can be used to enable PaaS (Platform as a Service).

  6. HBase and Hive—Better Together

    Devaraj Das et al. discuss an integrated architecture for closed-loop operational and analytical processing.

  7. Discardable Memory and Materialized Queries (DIMMQ) in a Hadoop Cluster

    To give memory its proper place in the storage hierarchy and make queries more efficient, Julian Hyde proposes a new kind of data set: the Discardable, In-Memory, Materialized Query (DIMMQ).

  8. Heterogeneous Storages in HDFS

    Arpit Agrawal explores the scenarios that call for heterogeneous storage support in HDFS and how that capability can be achieved.

  9. Benchmarking Apache Hive 13 for Enterprise

    Carter Shanklin shares benchmark results for the initiative that brings batch and interactive SQL query workloads into a single engine.

  10. How to Think about Partnerships in the Enterprise Ecosystem

    John Kreisa explains what it takes to build a thriving enterprise ecosystem with your partners, and why four key initiatives (partner, certify, engineer, and resell) are crucial to the ecosystem’s success.

Happy New Year!



  • Items 1 and 2 are the principal reasons I recommend HDP to my clients. It’s remarkable that you managed to include such a mature stack before the year was out. Excellent work, and a fine end to 2014.

  • I have a file, file.txt, that contains the numbers from 1 to 10000, and I want to add them up. As with the word count program, I have to write three programs:
    driver program
    mapper program
    reducer program
    My first question is: what are the (key, value) input and (key, value) output pairs for the mapper and the reducer? Also, can I explicitly set the number of input splits? Thank you.

  • In “HDP 2.2: A Major Step Forward for Enterprise Hadoop”, the highlights are very informative and useful.

    In the benchmark post, the description of the software and hardware configuration is very easy to understand.

  • This is my first time writing a comment. This blog offers such valuable information. In the future, Hadoop and big data will create more job opportunities in top industries. This blog has given me a strong grasp of Hadoop concepts.
