January 15, 2014

“Help, My Hadoop Doesn’t Work”

One aspect of community development of Apache Hadoop is the way that everyone working on Hadoop, whether full time or part time, vendors, users, and even some researchers, collaborates in the open. This development is based on publicly accessible project tools: Apache Subversion for revision control, Apache Maven for builds, and Jenkins for automating those builds and tests. Central to much of the work is the Apache JIRA server, an instance of Atlassian's issue-management tool.

If you are a Hadoop developer, you spend a lot of time with web browser tabs pointed at JIRA issues. As examples, I'm keeping an eye on YARN-896 and YARN-1489: new features being added to YARN to aid running long-lived applications in a Hadoop 2 cluster.

You also get issues filed by others delivered to your inbox by way of subscriptions to the Hadoop developer mailing lists: anyone has the right to create a JIRA account, file issue reports, and even supply patches to the source code.

Here's a video I've made, and some slides, on how to do that, and in particular, how not to:

A theme I repeat in it is that JIRA is not a place to ask for help. If you have a support subscription with Hortonworks, you should report problems via our support portal, as this lets us track the problem and escalate it as need be; any issue which does need a fix in Hadoop's code will have a public JIRA filed against it and a patch developed in the open. There are also our community forums for discussing HDP-specific issues.

Others will have a similar stance, even more so if their Big Data stacks include closed-source components such as filesystems, job schedulers, or management tools. Issues in closed-source components naturally have to be taken up directly with the vendor.

The underlying Apache projects do welcome public filing of bug reports, provided they are about real bugs in the applications and come with enough information to make it possible to identify root causes. They also welcome people supplying fixes to those bugs: patches containing source code, including tests. That's a topic I plan to cover in another video.



jasbir singh says:

I am using HDP 2.6.3 and 2.6.4 and writing the code below:

1. Create a SparkContext object.
2. Read a text file using: rdd = sc.textFile("hdfs://")
3. println(rdd.count)

After executing the 3rd step, I get the error below:

Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-32082187- file=/abc/test1.txt
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(
at org.apache.hadoop.util.LineReader.readDefaultLine(
at org.apache.hadoop.util.LineReader.readLine(
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:246)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1143)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1143)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.executor.Executor$
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
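
The reproduction steps in the comment above can be sketched as a self-contained Spark driver. This is only an illustrative reconstruction: the object name and the HDFS path are assumptions taken from the file name in the exception, not from the original comment.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the reported steps, assuming a reachable HDFS path.
object CountLines {
  def main(args: Array[String]): Unit = {
    // 1. Create the SparkContext.
    val conf = new SparkConf().setAppName("CountLines")
    val sc = new SparkContext(conf)
    // 2. Read a text file from HDFS (placeholder path from the stack trace).
    val rdd = sc.textFile("hdfs:///abc/test1.txt")
    // 3. Count and print the number of lines. A BlockMissingException here
    // means the client could not fetch the file's blocks from any DataNode,
    // typically because those DataNodes are down or the blocks are corrupt.
    println(rdd.count())
    sc.stop()
  }
}
```

Running `hdfs fsck /abc/test1.txt -files -blocks -locations` against the same path is a common way to check whether the blocks named in the exception are actually available before digging into Spark itself.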
