We started Hortonworks Community Connection (HCC) at the end of 2015, and it already holds some amazing content that any data developer or data administrator should read and bookmark. I will publish this blog weekly and highlight the top technical articles on HCC, based on community activity and votes.
Top 3 articles on the site:
- Sample HDF/NiFi flow to Push Tweets into Solr/Banana, HDFS/Hive: This article provides an overview of how to create a simple event processing flow. The guide starts with installing the software, then walks through all the necessary setup and the configuration of the event flow. A must-read for anyone interested in data ingestion and streaming.
- Unofficial Storm and Kafka Best Practices Guide: Are you using Storm or Kafka for data processing? Then learn best practices and implementation guidelines from the experts in the trenches. This should be required reading for novices and experts wondering about the best way to tune and monitor Kafka and Storm.
- Ambari Rolling & Express Upgrade: Are you tired of the risk and monotony of doing upgrades? This article covers how to use Ambari and set up the automated steps and procedures needed for express and rolling upgrades.
Top 3 questions last week:
- HDFS replication and impact on concurrency: If I have a 100 GB data set and the same data set is hit concurrently, is one of the options to increase the replication factor to support high-concurrency hits? I have long understood this to be true but can’t clearly articulate why. Any details would be appreciated.
- Hive metastore issue in HDp220.127.116.11: We have configured HDP 18.104.22.168 with Ambari on CentOS 6.4. After installation, the Hive Metastore service stops every time it is started through Ambari. We chose MySQL for the Hive metastore, but the logs show it trying to connect to Derby. Looking for your help.
- Ambari server 2.1.2 setup: Error while creating database accessor com.mysql.jdbc.Communications (followed by a log dump).