Last year on December 11th, Hortonworks presented the sixth of eight Discover HDP 2.2 webinars: Apache HBase with YARN & Slider for Fast NoSQL Access, hosted by Justin Sears, Carter Shanklin and Enis Soztutar.
After Justin Sears set the stage by explaining the drivers behind the Modern Data Architecture (MDA), Carter Shanklin and Enis Soztutar introduced Apache HBase and discussed how to use it with Apache Hadoop YARN and Apache Slider for fast NoSQL access to your data. They also covered the Apache HBase innovations now included in HDP 2.2.
Here is the complete recording of the webinar.
Here are the presentation slides.
And register for all remaining webinars in the series.
We’re grateful to the many participants who joined the HDP 2.2 webinar and asked excellent questions. This is the list of questions with their corresponding answers:
|Do you recommend HBase on Slider for production use?||HBase on Slider has been tested and certified for production use with the same HBase test suite used for HDP.|
|Which server does one upgrade first, the master or region server? And is there a dependency?||HDP 2.2 supports rolling upgrade between maintenance releases (between 2.2.x versions). There is no dependency on the order of the upgrade, but we recommend upgrading the master first. For more information, consult the maintenance release documentation (2.2.x) on the rolling upgrade steps.|
|How can I upgrade my current HDP 2.1 to HDP 2.2 to support rolling upgrades if it doesn’t currently use the hdp-select directory structure?||HDP supports rolling upgrade starting with the HDP 2.2 release. You can do an in-place upgrade as documented here|
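To illustrate the directory structure the rolling-upgrade mechanism relies on, here is a sketch of the hdp-select workflow. The version string below is purely illustrative; use whatever your installed packages report, and consult the HDP upgrade documentation for the authoritative steps.

```shell
# Show which HDP versions are installed and where the current symlinks point
hdp-select versions
hdp-select status

# Point all component symlinks at a newly installed version
# (requires root; the version string is an example, not a prescription)
hdp-select set all 2.2.0.0-2041
```

hdp-select works by keeping stable paths (such as /usr/hdp/current) as symlinks into versioned package directories, which is what lets a new version be installed side by side before daemons are restarted onto it.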
|When will HBase and Storm on Slider be integrated into Ambari?||Ambari Slider integration is in the works, and scheduled for the next release.|
|What is the benefit of having YARN on HBase nodes when HBase is deployed the old way (without Slider)?||Having YARN NodeManagers installed on the same nodes that host HBase RegionServers gives you the following advantages:
However, in this setup one should be careful not to impact HBase’s performance in low-latency environments with strict SLAs.|
|Are there any plans to implement JVM block cache and/or memstore?||I am not sure what is meant by JVM block cache. HBase already has an on-heap or off-heap block cache and an on-heap memstore. See https://hortonworks.com/blog/hbase-blockcache-101/ for more details.|
|How does HBase address the stale data problem in HDP 2.2?||I am not sure what is meant by the stale data problem. An extensive overview of “Highly Available Reads Using Timeline-Consistent Region Replicas” can be found here.|
|Can Apache Spark run on top of HBase?||Apache Spark does not run on top of HBase. However, there are Spark RDD implementations available for reading from and writing to HBase.|
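One common way to read HBase from Spark is via Spark's Hadoop-InputFormat bridge. The sketch below assumes spark-core and the HBase client/mapreduce jars are on the classpath, a cluster is reachable via the default configuration, and the table name "mytable" is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseSparkRead {
  public static void main(String[] args) {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("HBaseRead"));

    // Standard HBase client configuration; TableInputFormat
    // reads the table name out of it.
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "mytable"); // hypothetical table

    // Each RDD element is one HBase row: (row key, Result).
    JavaPairRDD<ImmutableBytesWritable, Result> rows =
        sc.newAPIHadoopRDD(conf, TableInputFormat.class,
            ImmutableBytesWritable.class, Result.class);

    System.out.println("row count: " + rows.count());
    sc.stop();
  }
}
```

Because the scan is expressed as an InputFormat, Spark creates one partition per HBase region, so the read parallelizes naturally across the cluster.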
|How do multiple HBase instances in Slider communicate with ZooKeeper?||In Slider deployments, every HBase instance shares the same ZooKeeper quorum; however, each instance has its own ZooKeeper root directory (and its own HDFS root directory), enabling more than one HBase instance in the same YARN cluster. Clients use the configuration (hbase-site.xml, etc.) obtained from the Slider registry to connect to the correct cluster instance. This document contains the details for Slider deployments and how to obtain client configurations.|
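As a sketch of retrieving a client configuration from the Slider registry (the instance name "hbase1" is hypothetical, and the exact flags follow the Apache Slider documentation for your release):

```shell
# Fetch the generated hbase-site.xml for a Slider-deployed
# HBase instance named "hbase1" into the current directory
slider registry --name hbase1 --getconf hbase-site --format xml --dest hbase-site.xml
```

The fetched file carries the instance-specific ZooKeeper root and HDFS root, so a client pointed at it connects to that instance and no other.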
|Are there any extra hardware requirements for the HBase HA configuration?||No, this feature runs without any changes in hardware requirements or deployment model. Only some configurations need to be enabled, as documented here.|
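As a sketch of what those configurations look like (property and attribute names as in the Apache HBase region replica documentation; consult the HDP docs for the authoritative list), replication is enabled per table, optionally with a store-file refresh period on the RegionServer side:

```xml
<!-- hbase-site.xml: let secondary replicas periodically pick up the
     primary's flushed store files (value in milliseconds) -->
<property>
  <name>hbase.regionserver.storefile.refresh.period</name>
  <value>30000</value>
</property>
```

And in the HBase shell, a table opts in by declaring more than one replica per region:

```
create 'mytable', 'cf', {REGION_REPLICATION => 2}
```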
|How do I configure HBase to be more consistent than highly available?||HBase is strongly consistent by default. The new work on highly available reads using region replicas does not change the default model unless a table is configured with region replication and reads are issued with TIMELINE consistency semantics. Otherwise, a region is always served from a single RegionServer, and all reads and writes have STRONG consistency.
See these documents for an overview:
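To make the opt-in nature concrete, here is a sketch of a timeline-consistent read. It assumes an HBase client version that includes the region replica APIs, a reachable cluster, and a table 't1' already configured with region replication; every other read on the connection remains strongly consistent.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineRead {
  public static void main(String[] args) throws IOException {
    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("t1"))) {
      Get get = new Get(Bytes.toBytes("row1"));
      // Opt in per request: allow the read to be served by a secondary replica
      get.setConsistency(Consistency.TIMELINE);
      Result result = table.get(get);
      // isStale() reports whether the answer came from a secondary replica
      System.out.println("stale? " + result.isStale());
    }
  }
}
```

Without the setConsistency call the same Get is served only by the primary region, which is how the default STRONG model is preserved.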
|Are there any new updates to reads for the HA Master?||HBase’s Master daemon already supports HA deployment, which is the recommended way to run HBase. For the new feature called “Highly Available Reads Using Timeline-Consistent Region Replicas,” there have been some changes in the master code to support the region replica concept.|
|Do the standby replicas get promoted to primary if the primary node goes down?||Secondary region replicas do not get promoted to primaries at this phase. Promotion would enable faster recovery in case of a RegionServer crash, but it is not implemented yet. We plan to evaluate whether the gains would justify the extra complexity of a distributed region recovery implementation.|
|Can I live-upgrade the master to HDP 2.2.1 without a restart?||The short answer is no. You should upgrade your master first, restart it, and then proceed with the remaining upgrades.|
|How are HBase upgrades affected by the presence of Phoenix? Should Phoenix be removed to support rolling upgrades?||Phoenix also supports rolling upgrades, and it does not need to be removed. The version of Phoenix installed is always tied to the version of HBase installed; for example, the Phoenix shipped in HDP 2.2.1 works with the HBase shipped in HDP 2.2.1. In a side-by-side installation, multiple versions of Phoenix and HBase may be installed together. Whenever a region server is restarted on the new version, it picks up the new Phoenix jars.|
Visit these pages to learn more: