March 22, 2017

Case Study: 2X Hadoop Performance with Hortonworks SmartSense Webinar Recap

Did you know every Hortonworks HDP support subscription comes with SmartSense?

Advanced Analytics of Diagnostic Data Prevents Issues

SmartSense uses advanced analytics to make suggestions and recommendations based on the deep knowledge of our Hortonworks engineers and committers to prevent issues and improve performance of your HDP cluster. Based on the diagnostic data collected from HDP clusters around the world, Hortonworks creates personalized recommendations for your specific cluster and workloads.

This significantly reduces the manual effort of reading all the documentation, gathering all your system data and becoming an expert on all the different permutations and combinations in order to tune your configuration for maximum performance. SmartSense alleviates the time and cost of this effort, and provides valuable recommendations on day one.

Three Recommendations Doubled Hadoop Performance

In the webinar on February 28th, we shared how three machine-learning-generated recommendations from Hortonworks SmartSense helped Pinsight tune their Hadoop deployment and double its performance.

Specifically, SmartSense improved their cluster's throughput through YARN configuration tuning. (To learn more about how it works, jump to the 18-minute mark in the webinar, which covers how YARN handles resource allocation. There Paul explains that YARN container sizes are preconfigured to work conservatively out of the box, but with SmartSense you can tune your configuration more aggressively and optimize your hardware for your specific workloads.) The gist is that you move from a one-size-fits-all approach to a tailored one.
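
To make the container-sizing idea concrete, here is a minimal Python sketch of the arithmetic behind how many YARN containers fit on one worker node. The helper name and every number are hypothetical, chosen only for illustration; they are not values from the webinar or from SmartSense recommendations. The sketch also assumes the scheduler enforces both memory and vcores.

```python
def containers_per_node(yarn_memory_mb, yarn_vcores,
                        container_memory_mb, container_vcores=1):
    """Number of equally sized containers that fit on one NodeManager.

    Hypothetical helper: assumes both memory and vcores limit scheduling.
    """
    by_memory = yarn_memory_mb // container_memory_mb
    by_vcores = yarn_vcores // container_vcores
    return min(by_memory, by_vcores)


# Hypothetical settings, for illustration only.
conservative = containers_per_node(yarn_memory_mb=8 * 1024, yarn_vcores=8,
                                   container_memory_mb=4096)
tuned = containers_per_node(yarn_memory_mb=112 * 1024, yarn_vcores=32,
                            container_memory_mb=4096)

print(conservative, tuned)  # prints: 2 28
```

In a real cluster the per-node inputs correspond to properties such as `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`; the right values depend on your hardware and workloads.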

Hortonworks SmartSense Demo and Case Study 2X Hadoop Performance

Reduce Mean Time To Blame

Beyond improved performance, Paul wanted Hadoop administrators and operators in particular to know that SmartSense enables accelerated troubleshooting and proactive recommendations to keep your system running and reduce what we could affectionately term “Mean Time To Blame” (MTTB).

Q&A

During the webinar we had some great questions, captured below.

Usage & Licensing

  • Is SmartSense part of an HDP Enterprise support subscription or do we need to procure it at additional cost?
    • Every Hortonworks HDP support subscription comes with SmartSense.
  • How long does it take to start producing cluster-specific recommendations from the data collected after enabling SmartSense?
    • Once SmartSense is enabled and a support bundle is submitted, an automated recommendation is returned in ~10 minutes. Some exceptional cases require a manual review by the team and may take several days.
  • Are there alternate ways to monitor CPU and usage details by queue apart from SmartSense?
    • Yes, there are many ways to access the associated data, for example HDFS image analysis and the App Timeline Server APIs. However, it takes a lot of time and effort to extract it yourself. SmartSense with Zeppelin makes it significantly easier.
  • Does SmartSense help inform us of known bugs in the services?
    • In future versions of SmartSense we’re looking at having Known Issues for your components and security bulletins built into SmartSense, allowing customers to quickly understand if their issue is known and has a workaround, or if there are any critical security patches that need to be applied.
  • Is there a way to try SmartSense without registering in support?
    • Yes, please contact your local Hortonworks sales rep.
  • What if I cannot get a firewall opened to the gateway out to Hortonworks? Is it an option to save the output files locally, then transmit them manually to Hortonworks?
    • Yes, you can download the bundles manually and then we have multiple options for you to upload the bundles to us. See our documentation here: http://docs.hortonworks.com/HDPDocuments/SS1/SmartSense-1.3.1/bk_installation/content/bundle_transport.html
  • Can you apply SmartSense to part of a deployment – i.e. to one cluster before deploying on all?
    • Yes, SmartSense is deployed per cluster. Most customers try it on one non-production environment, and test some recommendations and then roll it out to other clusters. When you have a multi-cluster environment, it’s typically easier to deploy our SmartSense Gateway and just have everything sent through one point of egress. It makes it easier to work with your information security team. For more details see: http://docs.hortonworks.com/HDPDocuments/SS1/SmartSense-1.3.1/bk_installation/content/gateway_installation.html
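
For the question above about tracking CPU and memory by queue without SmartSense, the following is a rough sketch of the do-it-yourself aggregation. It assumes application records shaped like the `app` entries returned by the YARN ResourceManager REST API (`/ws/v1/cluster/apps`), with `queue`, `memorySeconds`, and `vcoreSeconds` fields, rather than the App Timeline Server APIs named in the answer. The helper name and the sample records are made up, and this is not how SmartSense itself computes its reports.

```python
from collections import defaultdict


def usage_by(apps, key="queue"):
    """Sum memory-seconds and vcore-seconds per value of `key`.

    `apps` is a list of dicts shaped like ResourceManager REST `app` records
    (an assumption; only the fields used below are required).
    """
    totals = defaultdict(lambda: {"memory_mb_seconds": 0,
                                  "vcore_seconds": 0,
                                  "apps": 0})
    for app in apps:
        bucket = totals[app[key]]
        bucket["memory_mb_seconds"] += app.get("memorySeconds", 0)
        bucket["vcore_seconds"] += app.get("vcoreSeconds", 0)
        bucket["apps"] += 1
    return dict(totals)


# Made-up sample records, for illustration only.
sample_apps = [
    {"queue": "etl", "applicationType": "TEZ",
     "memorySeconds": 1200, "vcoreSeconds": 300},
    {"queue": "etl", "applicationType": "TEZ",
     "memorySeconds": 800, "vcoreSeconds": 200},
    {"queue": "adhoc", "applicationType": "MAPREDUCE",
     "memorySeconds": 500, "vcoreSeconds": 100},
]

by_queue = usage_by(sample_apps, key="queue")
by_type = usage_by(sample_apps, key="applicationType")
print(by_queue["etl"])
```

Even this small example shows why the answer calls the manual route time-consuming: you still have to fetch, page, and store the records yourself before any chart can be drawn.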

Features

  • Can SmartSense recommendations be exported to be shared with a larger operations team?
    • Yes, this will be available in the next version of SmartSense, which will include an Excel export and is currently expected in Q2 2017.
  • Are the chargeback reports available by default?
    • Yes, chargeback reports are available by default with SmartSense version 1.3.x.
  • Are there any plans to expand SmartSense to analyse usage of other components like Spark? And how soon?
    • Spark is in the plans for later this year.
  • Does SmartSense support Presto services?
    • Presto is not part of the platform and therefore we do not provide any data capture for it.
  • Does the chargeback analysis work if using doAs = false? (If using HiveServer2, jobs submitted to YARN will then be shown as Hive.)
    • No, for now, the Hive user that runs the query is what’s counted, not the user that submitted the query.

How To:

  • How can we monitor the memory and CPU used by each queue, for example for Tez, and break them down by Hive, Pig, Oozie, etc.?
    • See the webinar at 37 minutes for how different jobs are monitored. The MapReduce and Tez dashboards are examples, showing jobs by type, jobs submitted, etc.
  • To get the YARN dashboard, do we need to install the Activity Analyzer on the ResourceManager or on any slave node? For the HDFS dashboard we are installing it on the NameNodes.
    • For specific recommendations on placing the Activity Analyzer, see http://docs.hortonworks.com/HDPDocuments/SS1/SmartSense-1.3.1/bk_installation/content/activity_analyzer_placement.html
  • How do we deploy the components to start using the Activity Explorer and Activity Dashboards released in version 1.3? (@49 minutes)
    • There are two new components in 1.3: Activity Explorer and Activity Analyzer. The Activity Explorers are essentially embedded Zeppelin instances. The Activity Analyzers do the actual work, and we have recommendations on where to deploy them.
    • If you have NameNode HA, you want an analyzer on each of those nodes. The instances installed on the NameNodes will be focused on HDFS.
    • Then you want an analyzer on some other master service; these will be mining information from YARN.
    • Typically you will have three: one on each NameNode and one on some other master service.
    • Further details on where to deploy the analyzers are in the documentation – http://docs.hortonworks.com/HDPDocuments/SS1/SmartSense-1.3.1/bk_installation/content/activity_analyzer_placement.html

Integration with Zeppelin

  • Does Zeppelin support embedding external HTTP resources? For example, if we have external systems doing some of our monitoring, can we have their images loaded within a Zeppelin dashboard (e.g. via image tags or embedded HTML)?
    • I believe this is the best way to accomplish what you’re asking for: https://zeppelin.apache.org/docs/0.7.0/displaysystem/basicdisplaysystem.html#html
  • Can we integrate SmartSense Zeppelin Dashboard with LDAP/AD?
    • Yes, you can use the Zeppelin documentation for that: https://zeppelin.apache.org/docs/0.6.2/security/shiroauthentication.html#ldap
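
As a small illustration of the HTML display approach linked in the first answer, the sketch below builds the text a Zeppelin paragraph would print to show an external image. Zeppelin renders paragraph output that starts with `%html` as HTML. The helper name and the example URL are hypothetical, and the sketch assumes a Python-capable interpreter is available in your Zeppelin instance and that the browser is allowed to load the external image.

```python
from html import escape


def zeppelin_image_html(url, alt="", width=None):
    """Return Zeppelin '%html' output that embeds an external image.

    Hypothetical helper; attribute values are HTML-escaped.
    """
    attrs = 'src="{}" alt="{}"'.format(escape(url, quote=True),
                                       escape(alt, quote=True))
    if width is not None:
        attrs += ' width="{}"'.format(int(width))
    return "%html <img {}>".format(attrs)


# Hypothetical monitoring URL, for illustration only.
snippet = zeppelin_image_html("https://monitoring.example.com/graphs/cpu.png",
                              alt="CPU usage", width=600)
print(snippet)
```

Printing the returned string from a notebook paragraph is what triggers the rendering; the string on its own is just text.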
