This 4-day training course is designed for developers who need to create real-time applications that ingest and process streaming data sources using the Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) environments. Specific technologies covered include Apache Hadoop, Apache Kafka, Apache Storm and Trident, Apache Spark, and Apache HBase, as well as Apache NiFi and Apache Solr. The highlight of the course is the custom workshop-style labs in which participants build complete streaming applications with Storm and Spark Streaming.
Students should be familiar with programming principles and have experience in software development. Java programming experience is required. Knowledge of SQL and light scripting is also helpful. No prior Hadoop knowledge is required.
Developers and data engineers who need to understand and develop real-time / streaming applications on HDP.
Day 1: HDP Real-Time Architecture and Components
Real-time architecture & overview of the class
Identify the relevant HDP/HDF components
Creating Kafka topics from CLI and publishing & consuming messages from Java
Creating & accessing HBase tables from HBase shell and from Java
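As a taste of the Day 1 Kafka material: a producer with a non-null message key always sends records for that key to the same partition, which is what preserves per-key ordering. Here is a minimal, dependency-free Java sketch of that routing idea (the class and method names are illustrative, and the simple polynomial hash is a stand-in — Kafka's actual default partitioner hashes the serialized key bytes with murmur2):

```java
import java.nio.charset.StandardCharsets;

public class PartitionSketch {
    // Simplified stand-in for Kafka's key-to-partition routing: map a
    // message key to one of N partitions. A real producer delegates this
    // to the configured Partitioner.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = 0;
        for (byte b : keyBytes) {
            hash = 31 * hash + b;          // simple polynomial hash over the key bytes
        }
        return (hash & 0x7fffffff) % numPartitions;  // clear the sign bit, then mod
    }

    public static void main(String[] args) {
        int p = PartitionSketch.partitionFor("sensor-42", 6);
        // The same key always maps to the same partition, so all messages
        // for "sensor-42" stay in order relative to each other.
        System.out.println("sensor-42 -> partition " + p);
    }
}
```

Because the mapping is deterministic, repartitioning a topic (changing N) changes where existing keys land — one reason topic partition counts are chosen carefully up front.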
Day 2: Real-Time Processing with Storm
Building Storm topologies
Extending Storm with Trident
Integrating Kafka with Storm
Interactive workshop: Consuming a Kafka topic with a Storm topology and publishing results to HBase
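The stateful core of the Day 2 workshop's counting bolt can be sketched without the Storm dependency. The names here are illustrative — a real bolt would implement Storm's bolt interface and receive Tuple objects — but the shape is the same: keep a running map of counts across incoming tuples and emit the updated value downstream (e.g. to a bolt that writes to HBase):

```java
import java.util.HashMap;
import java.util.Map;

public class CountBoltSketch {
    // State the counting bolt keeps across tuples: word -> running total.
    // In Storm this lives inside the bolt instance for the lifetime of
    // the topology (Trident adds managed, fault-tolerant state on top).
    private final Map<String, Long> counts = new HashMap<>();

    // Stand-in for the bolt's execute(Tuple): take one word, bump its
    // count, and return the value that would be emitted downstream.
    public long execute(String word) {
        return counts.merge(word, 1L, Long::sum);
    }

    public static void main(String[] args) {
        CountBoltSketch bolt = new CountBoltSketch();
        for (String w : new String[]{"storm", "kafka", "storm"}) {
            System.out.println(w + " -> " + bolt.execute(w));
        }
    }
}
```

In the workshop topology, a Kafka spout feeds tuples into a bolt like this, and the emitted (word, count) pairs flow to an HBase-writing bolt.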
Day 3: Real-Time Processing with Spark Streaming
Spark ecosystem overview
Integrating with Kafka
Spark RDD WordCount
Spark Streaming WordCount
Interactive workshop: Consuming a Kafka topic with a Spark Streaming application and publishing results to HBase
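The Day 3 WordCount boils down to a flatMap (split lines into words) followed by a reduceByKey (sum per word) applied to each micro-batch. The same shape can be sketched in plain Java with no Spark dependency (the class and method names are illustrative; Spark distributes this work across the cluster, whereas this runs locally):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BatchWordCount {
    // One Spark Streaming micro-batch, in plain Java: flatMap each line
    // into words, then reduce by key with (a, b) -> a + b.
    static Map<String, Long> countWords(List<String> batch) {
        return batch.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
                countWords(Arrays.asList("to be or not", "to be"));
        System.out.println(counts);  // "to" and "be" each appear twice
    }
}
```

In the workshop, each micro-batch of Kafka messages is counted this way and the resulting (word, count) pairs are written out to HBase.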
Securing Azure HDInsight with Apache Ranger & Azure Active Directory
As Hadoop-based workloads become ever more popular in the enterprise, the need for enterprise-grade capabilities like Active Directory-based authentication, multi-user support, and role-based access control has never been greater. In this session, we are going to explore how you can create an HDInsight cluster joined to an Active Directory…
How do you optimize Apache Spark workloads in the cloud? How do you tune your resources for maximum performance and efficiency? Find out how Hortonworks support subscriptions enable IT agility and success in the cloud. We will cover: options for running data science, analytics, and ETL workloads in the cloud; Hortonworks support offerings; how to…
The San Jose DataWorks Summit (June 13-15) is nearly upon us! Our array of speakers is larger and even more impressive than last year's. This year our Keynote and Enterprise Adoption tracks will include Dr. Wade Schulz, Resident Physician, Clinical Pathology, at the Yale School of Medicine. Co-presenting will be Hao Dai, Deputy Director, Biobank…
Don’t miss the Business of Data at DataWorks Summit
Business and tech leaders know that harnessing the information they need to become a data-driven organization is no simple task: rapidly assimilating, integrating, and analyzing big data streams from internal and external sources to produce actionable insights can be daunting. The DataWorks Summit/Hadoop Summit community has put together more than 170 sessions on the business technology…
Did you ever consider that an open source business model is like a talent show? Now, I will admit, I am not a fan of television, and certainly not of reality TV. But that doesn’t mean I’ve been living under a rock. I know there are large numbers of people who love watching talent contests…
Apache Metron Insight #1: Why real-time enrichment matters
Welcome to our blog series on Big Data Cybersecurity, where we will share key insights on how and why Apache Metron is designed to address the real-world issues of security operations personnel. Our first topic is real-time enrichment. What does enrichment mean? To best understand what real-time enrichment is about, it is important to…
TMW Systems Drives Transportation Businesses Out of the Dark with Big Data
You wouldn’t drive in the dark without headlights, and you wouldn’t want to operate a fleet of trucks without the information necessary to keep them on the road. The right data at the right time can help any company avoid disaster. The Road of Data Ahead As transportation enterprises understand, intuition and agility are not…
Hive/Druid integration means Druid is BI-ready from your tool of choice. This is Part 3 of a three-part series on ultra-fast OLAP analytics with Apache Hive and Druid. Connect Tableau to Druid Previously, we talked about how the Hive/Druid integration delivers screaming-fast analytics, but there is another, even more powerful benefit to…
CenterPoint Energy: Business Value from Large, Complex Data
The San Jose DataWorks Summit (June 13-15) is just a few weeks away! We’re busy finalizing the lineup of an impressive array of speakers and business use cases. This year our Data Processing & Warehouse Track will feature Daniel Sumners, IT Architect at CenterPoint Energy. CenterPoint Energy is a Fortune 500 electric and gas utility company operating in several…
Clearsense: Maximum Healthcare Transformation, Minimal Investment
Clearsense, based in Jacksonville, Florida, develops cloud-based applications built on Hortonworks' 100% open-source Connected Data Platforms. Its customers are hospitals and healthcare systems, and its mission is to save lives by giving providers and medical practitioners advance notice of a patient's deteriorating health. Clearsense achieves its mission through the open source power of Hortonworks…
Expressway Authorities do Hadoop Every day, expressway authorities must make critical decisions -- often without sufficiently accurate and transparent data. At the same time, they may be losing revenue due to reporting latency and the inability to respond when toll plaza sensors are down. Hortonworks DataFlow (HDF™) and Hortonworks Data Platform (HDP®) can help resolve these…
What’s New for Apache Spark & Apache Zeppelin in HDP 2.6?
The value of any data is proportional to the insights derived from it. With the Data Lake Architecture, all of the enterprise data is made available in one place. The key to driving insights from the Data Lake is Apache Spark & Apache Zeppelin. Both are key tools to drive Predictive Analytics and Machine Learning.…
Simon Meredith, Chief Technology Officer - CSI, IBM Europe, explains the significance of IBM and Hortonworks working together in the era of Big Data. What is fuelling IBM’s commitment to Apache Hadoop and Spark? The pressures of day-to-day business are keeping companies from doing more with their data. IBM’s commitment is to initiate, simplify…
Apache, Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Metron and the Hadoop elephant and Apache project logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States or other countries.