Data in motion and the growing number of sensors and connected devices are fueling data growth in Hadoop. The speed with which enterprises can make decisions based on data is critical to their competitive advantage. At Hadoop Summit San Jose, the IoT and Streaming track received the most submissions of any track, and the track committee, led by Gearóid O’Brien, Distinguished Engineer at Neustar, has put together a powerful set of sessions.
You need to register to attend Hadoop Summit San Jose, and once you do, here are my top three picks for this track:
Speakers: Kanishk Mahajan and Ryan Medlin from Hortonworks
Building high-scale analytics solutions that derive value from large volumes of streaming data is driving transformative outcomes for enterprises today. Whether the goal is optimizing advertising spend or making sense of data generated by a swarm of IoT devices, this technology must continuously capture and process terabytes of data per hour from the IoT edge or from sources such as network events, website clickstreams, financial transactions, social media feeds, IT logs, and location-tracking events. In response, Big Data real-time streaming technologies have evolved to manage high-volume streams, allowing developers to build high-throughput, low-latency, fault-tolerant distributed systems with features such as multi-stage processing using specialized algorithms, complex event processing, custom stream partitioning for finer control over scaling, and durable temporary storage for data in transit. This talk focuses on how delivering real-time actionable insight to businesses is shaping the future of Big Data real-time streaming technologies.
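To make the "custom stream partitioning" idea concrete, here is a minimal sketch (my own illustration, not code from the talk): events are routed to a partition by hashing a stable key, so all readings from one device land on the same worker and per-device ordering is preserved. Real engines such as Storm and Kafka expose their own partitioner interfaces; the device IDs and partition count below are hypothetical.

```python
import hashlib

def partition_for(device_id: str, num_partitions: int) -> int:
    # Stable hash so the device-to-partition mapping survives restarts;
    # Python's built-in hash() is salted per process, so use md5 instead.
    digest = hashlib.md5(device_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Hypothetical sensor events arriving on a stream.
events = [
    {"device": "sensor-a", "temp": 21.4},
    {"device": "sensor-b", "temp": 19.9},
    {"device": "sensor-a", "temp": 21.7},
]

partitions: dict[int, list] = {}
for event in events:
    p = partition_for(event["device"], num_partitions=4)
    partitions.setdefault(p, []).append(event)

# Both "sensor-a" readings share one partition, preserving per-device order.
```

Keying the partition on device ID is the usual way to scale out while still letting each worker maintain per-device state, such as a running average or session window.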
Speaker: P. Taylor Goetz from Hortonworks
The Internet of Anything is here. Gartner predicts there will be 26 billion devices on the Internet of Things by 2020. Capturing and analyzing data from connected devices provides a wealth of opportunity, but the road from device to data center to actionable insights can be fraught with pitfalls. In this session we will discuss architectural considerations and common pitfalls across the full spectrum of IoT components: from hardware devices and sensors, to software in the data center. We will look at how open source Apache projects like Apache NiFi, Kafka, Storm, and Hadoop can work in concert to shepherd data along the path to insights in a large-scale IoT architecture. Additionally, we will weigh the pros and cons of frequently used device-to-device and device-to-network communication protocols and data formats, and how to best leverage them in an IoT deployment. Finally, we will explore the security, privacy, and regulatory needs of IoT solutions.
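As a toy illustration of the data-format tradeoffs the session weighs (this sketch and its sample reading are my own assumptions, not material from the talk), compare a JSON encoding of a sensor reading with a fixed-layout binary encoding built with Python's `struct` module:

```python
import json
import struct

# Hypothetical sensor reading: device id, unix timestamp, temperature.
reading = {"id": 1042, "ts": 1466000000, "temp": 21.5}

# Human-readable and self-describing, but verbose: fine for debugging,
# costly when millions of devices report every few seconds.
as_json = json.dumps(reading).encode("utf-8")

# Compact fixed layout: unsigned int, unsigned int, 32-bit float,
# little-endian. 12 bytes total, but both ends must agree on the schema.
as_binary = struct.pack("<IIf", reading["id"], reading["ts"], reading["temp"])

print(len(as_json), len(as_binary))  # the binary form is a fraction of the JSON size
```

The tradeoff is typical of device-to-network design: compact binary formats save bandwidth and battery on constrained devices, while self-describing formats like JSON are easier to evolve and debug in the data center.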
Speakers: Joseph Niemiec and Christopher Gambino from Hortonworks
Join us as we discuss what life is like when your front door knows you are going to be late for work, the WiFi router tells you guests have arrived, and Orwell’s dream is alive and well. Everything connects to the internet, everything generates logs, and everything has a different interface. How do we leverage multiple disparate sources of information to get a more complete picture of our lives? Modern tools allow us to quickly integrate event logs from different sources to feed analytic platforms. We will go through the challenges and the surprising patterns that emerged from the perspective of our devices. Apache NiFi enables disparate sources of information (sensors, WiFi logs, a Raspberry Pi) to be easily acquired, analyzed in stream, and sent offsite. The integrated platform management that NiFi provides turns this task from a multi-application ecosystem into a single interface that serves as a nexus of incoming data. Utilizing Site-to-Site capabilities, we are able to transfer data from the edge to the cluster for exploration in Spark. This allows us to focus purely on the data and not the technicalities normally present in complex event processing.
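The core trick here is normalizing differently formatted logs into one time-ordered stream. A minimal sketch of that idea (the log formats, field names, and sample lines below are hypothetical, invented for illustration; NiFi does this with processors rather than hand-written parsers):

```python
# Two devices, two log formats, one unified timeline.

def from_door(line: str) -> dict:
    # Hypothetical door-lock log line, e.g. "1466000100,unlocked"
    ts, state = line.split(",")
    return {"ts": int(ts), "source": "door", "event": state}

def from_wifi(line: str) -> dict:
    # Hypothetical router log line, e.g. "join 1466000050 aa:bb:cc"
    action, ts, mac = line.split()
    return {"ts": int(ts), "source": "wifi", "event": f"{action} {mac}"}

raw = [
    ("door", "1466000100,unlocked"),
    ("wifi", "join 1466000050 aa:bb:cc"),
]

parsers = {"door": from_door, "wifi": from_wifi}
timeline = sorted(
    (parsers[src](line) for src, line in raw),
    key=lambda e: e["ts"],
)

# Merged view: the WiFi join (guests arriving) precedes the door unlocking.
```

Once every source is reduced to the same `{ts, source, event}` shape, cross-device questions ("what happened in the five minutes before the door opened?") become simple queries over one stream instead of joins across incompatible logs.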