Hadoop Summit – Dublin taking place 13-14 April 2016 http://www.hadoopsummit.org/dublin
Unlike other conferences, Hadoop Summit is driven for the community by the community, and this year’s speaker submissions have been open for public viewing at http://hadoopsummit.uservoice.com/. The top vote-getting sessions are automatically selected for the conference. The competition was strong, the content was amazing and with over 13,000 votes tallied, we are happy to announce that the results are in!
Before announcing the winners, we’d like to thank all of you who submitted abstracts, took part in tweeting, sharing, urging support and voting on sessions. This year’s event will be bigger and better than ever!
Also, now is the best time to register to see this content and much more. Super early bird registration kicks in with an all-access pass at €699. So register here: http://www.hadoopsummit.org/dublin/register/
Without further ado, the Community Choice Winners for Hadoop Summit 2016 are…
Apache Committer Insights
Venkatesh Sellappa, Teradata UK, Solution Architect
I am a new contributor to Apache NiFi and this talk takes a light-hearted look at my journey to becoming a contributor to an Apache project. It will outline the skills required, the steps to take for setting up a project, the correct etiquette in subscribing to a mailing list, the right way to ask a question, the way to engage with the community and the best practices for submitting a patch, documentation, etc.
Applications of Hadoop and the Data-Driven Business
Romika Yadav, Research Scholar & Savita Kumari, Assistant Professor, Indira Gandhi University Meerpur Rewari India
Crime forecasting is a process that finds out how the crime rate changes from one year to the next and projects those changes into the future. Crime is an offence against society, and it has been observed that criminals commit crime at any place, at any time and in any form. Predicting those crime events can therefore save lives. One of the best-known crimes in the world is the September 11, 2001 attack on the World Trade Center. Crime affects not only individuals but also the people of the whole country. In this regard, enforcement agencies and researchers have a responsibility to analyze crime events from voluminous crime data sets. Crime analysis is crucial for police departments in predicting crime and its associated information, which includes types of crime, probable methods and location of crime. We propose a crime prediction model for crime detection, crime visualization, crime prevention and crime prediction that uses big data techniques to provide accurate visualization of data and to perform computation quickly with the MapReduce tool of the Hadoop framework.
Data Science Applications for Hadoop
Bill Porto, RedPoint Global Inc., Senior Engineering Analyst
Applying machine learning to Big Data is something many strive for and few achieve – yet. Creating models to predict customer response or to segment customer data into set categories are “predictable” use cases. Taking data, discovering what it can tell you, and creating a model and use for it sounds simple enough. It’s a start, but not enough to impact sustainable revenue or cost advantage for your enterprise.
This session will cover the mission critical questions related to model choice, viability horizon, practical design alternatives, learning from on-the-fence model factors, and opportunities for automating access to changing data and netting-out error and noise…
Hadoop and the Internet of Things
Nikhil Joshi, Consultant Product Manager & Priya Lakshminarayanan, Director Product Management, EMC
Traditionally, HDFS provides robust protection against disk failures, node failures and rack failures. The mechanisms to protect data against entire datacenter failures and outages leave much to be desired. Neither the storage substrate (HDFS), nor the applications on top (MapReduce, Hive, HBase) are capable of running across geographies/data-centers. With Hadoop’s increased enterprise adoption, there is greater need to protect business critical datasets in Hadoop clusters. This is motivated in large part by compliance, regulation, data protection and business continuity planning…
Hadoop Application Development: Dev Languages, Scripting, SQL and NoSQL
Piotr Lusakowski, deepsense.io, Senior Software Engineer
In this talk we’ll take a deep dive into how an IPython notebook can be connected to a running Spark application and how it can be used for data exploration and debugging. IPython notebook is an established tool in the Data Science community, and embedding the application’s Spark Context within it can speed up development and limit errors. One possible usage model is to share a set of precomputed RDDs cached in Spark’s cluster memory between multiple users. This approach reduces resource usage, since the precomputation happens only once and the cached data is not replicated for each user. Sharing SQL contexts allows instantaneous access to temporary results by other users…
Hadoop Governance, Security, Deployment and Operations
Zoltán Zvara & Marton Balassi, Hungarian Academy of Sciences Researcher, Developer
Understanding the physical plan of a big data application is often crucial for tracking down bottlenecks and faulty behavior. Although Apache Spark offers a useful Web UI component for monitoring and understanding the logical plan of jobs, it lacks a tool that helps users understand the physical plan of the task scheduler and monitor execution at a very low level, along with the communication triggered by RDDs and remote block requests. We propose a tool that allows users to monitor job executions in real time, and later to replay and examine them, on any cluster currently supported by Spark….
The Future of Apache Hadoop
Slim Baltagi, Capital One Financial Corporation, Director of Big Data Engineering & Fellow
This is an introductory-level talk about Apache Flink: a multi-purpose Big Data analytics framework leading a movement towards the unification of batch and stream processing, or stream-processing-first, in open source. With the many technical innovations it brings, along with its unique vision and philosophy, it is considered the 4G (4th Generation) of Big Data analytics frameworks, providing the only hybrid (Real-Time Streaming + Batch) open source distributed data processing engine supporting many use cases….
The winning sessions from the community vote (above) will be combined with a set of content that is being curated by the Hadoop experts and veterans on our content selection committees. Once the schedule has been created, we will post it on the HadoopSummit.org website in early January! Hadoop Summit Partner sponsorship opportunities are now open for business, and registration is open for attendees.
This is the eighth Hadoop Summit we’ve hosted and the content this year is very strong. No matter where you are in your journey with Apache Hadoop, whether just learning and exploring or in full production, there will be sessions you can learn from and take away practical, usable advice. You’ll also be able to interact with core engineers and architects from across the various Apache projects that make up an enterprise-grade Hadoop platform.
We hope to see you there!