Tuesday night I had the opportunity to visit Hacker Lab, where Sacramento Women in Data Science (SWDS) hosted Dr. Ian Brooks, one of our super-talented Solutions Engineers. SWDS has over 500 members and regularly hosts events to grow the data science and analytics community for women, though its events are not exclusive to them. Groups such as SWDS are vital to growing the field of data science, and it appears their work is taking off.
Ian is focused on our public sector customers. He is currently helping state and local governments leverage Big Data for machine learning and analytics to manage resources such as water and to innovate around traffic volume and other critical infrastructure. As our population has grown and placed incredible strain on our natural resources and infrastructure, data science has become paramount.
Presentations like Ian’s are important because they not only help grow the field of data science but also show how accessible and easy data analytics can be with the right platform and tool set. The Apache Hadoop Ecosystem has projects to help any enterprise deploy, integrate, and work with massive amounts of structured and unstructured data. Hortonworks has more Apache Committers than anyone else to help grow this ecosystem. From local presentations to summits to Committers, we recognize the importance of community.
Attendees at the Hacker Lab were given access to a large data set they could manipulate with Hortonworks Sandbox, our free, downloadable single-node environment. The analytics use case was topical and engaging:
Unsupervised Machine Learning for Global Terrorism Data using Apache Spark
This presentation will demonstrate how to use Apache Spark for Data Science practices, which will be applied to the Global Terrorism Database (GTDB). It will include the required data preparation techniques (feature selection, cleaning, and transformation) before proceeding to clustering, anomaly detection, and model evaluation. The entire demonstration will be presented in an Apache Zeppelin notebook, and it will include a brief introduction to Apache Spark and Apache Zeppelin. Those in attendance can participate using Hortonworks’ HDP sandbox, which is a free single-node environment of Apache Hadoop.
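For readers who want a feel for the workflow the abstract outlines (cleaning, transformation, clustering, anomaly detection, evaluation), here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. It is not Ian's notebook: the toy rows are invented for illustration and are not GTD records, and the function names (`clean`, `min_max_scale`, `kmeans`) are hypothetical. It uses k-means and treats the point farthest from its nearest centroid as the anomaly candidate, which is one common simple approach and may differ from what the talk used.

```python
import math

# Invented toy rows with two numeric features (NOT Global Terrorism Database data).
raw = [(1.0, 2.0), (9.0, 8.0), (1.5, 1.8), (8.5, 9.0),
       (1.2, 2.2), (9.2, 8.4), (None, 3.0), (5.0, 20.0)]

def clean(rows):
    """Cleaning: drop rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r)]

def min_max_scale(rows):
    """Transformation: rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by zero
    return [tuple((v - lo) / s for v, lo, s in zip(r, lows, spans)) for r in rows]

def kmeans(points, k, iterations=20):
    """Clustering: Lloyd's algorithm, seeded with the first k points."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

cleaned = clean(raw)
scaled = min_max_scale(cleaned)
centroids = kmeans(scaled, k=2)

# Anomaly detection: distance from each point to its nearest centroid.
distances = [min(math.dist(p, c) for c in centroids) for p in scaled]
outlier = cleaned[distances.index(max(distances))]

# Evaluation: within-cluster sum of squared errors (lower means tighter clusters).
wssse = sum(d ** 2 for d in distances)

print("most anomalous row:", outlier)
print("WSSSE:", round(wssse, 4))
```

In Spark the same steps would typically be expressed with DataFrame transformations and the MLlib clustering API, which distribute the work across a cluster instead of looping over an in-memory list.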
Missed the presentation? No worries! SWDS live-streamed the event on Facebook; you can watch the recordings below:
For more SWDS events, check out the Meetup page. Their next event is July 25th: ML Technical Working Session – Breast Cancer Diagnosis Data. They’re doing great things, so check them out!