Neville Li, from Spotify, will talk about the company's migration of its big data infrastructure to Google Cloud. Over the past year or so, Spotify moved away from maintaining its own 2500+ node Hadoop cluster to managed services in the cloud. They replaced two key components in their data processing stack, Hive and Scalding, with BigQuery and Scio, and were able to iterate much faster. In this meetup, we will focus on the technical aspects of Scio, a Scala API for Apache Beam and Google Cloud Dataflow, and how it changed the way Spotify processes data.
About the Speaker
Neville Li works on data and machine learning infrastructure at Spotify and is the creator of Scio, a Scala API for Apache Beam. Over the past few years he has been driving the adoption of Scala and new data tools for music recommendation, including Scalding, Spark, Storm and Parquet. Before that he worked on search quality at Yahoo! and on old-school distributed systems like MPI.
6:30 p.m. Networking & Pizza
7:00 p.m. Talk – Big Data Processing at Spotify: The Road to Scio by Neville Li (Spotify)
7:45 p.m. More networking!