Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing data science at scale to enterprises, but it also makes it challenging for data scientists to continuously learn and collaborate. Data scientists have many tools at their disposal: notebooks such as Jupyter and Apache Zeppelin, IDEs such as RStudio, languages such as R, Python, and Scala, and frameworks such as Apache Spark. Given all these choices, how do you best collaborate to build your model and then work through the development lifecycle to deploy it from test into production?
Why Data Science on Big Data?
In this meetup we will cover the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and fosters continuous learning and collaboration. We will show a demo of Apache Zeppelin, Apache Spark, Apache Livy, and Apache Hadoop, with a focus on integration, security, and model deployment and management.
IBM Data Science Experience (DSX) provides you with the environment and tools to solve your business problems by collaboratively analyzing data. DSX is now part of the IBM Watson Data Platform, an integrated platform of tools, services, and data that helps companies accelerate their shift to becoming data-driven organizations. The platform can consume and work on any data source: on-premises or cloud, structured or unstructured, in motion or at rest, internal or external to an organization. It has five main data and analytics functions (ingest, persist, analyze, deploy, and govern) that can be exercised in any order and any combination. The composable services provide APIs that enable customers, business partners, and system integrators to build on and extend these capabilities.
With IBM Data Science Experience, you can create Python, Scala, and R notebooks to analyze your data, collaborate with others on your notebooks, add comments, and view a history of your notebooks. You can create a machine learning flow, which is a graphical representation of data, by using the Flow Editor to prepare or shape data, train or deploy a model, or transform data and export it back to a database table or a file in object storage. You can use visualizations in your notebooks to present data visually to help identify patterns, gain insights, and make decisions. Many of your favorite open source visualization libraries, such as matplotlib, are pre-installed and available on DSX. Data Science Experience machine learning flows use Apache Spark machine learning (SparkML) algorithms to build models. RStudio, included in IBM Data Science Experience, provides an IDE for working with R.
DSX has a thriving community with resources to help you learn more about data science. Join us to see a use-case-driven demo of DSX and how it can help you get started on or accelerate your data science journey.
6pm: Pizza and networking
6:30pm: Why Data Science on Big Data? Data Science at Scale Demo
Artem Ervits, Sr. Solutions Engineer, Hortonworks
Rahul Daterao is an Enterprise Cloud Architect at IBM and a Watson Data Platform Cloud Specialist with expertise in Big Data, Elasticsearch, and other NoSQL technologies. His 20+ years of experience include serving as hands-on CTO for two NYC-based startups (Appssavvy and KickApps). He is an open source technology evangelist, engineer, and developer who enjoys collaborating with everyone passionate about and interested in driving innovation.