
Apache Spark (SQL) Fine-Grained Security with Apache Ranger and SparkR Updates

Level: all levels


5:30 PM – 6:00 PM Food, drinks, mingling

6:00 PM – 6:15 PM Artem Ervits: announcements, call for presenters, future events

6:15 PM – 8:30 PM Vinay Shukla and Yanbo Liang, Hortonworks, Inc.

Fine-Grained Security for SparkSQL

So far, SparkSQL provides only coarse-grained security: a user or group is either allowed or denied access to an entire table. Often, finer control is needed. The SparkSQL and Ranger integration allows access to SparkSQL data to be controlled down to the row or column level, along with other advanced controls such as data masking. This session walks through the integration and includes a demo of the feature.
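The difference between table-level allow/deny and row/column-level control with masking can be illustrated with a small toy model. This is a hypothetical sketch of the concepts only: `TablePolicy`, `read_table`, and `mask_value` are illustrative names, not Ranger's policy model or a SparkSQL API, and the "mask all but the last four characters" rule is an assumed example of a masking policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


@dataclass
class TablePolicy:
    # Coarse-grained control: who may read the table at all.
    allowed_users: Set[str]
    # Fine-grained row filtering (hypothetical): per-user predicate on rows.
    row_filters: Dict[str, Callable[[Row], bool]]
    # Fine-grained column masking (hypothetical): column -> users who see it unmasked.
    masked_columns: Dict[str, Set[str]]


def mask_value(value: object) -> str:
    # Assumed masking rule: keep the last 4 characters, replace the rest with 'x'.
    s = str(value)
    return "x" * max(len(s) - 4, 0) + s[-4:]


def read_table(user: str, rows: List[Row], policy: TablePolicy) -> List[Row]:
    # Table-level check: this alone is the coarse-grained model.
    if user not in policy.allowed_users:
        raise PermissionError(f"{user} may not read this table")
    # Users without a row filter see every row.
    row_filter = policy.row_filters.get(user, lambda r: True)
    result = []
    for row in rows:
        if not row_filter(row):
            continue  # row-level security: hide rows the user may not see
        visible = {}
        for col, val in row.items():
            exempt = policy.masked_columns.get(col)
            # Column-level masking: mask unless the user is exempt for this column.
            visible[col] = mask_value(val) if exempt is not None and user not in exempt else val
        result.append(visible)
    return result
```

For example, with `row_filters={"bob": lambda r: r["region"] == "US"}` and `masked_columns={"ssn": {"alice"}}`, user `alice` sees every row with SSNs in clear text, while `bob` sees only US rows with the SSN masked, such as `"xxxxx6789"`. A user outside `allowed_users` is refused outright, which is all that a coarse-grained model can express.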

Integrate SparkR with existing R packages to accelerate data science workflows

R is the de facto programming language for data science, with nearly 10,000 packages written for the single-machine era. However, native R faces numerous scalability challenges as datasets grow. SparkR provides many scalable statistical functions and distributed machine learning algorithms that can help users overcome these scaling bottlenecks. Can we combine the scalability of SparkR with the functional diversity of existing R packages? The answer is yes. In this talk, we will summarize the efforts to integrate SparkR with existing R packages, such as user-defined functions, parallel apply functions, virtual environments for third-party R libraries, and faster conversion between Spark DataFrames and local R data frames. We will then demonstrate how to solve several typical data science tasks by leveraging these features. Finally, we will briefly introduce the community work in progress on SparkR for upcoming releases.


Vinay Shukla is the director of product management for Spark, Zeppelin, and Agile analytics at Hortonworks. Previously, Vinay worked as a developer and security architect. He has been a frequent speaker at many conferences, including Hadoop Summit, Apache Big Data, JavaOne, and Oracle World. Vinay enjoys being on a yoga mat or on a hiking trail. You can follow him on his blog.

Yanbo Liang is an Apache Spark committer working on MLlib and SparkR at Hortonworks. His main interests center on machine learning, data science, and distributed systems. He is an active Apache Spark contributor (top 15) and has delivered the implementations of several major MLlib algorithms. Prior to Hortonworks, he was a software engineer at Yahoo! and France Telecom, working on personalized recommendation and machine learning.

Monday, February 6, 2017
11 Times Square, (Microsoft Entrance on 8th Ave. between 41st and 42nd Streets), New York, NY