Viewing posts by: Vinay Shukla
The value of any data is proportional to the insights derived from it. With the Data Lake Architecture, all of the enterprise data is made available in one place. The keys to deriving insights from the Data Lake are Apache Spark and Apache Zeppelin. Both are key tools for driving Predictive Analytics and Machine Learning. […]

We are very excited about the release of Apache Zeppelin 0.7.0 and want to thank the Apache Software Foundation along with the Apache Zeppelin community. The long-awaited release introduces several key features, which are highlighted below. The most notable improvements in this release are in the areas of multi-user enhancements, pluggable visualization, Apache Spark & security […]

Apache Spark 2.1 was released recently in the community. The main focus of this release was improvements in Structured Streaming and Machine Learning. Structured Streaming: Kafka 0.10 support, plus metrics and stability improvements. Machine Learning: SparkR improvements, including new ML algorithms for LDA, random forests, GMM, etc. Wanna try Spark 2.1 now? Well, you are in […]

Apache Spark 2.0 was released yesterday in the community. This is a long-awaited release that delivers several key features. We are really excited about this release and sincerely thank the Apache Software Foundation and Apache Spark communities for making this release possible. The most notable improvements in this release are in the areas of API, […]

The blog below was co-authored by Vinay Shukla (Hortonworks), Moon Soo Lee (Apache Zeppelin PMC & NFLabs), and Prabhjyot Singh (Apache Zeppelin PMC & Hortonworks). Recently the Apache Software Foundation (ASF) announced Apache Zeppelin as a top-level project. This was a great milestone for both the Zeppelin and data science communities. Since its incubation in […]

In March 2016 we announced Apache Spark 1.6 GA on HDP 2.4 and provided the 2nd technical preview of Apache Zeppelin. Since then, Apache Spark 1.6.1, a patch release with bug fixes, has been released by the open source community. Marching with the community, the upcoming maintenance release of HDP 2.4 will include Spark 1.6.1 […]

As Apache Spark continues to gain popularity, the rapid march of new Spark releases continues. With HDP 2.4, we are announcing the general availability of Spark 1.6, which is the latest Spark version from the community. With Spark proving an incredibly useful data access engine running on top of Hadoop, data scientists and business analysts […]

Apache Spark’s momentum continues to grow, and throughout 2015 we saw customers across all industries get real value from using it with the Hortonworks Data Platform (HDP). Examples include: Insurance: optimize the claims reimbursement process by using Spark’s machine learning capabilities to process and analyze all claims. Healthcare: build a Patient Care System using Spark […]

Apache Spark has garnered a lot of developer attention and is often at the top of the agenda in my customer interactions. Since we announced support for Spark in HDP, we have seen broad customer adoption of our Spark offering. Our customers love Spark for the simplicity of its API, speed of development, and runtime performance. […]

Hortonworks is pleased to announce the general availability of Apache Spark in Hortonworks Data Platform (HDP), now available on our downloads page. With HDP 2.2.4, Hortonworks now offers support for your developers and data scientists using Apache Spark 1.2.1. HDP’s YARN-based architecture enables multiple applications to share a common cluster and dataset while ensuring consistent […]

We recently hosted a Spark webinar as part of the YARN Ready series, aimed at a technical audience including developers of applications for Apache Hadoop and Apache Hadoop YARN. During the event, a number of good questions surfaced that we wanted to share with our broader audience in this blog. Take a look at the […]

Enterprise Apache Hadoop provides the fundamental data services required to deploy into existing architectures. These include security, governance and operations services, in addition to Hadoop’s original core capabilities for data management and data access. This post focuses on recent work completed in the open source community to enhance the Hadoop security component, with encryption and […]

Hortonworks’ strategy, since our inception, has been extremely consistent: enable a modern data architecture whereby users have the ability to store data in a single location and interact with it in multiple ways, using the right data processing engine at the right time. At the core of that strategy is YARN, which as a […]

Introduction: HDP 2.1 ships with Apache Knox 0.4.0. This release of Apache Knox supports the WebHDFS, WebHCat, Oozie, Hive, and HBase REST APIs. Hive is a popular component used for SQL access to Hadoop, and HiveServer2 with Thrift supports JDBC access over HTTP. The following steps show the configuration to enable a JDBC […]

LDAP provides a central source for maintaining users and groups within an enterprise. There are two ways to use LDAP groups within Hadoop. The first is to use OS level configuration to read LDAP groups. The second is to explicitly configure Hadoop to use LDAP-based group mapping. Here is an overview of steps to configure […]