May 26, 2017 | Tom Hastain | Hortonworks Case Study

Precision Medicine: a 5 Million Person Case Study

May 26, 2017 | Carole Gum | Hortonworks Community Connection

Don’t miss the Business of Data at DataWorks Summit

May 26, 2017 | Anna Yong

Open Source Talent Powers Big Data Success

Viewing posts: From the Dev Team


Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we introduced what a Data Lake 3.0 is. In part 2 of the series, we talked about how a multi-colored YARN will play a critical role in building a successful Data Lake 3.0. In part 3 of the series, […]

As part of the product management leadership team at Hortonworks, we find nothing more valuable than talking directly with customers and learning about their successes, challenges, and struggles implementing their big data and analytics use cases with HDP and HDF. These conversations provide more insight than any analyst report, white paper, or market study. In […]

R is one of the primary programming languages for data science, with more than 10,000 packages. R is open source software that is widely taught in colleges and universities as part of statistics and computer science curricula. R uses the data frame as its API, which makes data manipulation convenient. R has a powerful visualization infrastructure, […]

Large-scale Machine Learning: Machine learning, the ability to learn without being explicitly programmed, has been around for a long time and is well understood. What is different is the relatively recent emergence of general-purpose tools, such as Apache Spark, that enable processing of very large datasets. Additionally, data scientists can now collaborate and rapidly […]

The 2014 Yahoo email hack is a good illustration of how a big data security analytics platform such as Apache Metron can make it easier to detect, investigate, assess, and remediate threats in your environment. In this article I will describe how to set up and configure Apache Metron to detect a recent cyber attack on Yahoo, […]

Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we introduced what a Data Lake 3.0 is and in part 2 of the series, we talked about how a multi-colored YARN will play a critical role in building a successful Data Lake 3.0. In this blog, we will take a […]

Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we briefly introduced the power of leveraging prepackaged applications in Data Lake 3.0 and how the focus will shift from platform management to solving business problems. In this post, we elaborate further on this idea to help answer […]

The new year brings new innovation and collaborative efforts. Various teams from the Apache community have been working hard for the last eighteen months to bring the EZ button to Apache Hadoop technology and Data Lake. In the coming months, we will publish a series of blogs introducing our Data Lake 3.0 architecture and highlighting […]

Welcome back to my blog series, the CISO’s View. In my last article, CISO’s View: metrics part 1, we started looking at metrics and why they are the foundation of a successful security program. Today, we’ll look at how we derive metrics that communicate value in a way that’s tied to the company strategy. Hopefully […]

Welcome back to my blog series, the CISO’s View. In my last article, CISO’s View: Why an integrated approach matters, I stirred up the waters a bit by stating that the CISO’s first and most fundamental job is taking all this security data (threats, vulnerabilities, policy violations) and transforming it into business language that shows […]

We are very excited about the release of Apache Zeppelin 0.7.0 and want to thank the Apache Software Foundation along with the Apache Zeppelin community. The long-awaited release introduces several key features, which are highlighted below. The most notable improvements in this release are in the areas of multi-user enhancements, pluggable visualization, Apache Spark & security […]

Apache Spark 2.1 was released recently in the community. The main focus of this release was improvements in Structured Streaming and Machine Learning. Structured Streaming: Kafka 0.10 support, metrics and stability improvements. Machine Learning: SparkR improvements, including new ML algorithms for LDA, random forests, GMM, etc. Wanna try Spark 2.1 now? Well, you are in […]

We recently concluded our highly attended "How to Get Started with Hortonworks Data Cloud for AWS" webinars. Thank you, Jeff Sposetti and Sean Roberts, for hosting the sessions. The webinars provided a very informative overview of the offering and included a detailed demonstration to show how the product works. Some great questions came up during […]

Originally posted in HCC. 1. Introduction: NiFi is a powerful and easy-to-use technology for building dataflows from diverse sources to diverse targets while transforming and dynamically routing in between. NiFi is packaged in HDF 2.0 which (in addition to bundling Kafka and Storm for a complete data movement platform) pushes NiFi to enterprise […]

Apache Spark has been Open Source’s new kid on the block. Companies are using Spark to develop sophisticated models that enable them to discover new opportunities or avoid risk. But what does the future, or at least the near future, hold for Spark? In this blog we have outlined five trends we see in […]