Get fresh updates from Hortonworks by email

Once a month, receive latest insights, trends, analytics, offering information and knowledge of the Big Data.

cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button
August 16, 2017 | Kevin Jordan

How Do Insurance Companies Find Data in a Rain Drop?

August 15, 2017 | Roni Fontaine | Hadoop Ecosystem, Hadoop Insights

Get Started Quickly with Data Science in the Cloud ASAP!

August 15, 2017 | Carter Shanklin

Update Hive Tables the Easy Way

Viewing posts: From the Dev Team« Back to all

X
FILTERS
ALL
TECHNICAL
BUSINESS

All Topics















All Channels











CLEAR FILTERS

This blog has contributions from Mingliang Liu and Rajesh Balamohan. Late last year, we provided a brief history of Apache Hadoop support for Amazon S3. Our first focus of work was speeding up the read of S3-hosted data acting as a query input. That was followed by the write pipeline, as well as scaling and […]

This blog has contributions from: Vinod Vavilapalli, Wangda Tan, Gour  Saha, Priyanka Nagwekar, Sunil Govindan You have probably wondered what makes a self-driving car intelligent to process the live camera feeds, navigate the busy streets and distinguish objects on the streets, such as cars, trucks, traffic lights or pedestrians? A self-driving car is a perfect […]

This blog was co-authored by: George Vetticaden, Sriharsha Chintalapani, Jungtaek Lim, Sanket Shah Last week, in Part 3 of this blog series, we announced the GA of HDF 3.0 and let the cat out of the bag by introducing the new open source component called  Streaming Analytics Manager (SAM), an exciting new technology that helps developers, […]

  Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we introduced what a Data Lake 3.0 is. In part 2 of the series, we talked about how a multi-colored YARN will play a critical role in building a successful Data Lake 3.0. In part 3 of the series, […]

As part of the product management leadership team at Hortonworks, there is nothing more valuable than talking directly with customers and learning about their successes, challenges, and struggles implementing their big data and analytics use cases with HDP and HDF. These conversations provide more insight than any analyst report, white paper, or market study. In […]

R is one of the primary programming languages for data science with more than 10,000 packages. R is an open source software that is widely taught in colleges and universities as part of statistics and computer science curriculum. R uses data frame as the API which makes data manipulation convenient. R has powerful visualization infrastructure, […]

Large-scale Machine Learning The ability to learn without being explicitly programmed, Machine Learning, has been around for a long time and is well understood. What is different is the relatively recent emergence of general purpose tools, such as Apache Spark, that enable processing of very large datasets. Additionally, data scientists can now collaborate and rapidly […]

The 2014 Yahoo email hack is a good illustration how a big data security analytics platform such as Apache Metron can make it easier to detect, investigate, assess, and remediate threats in your environment.  In this article I will describe how to setup and configure Apache Metron to detect a recent cyber attack on Yahoo, […]

Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we introduced what a Data Lake 3.0 is and in part 2 of the series, we talked about how a multi-colored YARN will play a critical role in building a successful Data Lake 3.0. In this blog, we will take a […]

Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we briefly introduced the power of leveraging prepackaged applications in Data Lake 3.0 and how the focus will shift from the platform management to solving the business problems. In this post, we further deliberate on this idea to help answer […]

The new year brings new innovation and collaborative efforts. Various teams from the Apache community have been working hard for the last eighteen months to bring the EZ button to Apache Hadoop technology and Data Lake. In the coming months, we will publish a series of blogs introducing our Data Lake 3.0 architecture and highlighting […]

Welcome back to my blog series, the CISO’s View.  In my last article, CISO’s View: metrics part 1, we started looking at metrics and why they are the foundation of a successful security program. Today, we’ll look at how we derive metrics that communicate value in a way that’s tied to the company strategy. Hopefully […]

Welcome back to my blog series, the CISO’s View.  In my last article, CISO’s View: Why an integrated approach matters, I stirred up the waters a bit by stating that the CISO’s first and most fundamental job is taking all this security data, threats, vulnerabilities, policy violations, and transforming it into business language that shows […]

We are very excited about the release of Apache Zeppelin 0.7.0 and want to thank the Apache Foundation along with the Apache Zeppelin community. The long awaited release introduces several key features which are highlighted below, the most notable improvements in this release are in the area of multi user enhancements, pluggable visualization, Apache Spark & security […]

Apache Spark 2.1 was released recently in the community. The main focus of this release was improvements in Structured Streaming and Machine Learning. Structured Streaming: Kafka .10 support, Metrics & Stability improvements Machine Learning: SparkR Improvements including new ML algorithms for LDA, Random forests, GMM, etc. Wanna try Spark 2.1 now? Well, you are in […]