October 16, 2017 | Matt Spillar | Hortonworks Case Study

Leveraging Data to Make Decisions in Financial Services

October 16, 2017 | Guest Author | Hadoop Insights

APM with Unravel and Hortonworks to Ensure Mission Critical, Fast and Error Free Performance

October 13, 2017 | Kevin Jordan | Hortonworks Case Study

Why The Big Data Landscape Is All Shades of Grey

Viewing posts: From the Dev Team

At Hortonworks, we constantly strive to deliver high-quality releases. HDP/HDF releases are deployed by thousands of enterprises and are used in business-critical environments to crunch several petabytes of data every single day. So maintaining the highest standards of quality and investing in an infrastructure to support the repeatable standards of quality is […]

This is the second post in the Engineering @ Hortonworks blog series that explores how we in Hortonworks Engineering build, test and release new versions of our platforms. In this post, we dive deep into something that we are extremely excited about: running a container cloud on YARN! We have been using this next-generation […]

One of the most exciting new features of HDP 2.6 from Hortonworks was the general availability of Apache Hive with LLAP. If you missed DataWorks Summit you’ll want to look at some of the great LLAP experiences our users shared, including Geisinger who found that Hive LLAP outperforms their traditional EDW for most of their […]

Our customers increasingly leverage Data Science and Machine Learning to solve complex predictive analytics problems. A few examples of these problems are churn prediction, predictive maintenance, image classification, and entity matching. While everyone wants to predict the future, truly leveraging Data Science for Predictive Analytics remains the domain of a select few. To expand the […]

This is the introductory post in a blog series that explores how we in Hortonworks Engineering build, test and release new versions of our platforms. In this post, we introduce the basic themes and set context for deeper discussions in subsequent blogs. We at Hortonworks are very proud of the work we do. Along with […]

This blog has contributions from Mingliang Liu and Rajesh Balamohan. Late last year, we provided a brief history of Apache Hadoop support for Amazon S3. Our first focus of work was speeding up the read of S3-hosted data acting as a query input. That was followed by the write pipeline, as well as scaling and […]

This blog has contributions from Vinod Vavilapalli, Wangda Tan, Gour Saha, Priyanka Nagwekar, and Sunil Govindan. You have probably wondered what makes a self-driving car intelligent enough to process live camera feeds, navigate busy streets, and distinguish objects on those streets, such as cars, trucks, traffic lights, or pedestrians. A self-driving car is a perfect […]

This blog was co-authored by George Vetticaden, Sriharsha Chintalapani, Jungtaek Lim, and Sanket Shah. Last week, in Part 3 of this blog series, we announced the GA of HDF 3.0 and let the cat out of the bag by introducing the new open source component called Streaming Analytics Manager (SAM), an exciting new technology that helps developers, […]

Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we introduced what a Data Lake 3.0 is. In part 2 of the series, we talked about how a multi-colored YARN will play a critical role in building a successful Data Lake 3.0. In part 3 of the series, […]

For the product management leadership team at Hortonworks, nothing is more valuable than talking directly with customers and learning about their successes, challenges, and struggles implementing their big data and analytics use cases with HDP and HDF. These conversations provide more insight than any analyst report, white paper, or market study. In […]

R is one of the primary programming languages for data science, with more than 10,000 packages. R is open source software that is widely taught in colleges and universities as part of statistics and computer science curricula. R uses the data frame as its API, which makes data manipulation convenient. R has powerful visualization infrastructure, […]

Large-scale Machine Learning: Machine Learning, the ability to learn without being explicitly programmed, has been around for a long time and is well understood. What is different is the relatively recent emergence of general purpose tools, such as Apache Spark, that enable processing of very large datasets. Additionally, data scientists can now collaborate and rapidly […]

The 2014 Yahoo email hack is a good illustration of how a big data security analytics platform such as Apache Metron can make it easier to detect, investigate, assess, and remediate threats in your environment. In this article I will describe how to set up and configure Apache Metron to detect a recent cyber attack on Yahoo, […]

Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we introduced what a Data Lake 3.0 is and in part 2 of the series, we talked about how a multi-colored YARN will play a critical role in building a successful Data Lake 3.0. In this blog, we will take a […]

Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we briefly introduced the power of leveraging prepackaged applications in Data Lake 3.0 and how the focus will shift from the platform management to solving the business problems. In this post, we further deliberate on this idea to help answer […]