The Hortonworks Blog

Posts categorized by: HDP

On Feb 8th and 9th, Hortonworks, Microsoft and Elastacloud will be hosting a hackathon at the Microsoft Campus in Mountain View, CA. Whether you’re a newbie or ninja, developer or scientist, we’d love to see you there. Register here.

The focus of the hackathon will be city datasets. For instance, we’ll be drawing on datasets from San Francisco that will measure things like:

  • Pedestrian safety: where accidents occur, how they occur and who has caused them.

I recently sat down with Himanshu Bari to discuss how Apache Ambari will serve as the single point of management for Hadoop 2 clusters integrated with Apache Storm and its real-time, streaming event processing.

Himanshu discusses Apache Storm’s five key benefits and how they will add to the power and stability of a Hadoop 2 stack, providing analysis of huge data flows from the second the data is created, followed by decades of historical analysis of that data stored in HDFS.…

We’re kicking off 2014 with an evolution to our Modern Data Architecture webinar series. Last year we focused on how your existing technologies integrate with Apache Hadoop. This year we will focus on use cases for how Hadoop and your existing technologies are being used to get real value in the enterprise. Join Hortonworks, along with Microsoft, Actian, Splunk and others as we continue our journey on delivering Apache Hadoop as an Enterprise Data Platform.…

This guest blog post is by Keith Kohl, Director, Product Management, Syncsort (@keithkohl). Syncsort is a Hortonworks Technology Partner and is certified on HDP 2.0.

Several years ago, Syncsort set out on a journey to contribute to the Apache Hadoop projects to open and extend Hadoop, and specifically the MapReduce processing framework. One of the contributions was to open up the sort (both map-side and reduce-side) and to make it pluggable. …

I recently sat down with Devaraj Das and Carter Shanklin to discuss the dramatic improvements delivered in Apache HBase version 0.96 included in HDP 2.0.

Now HBase runs on Windows and (whether on Linux or Windows) it recovers from failures much more quickly, with dramatic improvements in mean time to recovery (MTTR).

Devaraj is one of the original architects of Apache Hadoop and Carter is the Hortonworks product manager focused on HBase.…

This guest post is from Simon Elliston Ball, Head of Big Data at Red Gate and all-round top bloke.

Hadoop is a great place to keep a lot of data. The data-lake, the data-hub and the data platform; it’s all about the data. So how do you manage that data? How do you get data in? How do you get results out? How do you get at the logs buried somewhere deep in HDFS?…

Microsoft and Hortonworks have been working together for over two years now with the goal of bringing the power of Big Data to a billion people. As a result of that work, today we announced the General Availability of HDP 2.0 for Windows with the full power of YARN.

There are already over half a billion Excel users on this planet.

So, we have put together a short tutorial on the Hortonworks Sandbox where we walk through the end-to-end data pipeline using HDP and Microsoft Excel in the shoes of a data analyst at a financial services firm where she:

  • Cleans and aggregates 10 years of raw stock tick data from NYSE
  • Enriches the data model by looking up additional attributes from Wikipedia
  • Creates an interactive visualization on the model

You can find the tutorial here.…

Installing the Hortonworks Data Platform 2.0 for Windows is straightforward. Let’s take a look at how to install a one-node cluster on your Windows Server 2012 R2 machine.

To start, download the HDP 2.0 for Windows package. The package is under 1 GB, and will take a few moments to download depending on your internet speed. Documentation for installing a single node instance is located here. This blog post will guide you through that instruction set to get you going with HDP 2.0 for Windows!…

We are excited to announce that the Hortonworks Data Platform 2.0 for Windows is publicly available for download. HDP 2 for Windows is the only Apache Hadoop 2.0 based platform that is certified for production usage on Windows Server 2008 R2 and Windows Server 2012 R2.

With this release, the latest in community innovation on Apache Hadoop is now available across all major Operating Systems. HDP 2.0 provides Hadoop coverage for more than 99% of the enterprises in the world, offering the most flexible deployment options from On-Premise to a variety of cloud solutions.…

This guest post is from Eric Hanson, Principal Software Development Engineer on Microsoft HDInsight and Apache Hive committer.

Hive has a substantial community of developers behind it, including a few from the Microsoft HDInsight team. We’ve been contributing to the Stinger initiative since it was started early in 2013, and have been contributing to Hadoop since October of 2011. It’s a good time to step back and see the progress that’s been made on Apache Hive since fall of 2012, and ponder what’s ahead.…

I recently sat down with Owen O’Malley and Carter Shanklin to discuss the dramatic improvements delivered by the Stinger Initiative to version 0.12 of Apache Hive, which is well on its way to being 100x faster than pre-Stinger versions of Hive. That means interactive queries on petabytes of data.

Owen is one of the original architects of Apache Hadoop and Carter is the Hortonworks product manager focused on Hive. Together, they explain the speed, scale and SQL semantics delivered in Apache Hive v0.12, which is included in Hortonworks Data Platform v2.0.…

One aspect of community development of Apache Hadoop is the way that everyone working on Hadoop (full time, part time, vendors, users and even some researchers) collaborates in the open. This development is based on publicly accessible project tools: Apache Subversion for revision control, Apache Maven for the builds, and Jenkins for automating those builds and tests. Central to a lot of work is the Apache JIRA server, an instance of Atlassian’s issue management tool.…

This is the third in our series on modern data architectures across industry verticals. Others in the series are:

Many of the world’s largest telecommunications companies use Hortonworks Data Platform (HDP) to manage their data. Through partnership with these companies, we have learned how our customers use HDP to improve customer satisfaction, make better infrastructure investments and develop new products.…

Whether you were busy finishing up last minute Christmas shopping or just taking time off for the holidays, you might have missed that Hortonworks released the Stinger Phase 3 Technical Preview back in December. The Stinger Initiative is Hortonworks’ open roadmap to making Hive 100x faster while adding standard SQL. Here we’ll discuss 3 great reasons to give Stinger Phase 3 Preview a try to start off the new year.

Reason 1: It’s The Fastest Hive Yet

Whether you want to process more data or lower your time-to-insight, the benefits of a faster Hive speak for themselves.…

The year is coming to its end. Maybe you’re reading this as you race to check a few more 2013 items off of your to-do list (at work or at home). Or maybe you’ve already got a hot toddy in your hand and your feet kicked up, with slippers warming your toes.

In 2013, I have been fortunate enough to spend the year speaking with our customers and I learned about how so many important organizations are using Apache Hadoop and Hortonworks Data Platform (HDP) to solve real problems.…
