The Hortonworks Blog

Posts categorized by : Apache Hadoop

Apache Accumulo is gaining momentum in markets such as government, financial services and health care for its enhanced security and performance. Hortonworks has a long history with this technology and has multiple committers to the Accumulo project on staff – at least one of whom literally helped to write the book on Accumulo. This has enabled Hortonworks to provide enterprise support for Accumulo within the Hortonworks Data Platform for some time now.…

On Feb 8th and 9th, Hortonworks, Microsoft and Elastacloud will be hosting a hackathon at the Microsoft Campus in Mountain View, CA. Whether you’re a newbie or ninja, developer or scientist, we’d love to see you there. Register here.

The focus of the hackathon will be city datasets. For instance, we’ll be drawing on datasets from San Francisco that will measure things like:

  • Pedestrian safety: where accidents occur, how they occur and who has caused them.

I recently sat down with Himanshu Bari to discuss how Apache Ambari will serve as the single point of management for Hadoop 2 clusters integrated with Apache Storm and its real-time, streaming event processing.

Himanshu discusses Apache Storm’s five key benefits and how those will add to the power and stability of a Hadoop 2 stack, providing analysis of huge data flows from the second data is created and then for decades of historical analysis of that data stored in HDFS.…

This guest blog post is from Syncsort, a Hortonworks Technology Partner and certified on HDP 2.0, by Keith Kohl, Director, Product Management, Syncsort (@keithkohl)

Several years ago, Syncsort set on a journey to contribute to the Apache Hadoop projects to open and extend Hadoop, and specifically the MapReduce processing framework.  One of the contributions was to open the sort – both map side sort and reduce side – and to make it pluggable. …

I recently sat down with Devaraj Das and Carter Shanklin to discuss the dramatic improvements delivered in Apache HBase version 0.96 included in HDP 2.0.

Now HBase runs on Windows and (whether on Linux or Windows) it recovers from failures much more quickly, with dramatic improvements in mean time to recovery (MTTR).

Devaraj is one of the original architects of Apache Hadoop and Carter is the Hortonworks product manager focused on HBase.…

This guest post from Simon Elliston Ball, Head of Big Data at Red Gate and all round top bloke. 

Hadoop is a great place to keep a lot of data. The data-lake, the data-hub and the data platform;  it’s all about the data. So how do you manage that data? How do you get data in? How do you get results out? How do you get at the logs buried somewhere deep in HDFS?…

Today, we are excited to announce the agenda for Hadoop Summit Europe 2014.  We welcome you to check it out now and hopefully start planning you trip to Amsterdam now!

The call for abstracts for Hadoop Summit Europe was open for just over two months and we received an unbelievable 354 submissions.  Wow!  Further, as we read through them, the quality was amazing.  We quickly surmised that the show was going to be great, but the selection process was going to rough.…

This guest post from Eric Hanson, Principal Software Development Engineer on Microsoft HDInsight, and Apache Hive committer.

Hive has a substantial community of developers behind it, including a few from the Microsoft HDInsight team. We’ve been contributing to the Stinger initiative since it was started early in 2013, and have been contributing to Hadoop since October of 2011. It’s a good time to step back and see the progress that’s been made on Apache Hive since fall of 2012, and ponder what’s ahead.…

I recently sat down with Owen O’Malley and Carter Shanklin to discuss the dramatic improvements delivered by the Stinger Initiative to version 0.12 of Apache Hive, which is well on its way to being 100x faster than pre-Stinger versions of Hive. That means interactive queries on petabytes of data.

Owen is one of the original architects of Apache Hadoop and Carter is the Hortonworks product manager focused on Hive. Together, they explain the speed, scale and SQL semantics delivered in Apache Hive v0.12, which is included in Hortonworks Data Platform v2.0.…

One aspect of community development of Apache Hadoop is the way that everyone working on Hadoop -full time, part time, vendors, users and even some researchers all collaborate together in the open. This developed is based on publicly accessible project tools: Apache Subversion for revision control, Apache Maven for the builds; Jenkins for automating those builds and tests. Central to a lot of work is the Apache JIRA server, an instance of Atlassian’s issue management tool.…

This is the third in our series on modern data architectures across industry verticals. Others in the series are:

Many of the world’s largest telecommunications companies use Hortonworks Data Platform (HDP) to manage their data. Through partnership with these companies, we have learned how our customers use HDP to improve customer satisfaction, make better infrastructure investments and develop new products.…

Whether you were busy finishing up last minute Christmas shopping or just taking time off for the holidays, you might have missed that Hortonworks released the Stinger Phase 3 Technical Preview back in December. The Stinger Initiative is Hortonworks’ open roadmap to making Hive 100x faster while adding standard SQL. Here we’ll discuss 3 great reasons to give Stinger Phase 3 Preview a try to start off the new year.

Reason 1: It’s The Fastest Hive Yet

Whether you want to process more data or lower your time-to-insight, the benefits of a faster Hive speak for themselves.…

Hadoop has traditionally been used for batch processing data at large scale. Batch processing applications care more about raw sequential throughput than low-latency and hence the existing HDFS model where all attached storages are assumed to be spinning disks has worked well.

There is an increasing interest in using Hadoop for interactive query processing e.g. via Hive. Another class of applications makes use of random IO patterns e.g. HBase. Either class of application benefits from lower latency storage media.…

The year is coming to its end. Maybe you’re reading this as you race to check a few more 2013 items off of your to-do list (at work or at home). Or maybe you’ve already got a hot toddy in your hand and your feet kicked up, with slippers warming your toes.

In 2013, I have been fortunate enough to spend the year speaking with our customers and I learned about how so many important organizations are using Apache Hadoop and Hortonworks Data Platform (HDP) to solve real problems.…

The network and security teams at your company do not allow internet access from the machines where you plan to install Hadoop. What do you do? How do you install your Hadoop cluster without having access to the public software packages? Apache Ambari supports local repositories and in this post we’ll look at the configuration needed for that support.

When installing Hadoop with Ambari, there are three repositories at play: one for Ambari – which primarily hosts the Ambari Server and Ambari Agent packages) and two repositories for the Hortonworks Data Platform – which hosts the HDP Hadoop Stack packages and other related utilities.…

Go to page:« First...7891011...20...Last »