The Hortonworks Blog

More from Jim Walker

Four years ago, Arun Murthy entered a JIRA ticket (MAPREDUCE-279) that outlined a re-architecture of the original MapReduce.  In the ticket, he described a set of capabilities that allowed processes to better share resources and an architecture that would allow Hadoop to extend beyond batch data processing.

It turned out that this ticket was prescient about true enterprise requirements for Hadoop. As enterprise adoption accelerated, it became even clearer that multiple processing models – moving beyond batch – were critical for Hadoop to broaden its applicability for mainstream usage in the modern enterprise architecture.…

What is the value of Hadoop to your business? What value lies in your big data?

There are MANY definitions of big data out there.  In fact, we have published two of them to our blog alone and I am sure we can dream up a few more.  However, when it comes down to it, our customers know best.  After all, they are the users of Hadoop.

New Whitepaper: “Business Value of Hadoop”.…

Over the past year, customers have told us they want to store all their data in one place and interact with it in multiple ways… they want to use Hadoop, but in order to do so, it needs to extend beyond batch.  It also needs to be interactive and real-time (among others).

This is the entire principle behind YARN, which Arun Murthy and the team at Hortonworks, together with others in the community, have been working on for more than 5 years! …

Talend Open Studio for Big Data provides an intuitive set of tools that make dealing with data in the Hadoop world (and Hortonworks Data Platform in particular) a lot easier.  We often use the tools to speed delivery of a proof of concept or to operationalize movement of data from sources like web logs and machine sensors into HDFS.  It is simple to use and typically takes only minutes to perform something that once took hours in a script.…

A few weeks back we posted a definition of “big data”.  There was definitely some internal conversation about the term and whether this definition had captured what it means.  In sum: it is a loaded term.  It means a lot of different things to a lot of different people.

When I first joined Hortonworks, I bought into the three V’s (volume, velocity and variety) definition of big data. …

PORTLAND – The Rose City is a great place and this week it got even more interesting with the OpenStack Summit in town. I am more of a data geek and very rarely do I venture down the stack into infrastructure, but wow, there is something cool going on with the OpenStack community.  I couldn’t help but get wrapped up in the excitement.  Not only was the enthusiasm palpable, it was also very familiar.…

While we are quite a long way from hearing “Houston, Tranquility Base here… the Eagle has landed”, the HP Moonshot is definitely pushing us all toward a new class of infrastructure to run workloads like Apache Hadoop more efficiently. Hortonworks applauds the development of flexible Big Data appliances like Moonshot. We are excited about this development as it signals alignment across development, operations and infrastructure within organizations.  For quite some time, our team has been accustomed to the natural balance required across these three constituents and now the server market is joining in on the game.…

Unstructured data, semi-structured data, structured data… it is all very interesting and we are in conversations about big and small versions of each of these data types every day. We love it…  we are data geeks at Hortonworks. We passionately understand that if you want to use any piece of data for some computation, there needs to be some layer of metadata and structure to interact with it.  Within Hadoop, this critical metadata service is provided by HCatalog.…

“OK, Hadoop is pretty cool, but exactly where does it fit and how are other people using it?”  Here at Hortonworks, this has got to be the most common question we get from the community… well that and “what is the airspeed velocity of an unladen swallow?”

We think about this (where Hadoop fits) a lot and have gathered a fair amount of expertise on the topic.  The core team at Hortonworks includes the original architects, developers and operators of Apache Hadoop and its use at Yahoo, and through this experience and working within the larger community they have been privileged to see Hadoop emerge as the technological underpinning for so many big data projects.…

Thankful…

Happy Thanksgiving!

Today, like the rest of the U.S., we take a pause from our regular blog schedule to give thanks…

We are thankful for mappers and reducers. We are thankful for namenodes and jobtrackers. We give thanks to speculative execution battling the march of the last reducer. Give thanks to every petabyte, terabyte, gigabyte, file and block of data. We are thankful for the capacity scheduler.

We are very thankful for many things here at Hortonworks and I know many of us are thankful for an extra long weekend.…

As we speed towards widespread enterprise adoption of Apache Hadoop, it has become readily apparent that this new data platform must not only capture, process and distribute data, but must also be deployable in a variety of ways, be it on premises, in a VM, as an appliance or better yet in the cloud…

Today we announced a new relationship with Rackspace in which we will develop an OpenStack-based Hadoop solution for the public and private cloud.…

Today our partner, Teradata, announced availability of the Teradata Aster Big Analytics Appliance, which packages our Hortonworks Data Platform (HDP) with Teradata Aster on a machine that is ready to plug in and bring big data value in hours.

There is more to this appliance than meets the eye…  it is not just a simple packaging of software on hardware. Teradata and Hortonworks engineers have been working together for months tying our solutions together and optimizing them for an appliance.…

I spent some time at the first ever DataWeek in San Francisco last week.  It is a brand new show and it was very well-run, spread across a few cool spaces with an interesting mix of novice to experienced data professionals.  They had a good blend of labs, speakers, panels and great networking opportunities.  In all, it was great and a big thanks and kudos to the organizers.

I took part in a panel and also presented a three-hour overview of Hadoop. …

Hortonworks Data Platform 1.1 Brings Expanded High Availability and Streaming Data Capture, Easier Integration with Existing Tools to Improve Enterprise Reliability and Performance of Apache Hadoop

It is exactly three months to the day since Hortonworks Data Platform version 1.0 was announced. A lot has happened since that day…

  • Our distribution has been downloaded by thousands and is delivering big value to organizations throughout the world,
  • Hadoop Summit gathered over 2200 Hadoop enthusiasts into the San Jose Convention Center,
  • And, our Hortonworks team grew by leaps and bounds!

This is the first part of a series written by Charles Boicey from the UC Irvine Medical Center.  The series will demonstrate a real case study for Apache Hadoop in healthcare and also journal the architecture and technical considerations presented during implementation.

The Hadoop strategy at UC Irvine Medical Center started with a single observation in early 2011. While using Twitter, Facebook, LinkedIn and Yahoo, we came to the conclusion that healthcare data, although domain specific, is structurally not much different from a tweet, Facebook posting or LinkedIn profile, and that the environment powering these applications should be able to do the same with healthcare data.…
