The Hortonworks Blog

More from Jim Walker

By now, you’re probably well aware of what Hadoop does:  low-cost processing of huge amounts of data. But more importantly, what can Hadoop do for you?

We work with many customers across many industries with many different specific data challenges, but in talking to so many customers, we are also able to see patterns emerge on certain types of data and the value that could bring to a business.

We love to share these kinds of insights, so we built a series of video tutorials covering some of those scenarios:

Some more detailed discussion of these types of data is in our ‘Business Value of Hadoop’ whitepaper.…

Four years ago, Arun Murthy entered a JIRA ticket (MAPREDUCE -279) that outlined a re-architecture of the original MapReduce.  In the ticket, he outlined a set of capabilities that allowed processes to better share resources and an architecture that would allow Hadoop to extend beyond batch data processing.

It turned out that this ticket was prescient of true enterprise requirements for Hadoop. As enterprise adoption accelerated, it became even clearer that multiple processing models – moving beyond batch – was critical for Hadoop to broaden its applicability for mainstream usage in the modern enterprise architecture.…

What is the value of Hadoop to your business? What value lies in your big data?

There are a MANY definitions of big data out there.  In fact, we have published two of them to our blog alone and I am sure we can dream up of a few more.  However, when it comes down to it, our customers know best.  After all, they are the users of Hadoop.

New Whitepaper: “Business Value of Hadoop”.…

Over the past year, customers have told us they want to store all their data in one place and interact with it in multiple ways… they want to use Hadoop, but in order to do so, it needs to extend beyond batch.  It also needs to be interactive and real-time (among others).

This is the entire principle behind YARN, which together with others in the community, Arun Murthy and the team at Hortonworks have been working on for more than 5 years! …

Talend Open Studio for Big Data provides an intuitive set of tools that make dealing with data in the Hadoop world (and Hortonworks Data Platform in particular) a lot easier.  We often use the tools often to speed delivery of a proof of concept or to operationalize movement of data from sources like web logs and machine sensors to load HDFS.  It is simple to use and typically takes only minutes to perform something that once took hours in a script.…

A few weeks back we posted a definition of “big data”.  There was definitely some internal conversation about the term and if this definition had captured what the term means.  Sum finding: it is a loaded term.  It means a lot of different things to a lot of different people.

When I first joined Hortonworks, I bought in to the three V’s (volume velocity and variety) definition of big data. …

PORTLAND – The Rose city is a great place and this week it got even more interesting with the OpenStack Summit in town. I am more a data geek and very rarely do I venture down the stack into infrastructure, but wow, there is something cool going on with the OpenStack community.  I couldn’t help but to get wrapped up in the excitement.  Not only was the enthusiasm palpable, it was also very familiar.…

While we are quite a far way away from hearing “Houston, tranquility base here… the eagle has landed”, the HP moonshot is definitely pushing us all toward a new class of infrastructure to run more efficient workloads, like Apache Hadoop. Hortonworks applauds the development of flexible Big Data appliances like Moonshot. We are excited about this development as it signals alignment across development, operations and infrastructure within organizations.  For quite some time, our team has been accustomed to a natural balance required across these three constituents and now the server the market is joining in on the game.…

Unstructured data, semi-structured data, structured data… it is all very interesting and we are in conversations about big and small versions of each of these data types every day. We love it…  we are data geeks at Hortonworks. We passionately understand that if you want to use any piece of data for some computation, there needs to be some layer of metadata and structure to interact with it.  Within Hadoop, this critical metadata service is provided by HCatalog.…

“OK, Hadoop is pretty cool, but exactly where does it fit and how are other people using it?”  Here at Hortonworks, this has got to be the most common question we get from the community… well that and “what is the airspeed velocity of an unladen swallow?”

We think about this (where Hadoop fits) a lot and have gathered a fair amount of expertise on the topic.  The core team at Hortonworks includes the original architects, developers and operators of Apache Hadoop and its use at Yahoo, and through this experience and working within the larger community they have been privileged to see Hadoop emerge as the technological underpinning for so many big data projects.…

Thankful…

Happy Thanksgiving!

Today, like the rest of the U.S., we take a pause from our regular blog schedule to give thanks…

We are thankful for mappers and reducers. We are thankful for namenodes and jobtrackers. We give thanks to speculative execution battling the march of the last reducer. Give thanks to every petabyte, terabyte, gigabyte, file and block of data. We are thankful for the capacity scheduler.

We are very thankful for many things here at Hortonworks and I know many of us are thankful for an extra long weekend.…

As we speed towards wide spread enterprise adoption of Apache Hadoop, it has become readily apparent that this new data platform must not only capture, process and distribute data, but it also must be able to be deployed in a variety of ways, be it on premise, in a VM, as an appliance or better yet in the cloud…

Today we announced a new relationship with Rackspace in which we will develop an OpenStack based Hadoop solution for the public and private cloud.…

Today our partner, Teradata, announced availability of the Teradata Aster Big Analytics Appliance, which packages our Hortonworks Data Platform (HDP) with Teradata Aster on machine that is ready to plug-in and bring big data value in hours.

There is more to this appliance than meets the eye…  it is not just a simple packaging of software on hardware. Teradata and Hortonworks engineers have been working together for months tying our solutions together and optimizing them for an appliance.…

I spent some time at the first ever DataWeek in San Francisco last week.  It is a brand new show and it was very well-run, spread across a few cool spaces with an interesting mix of novice to experienced data professionals.  They had a good blend of labs, speakers, panels and great networking opportunities.  In all, it was great and a big thanks and kudos to the organizers.

I took part in a panel and also presented a three-hour overview of Hadoop. …

Hortonworks Data Platform 1.1 Brings Expanded High Availability and Streaming Data Capture, Easier Integration with Existing Tools to Improve Enterprise Reliability and Performance of Apache Hadoop

It is exactly three months to the day that Hortonworks Data Platform version 1.0 was announced. A lot has happened since that day…

  • Our distribution has been downloaded by thousands and is delivering big value to organizations throughout the world,
  • Hadoop Summit gathered over 2200 Hadoop enthusiasts into the San Jose Convention Center,
  • And, our Hortonworks team grew by leaps and bounds!
Go to page:123