The Hortonworks Blog

Apache Hadoop 2.0 continues to make its way through the open source community process at the Apache Software Foundation and is getting closer to being declared “ready” from a community development perspective.  Once ready, our team at Hortonworks will apply our usual enterprise rigor in providing a tested and integrated distribution that includes Hadoop 2.0 along with the other enterprise-focused services our customers and partners require.

In my roles both at Hortonworks and in the open-source Apache Hadoop community, I’m asked a lot of questions regarding the key aspects and motivations behind Hadoop 2.0.…

Microsoft has begun to treat its developer community to a number of Hadoop-y releases related to its HDInsight (Hadoop in the cloud) service, and it’s worth rounding up the material. It’s all Alpha and Preview so YMMV but looks like fun:

  • Microsoft .NET SDK for Hadoop. This kit provides .NET API access to aspects of HDInsight including HDFS, HCatalog, Oozie and Ambari, and also some PowerShell scripts for cluster management. There are also libraries for MapReduce and LINQ to Hive.

UPDATED 6/12: To include the Falcon meetup.

UPDATED: To include the Oozie meetup.

The main Hadoop Summit agenda is looking awesome – go take a look here, and register here. There is also a series of meetups planned for the day before the general sessions. If you want to get up close and personal on topics of interest to you with other like-minded folk, then take a look at these options.…

We are excited that another critical Enterprise Hadoop integration requirement – NFS Gateway access to HDFS – is making progress through the main Apache Hadoop trunk.  This effort is architected and designed by Brandon Li and Suresh Srinivas, and is being delivered by the community. You can track progress in Apache JIRA HDFS-4750.

With NFS access to HDFS, you can mount the HDFS cluster as a volume on client machines and use native command-line tools, scripts, or a file explorer UI to view HDFS files and load data into HDFS. …
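Once the gateway export is mounted, ordinary file-system calls go straight to HDFS. Below is a minimal Python sketch of that idea; the /mnt/hdfs mount point and the data/incoming directory are illustrative assumptions, not anything prescribed by HDFS-4750:

    import shutil
    from pathlib import Path

    # Assumes the NFS gateway export has already been mounted on this client,
    # e.g. with something like:
    #   mount -t nfs -o vers=3,proto=tcp,nolock <gateway-host>:/ /mnt/hdfs
    # The mount point and directory names below are illustrative assumptions.
    HDFS_MOUNT = Path("/mnt/hdfs")

    def load_into_hdfs(local_path, hdfs_dir="data/incoming"):
        """Copy a local file into HDFS using plain file-system calls."""
        target_dir = HDFS_MOUNT / hdfs_dir
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / Path(local_path).name
        shutil.copy(local_path, target)
        return target

    def list_hdfs_dir(hdfs_dir="data/incoming"):
        """Browse an HDFS directory exactly as if it were a local one."""
        return sorted(p.name for p in (HDFS_MOUNT / hdfs_dir).iterdir())

    if __name__ == "__main__":
        print(load_into_hdfs("/tmp/events.log"))
        print(list_hdfs_dir())

The point is simply that no Hadoop client library is involved; anything that can read and write local files can now reach HDFS through the mount.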

The following post is from Nicolas Liochon and Devaraj Das with thanks to all members of the HBase team.

HBase is an always-available service and remains available in the face of machine and rack failures. Machines in the cluster run RegionServer daemons. When a RegionServer crashes or its machine goes offline, the regions it was hosting go offline as well. The focus of the MTTR (mean time to recovery) work in HBase is to detect these failures and restore access to the offlined regions as quickly as possible.…
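From a client's perspective, MTTR is the window during which operations against the affected regions fail and have to be retried. The sketch below illustrates that pattern in Python, assuming an HBase Thrift gateway and the third-party happybase client; the host, table, and row key names are made up:

    import time
    import happybase  # third-party Python client for the HBase Thrift gateway

    def read_with_retry(host, table_name, row_key, attempts=10, backoff_s=2.0):
        """Retry a read while the hosting region is offline and being reassigned.

        The shorter HBase's MTTR, the fewer iterations of this loop a client ever sees.
        """
        for attempt in range(1, attempts + 1):
            try:
                conn = happybase.Connection(host)
                row = conn.table(table_name).row(row_key)
                conn.close()
                return row
            except Exception as exc:  # RegionServer down, region not yet reassigned, etc.
                print("attempt %d failed (%s); retrying in %.1fs" % (attempt, exc, backoff_s))
                time.sleep(backoff_s)
        raise RuntimeError("region still unavailable after %d attempts" % attempts)

    # Hypothetical usage:
    # read_with_retry("hbase-thrift.example.com", "events", b"user#42")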

And we are just about done with this week. But not quite – dig into the conversation from the past few days.

Hadoop Summit. We published the vast majority of sessions (70 so far) for the Hadoop Summit in San Jose, 26-27 June. The sessions stretch across 7 tracks from Architecture to Economics and we hope you can join us for THE Hadoop community event of the year. You can register here, and the schedule is here.…

Today, 94% of Hadoop users perform analytics on large volumes of data in ways that were not possible before. How do they do it? Cool applications, that’s how.

You have seen various stats that indicate enterprises need better ways of making use of data, but they bear repeating: the volume of business data worldwide, across all companies, doubles every 1.2 years, according to a study published by eBay in May 2012. And market research firm IDC released a forecast showing the big data market may grow from $3.2 billion in 2010 to $16.9 billion in 2015.…

Some news from the UK as Yahoo! Hack Europe welcomed Hortonworks this past weekend in central London.  This two-day event sponsored by Yahoo! was focused on celebrating collaboration, learning and innovation using the world’s leading technologies.  Chris Harris, our local EMEA Solution Engineer, was on hand to add to the discussions.  Partnering with Microsoft, we were able to showcase HDP on the Azure platform.  This was a fantastic opportunity for the 350 delegates to be exposed to both Azure and enterprise-ready Hadoop, provided as the HDInsight Service.…

Now is the time to get registered for the Hadoop Summit in San Jose, 26-27 June, 2013 – we’d love to see you there. A few weeks ago, we revealed the selectees from the community choice voting, and we’re now delighted to announce the full schedule of sessions is available here.

Session Schedule

Our thanks to the track selection committees and track chairs for their work in building a great schedule for an awesome event.…

A few weeks back we posted a definition of “big data”.  There was definitely some internal conversation about the term and whether this definition had captured what it means.  The short finding: it is a loaded term.  It means a lot of different things to a lot of different people.

When I first joined Hortonworks, I bought into the three V’s (volume, velocity and variety) definition of big data. …

Almost time to spend a relaxing weekend in the garden, or to crunch some data in your garage-based homebrew Hadoop cluster – whichever you prefer. But before we choose our path, let’s take a look at the last two weeks of happenings (I was lost in Oregon last week).

Hadoop is the perfect app for OpenStack. While I was struggling with driving directions, Red Hat, Mirantis and Hortonworks were announcing plans for Project Savanna, which aims to automate the deployment of Hadoop on enterprise-class OpenStack-powered clouds. …

To deploy, configure, manage and scale Hadoop clusters in a way that optimizes performance and resource utilization, there is a lot to consider. Here are 6 key things to think about as part of your planning:

  • Operating system:  Using a 64-bit operating system helps avoid constraining the amount of memory that can be used on worker nodes. For example, 64-bit Red Hat Enterprise Linux 6.1 or greater is often preferred, due to better ecosystem support and more comprehensive functionality for components such as RAID controllers.…
As a preview of the April 30th webinar, Hadoop & the EDW: When to Use Which, Chad Meley, Global Director of Marketing at Teradata, interviewed the two luminary speakers, Eric Baldeschwieler (aka “eric14”) and Stephen Brobst, about the purpose of their presentation and what you can expect to take away from their shared experiences.

Chad:  “Eric, in this webinar you’re going to talk about the strategic role of relational big data technologies, which have come under fire in some circles with the rise of Hadoop. …

PORTLAND – The Rose City is a great place, and this week it got even more interesting with the OpenStack Summit in town. I am more of a data geek and very rarely venture down the stack into infrastructure, but wow, there is something cool going on with the OpenStack community.  I couldn’t help but get wrapped up in the excitement.  Not only was the enthusiasm palpable, it was also very familiar.…

In a recent blog post I mentioned the 4 reasons for using Hadoop for data science. In this blog post I would like to dive deeper into the last of these reasons: data agility.

In most existing data architectures, based on relational database systems, the data schema is of central importance, and needs to be designed and maintained carefully over the lifetime of the project. Furthermore, whatever data fits into the schema will be stored, and everything else typically gets ignored and lost.…
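By contrast, Hadoop encourages a schema-on-read style: keep the raw records, and impose only the structure each analysis needs at read time. Here is a minimal Python sketch of that idea, with made-up field names and records purely for illustration:

    import json

    # Raw events kept as JSON lines; nothing is discarded for failing to fit a schema.
    RAW_EVENTS = [
        '{"user": "alice", "action": "login", "ts": 1370000000}',
        '{"user": "bob", "action": "purchase", "amount": 19.99, "ts": 1370000060}',
        '{"user": "carol", "action": "login", "ts": 1370000120, "device": "mobile"}',
    ]

    def project(records, fields):
        """Apply a 'schema' at read time: extract only the fields this analysis needs."""
        for line in records:
            event = json.loads(line)
            yield {name: event.get(name) for name in fields}

    # Today's question needs only user and action...
    print(list(project(RAW_EVENTS, ["user", "action"])))
    # ...and tomorrow's can reach back for 'device' without remodeling or reloading anything.
    print(list(project(RAW_EVENTS, ["user", "device"])))

Because the raw data is retained, a new question never requires a schema migration first.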
