Posts by Marc Holmes:


Week in Review: SQL IN Hadoop and Hive, Beyond Batch with YARN, NFS access to HDFS and HBase MTTR

Or as it’s more commonly being called: Week-ish in Review. Let’s recap on the latest – there’s some juicy technology goodness here.

Delivering on Stinger: Phase 1. Just this week, Hive 0.11 has been released. Owen (@owen_omalley) brought us the news that 55 – yes, fifty-five – developers from across the community have addressed 386 JIRA tickets and have delivered significant improvements to Hive along with an awesome demonstration of the power of community open-source development. Thanks to everyone! This release of Hive means that we’ve delivered on the first phase of the Stinger Initiative too – aiming to deliver 100x performance increases to Hive.

Taking Hadoop Beyond Batch with YARN. All of which means we step closer to delivering SQL-in-Hadoop and respond to the needs of enterprises for multi-application operating systems for their big data. Arun (@arunmurthy) gives a terrific update on Hadoop 2.0 and YARN and how that development will move Hadoop Beyond Batch. Stay tuned!

Delivering Enterprise Hadoop through MTTR for HBase and NFS access to HDFS. Meanwhile, Nicolas Liochon (@nkeywal) and Devaraj Das (@ddraj) provide an introduction on how HBase availability is being improved through work on Mean Time To Recover (MTTR) capabilities. And then Brandon Li (@brandonli11) and Suresh Srinivas (@suresh_m_s) updated us on progress to simplify data management through NFS access to HDFS. All critical stuff for the enterprise, and all driven through the community.

Microsoft love for .NET Hadoop fans. If you’re a .NET developer and have been missing out on a little Hadoop fun, then Microsoft has started pushing out SDKs and tutorials for its Hadoop-in-the-Cloud service – HDInsight – so you can fire up Visual Studio and get rocking on that big data.

Hadoop Summit Meetups. We only announced them this week, and they’re nearly full already. Still time to try and squeeze into one of the Meetups: Hive, Pig, HBase, YARN, Accumulo, Ambari, Oozie, Data Science and Architecture or maybe attend Big Data Camp or Machine Learning Evening on 25th June as part of Hadoop Summit.

Now it’s time to go play. Have a great weekend.

Hadoop SDK and Tutorials for Microsoft .NET Developers

Microsoft has begun to treat its developer community to a number of Hadoop-y releases related to its HDInsight (Hadoop in the cloud) service, and it’s worth rounding up the material. It’s all Alpha and Preview so YMMV but looks like fun:

  • Microsoft .NET SDK for Hadoop. This kit provides .NET API access to aspects of HDInsight including HDFS, HCatalog, Oozie and Ambari, and also some PowerShell scripts for cluster management. There are also libraries for MapReduce and LINQ to Hive. The latter is really interesting as it builds on LINQ, the established technology for .NET developers to access most data sources, to deliver the capabilities of Hive, the de facto standard for Hadoop data query.
  • HDInsight Labs Preview. Up on GitHub, there is a series of 5 labs covering C#, JavaScript and F# coding for MapReduce jobs, using Hive, and then bringing that data into Excel. It also covers some Mahout use to build a recommendation engine.
  • Microsoft Hive ODBC Driver. The examples above use this preview driver to enable the connection from Hive to Excel.

If all of the above excites you, our Hadoop on Windows for Developers training course also covers similar content in a lot more depth.

You can read more about the partnership between Hortonworks and Microsoft here, and you can download a preview of HDP for Windows here, or sign up for HDInsight over here. And if you’re hungry for more Hadoop tutorials, grab our own Hortonworks Sandbox here.

Meetups at Hadoop Summit


UPDATED: To include the Oozie meetup.

The main Hadoop Summit agenda is looking awesome – go take a look here, and register here - but there’s also a series of meetups planned for the day before the general sessions. If you want to get up close and personal on topics of interest to you with other like-minded folk then take a look at these options. We’ll be providing refreshments along the way.

Meetups

You should go ahead and register at the links below. Note that space will be limited, and remember that all meetups are in San Jose!

Morning Sessions: 25th June, 10:00am – 12:30pm at San Jose Convention Center

Afternoon Sessions: 25th June, 1:30pm – 4:00pm at San Jose Convention Center

Camps

Additionally, there are two camps in the evening:

All this Hadoop-y goodness should get you nicely in the mood for the next two days of general and track sessions. See you there!

Week in Review: Hadoop Summit, Value of Big Data, and more Ambari

And we are just about done with this week. But not quite – dig into the conversation from the past few days.

Hadoop Summit. We published the vast majority of sessions (70 so far) for the Hadoop Summit in San Jose, 26-27 June. The sessions stretch across 7 tracks from Architecture to Economics and we hope you can join us for THE Hadoop community event of the year. You can register here, and the schedule is here.

Big Data Defined Part Deux: Value Definition. Jim picked up from the last Big Data definition and talked about it here. Regardless of your views on volume, variety and velocity there is one V to rule them all: Value.

Enterprise Data Analytics with Hortonworks and Datameer. I’ve been having a ton of fun with Datameer visualizations this week. If you want to learn a little more about enterprise analytics and how to better unlock the insights in your own data (with cool graphics) then take a look here.

Get Started with Ambari. We published a fun tutorial on setting up Ambari to provision, manage and monitor your Hadoop cluster. Better automation of management and monitoring means more time in the garden.

Until next week – stay frosty.

Hortonworks at Yahoo! Hack Europe

Some news from the UK as Yahoo! Hack Europe welcomed Hortonworks this past weekend in central London. This two-day event sponsored by Yahoo! was focused on celebrating collaboration, learning and innovation using the world’s leading technologies. Chris Harris, our local EMEA Solution Engineer, was on hand to add to the discussions. Partnering with Microsoft, we were able to showcase our HDP on the Azure platform. This was a fantastic opportunity for the 350 delegates to be exposed to both Azure and enterprise-ready Hadoop provided as the HDInsight Service.

After an appearance by the larger-than-life Yahoo! Hack Robot (seriously, check it out…), who made sure that everyone was entertained, the hack started with a vengeance. Hyped up on the sweetie cart full of everyone’s favorites, most delegates were now officially up for the challenge. Inspired by the passion, Chris led a thought-provoking workshop, where a number of the hackers were able to try out real-life scenarios showing how Hadoop, as part of the HDInsight service, can and will impact business decisions. After partaking in a few more of the free donuts and sandwiches, with a few more questions answered and a number of people inspired, Chris finally left the hackers to enjoy the rest of their weekend. Congratulations to everyone who took part, and to the winners! From what we gather the whole weekend was a grand success, and we look forward to working with them on the next one and possibly seeing you there!

Chris’ decks can be found on Slideshare, and we’ve embedded them below too. Our thanks to everyone who attended!

Hadoop Summit Schedule is now available!


Now is the time to get registered for the Hadoop Summit in San Jose, 26-27 June, 2013 – we’d love to see you there. A few weeks ago, we revealed the selectees from the community choice voting, and we’re now delighted to announce the full schedule of sessions is available here.

Session Schedule

Our thanks to the track selection committees and track chairs for the work on building a great schedule for an awesome event. There are 70 sessions on the schedule so far with more to come later.

This year, the tracks are as follows:

  • Enterprise Data Architecture. This track focuses on Hadoop as a data platform and how it fits within broader enterprise data architectures.
  • Applications and Data Science. Sessions in this track focus on the practice of data science using Hadoop.
  • Deployment and Operations. This track focuses on the deployment, operation and administration of Hadoop clusters at scale.
  • Hadoop-driven business / Business Intelligence. Sessions in this track focus on how Hadoop is powering a new generation of business intelligence solutions.
  • Hadoop (Disruptive) Economics. Sessions in this track are focussed on business innovation and enablement of business to become data-driven.
  • Reference Architectures. Sessions in this track focus on how the various components of the enterprise ecosystem integrate and interoperate with Apache Hadoop.
  • Future of Apache Hadoop. This track takes a technical look at the key projects and research efforts driving innovation in and around the Hadoop platform.

Training and more

You can take advantage of a series of training opportunities at the summit – click the links to register:

Also around the general and track sessions are a number of other activities: from birds of a feather sessions to a Big Data Science Meetup and a Big Data Camp.

We hope to see you in San Jose. You can register over here.

Week in Review: OpenStack, Data Science and Ambari

Almost time to spend a relaxing weekend in the garden, or crushing some data in your garage-based homebrew Hadoop cluster – whichever you prefer. But before we choose our path, let’s take a look at the last two weeks of happenings (I was lost in Oregon last week).

Hadoop is the perfect app for OpenStack. While I was struggling with driving directions, Red Hat, Mirantis and Hortonworks were announcing plans for Project Savanna, which aims to automate the deployment of Hadoop on enterprise-class OpenStack-powered clouds. Jim also wrote up some comprehensive notes from the awesome OpenStack Summit event.

Need Data Science? Here’s how to build a team. Ofer followed up his post on 4 Reasons to use Hadoop for Data Science with some thinking on the continuum of skills and roles that represent a data science team. This proved to be something of a hot topic, and was referenced amongst some collective thinking on GigaOM. In a subsequent post, he also dived a little deeper into Data Agility.

 

Managing Hadoop? Some field notes from the first Apache Ambari Meetup. This inaugural meetup at our office was well attended with some great discussion, and we published the presentations and recordings over here.

 

Data Warehouse? Hadoop? When to use Which. In an interview as a backdrop to a Teradata-hosted webinar, Hadoop & the Enterprise Data Warehouse: When to Use Which, Chad Meley, Eric Baldeschwieler and Stephen Brobst talk about their experiences with both. It’s on April 30th, so there’s still time to register.

Considering deploying a Hadoop cluster? OK, so a Hadoop cluster sounds like an awesome idea – but what are the things you should consider in building that infrastructure? This checklist from HP may be useful for your planning.

And finally some stuff to do:

Have a great weekend!

Week in Review: Patterns, Glue and Moonshots

It’s the end of another action-packed week, so just before we all head off for the weekend, let’s recap the conversations from this week – we hope you’re enjoying them.

We’re delighted by the response to our Hadoop Patterns of Use whitepaper and presentation - that really seems to have struck a chord with everyone thinking about what Hadoop can really do for their business. You can see that content just below here – an excellent read for the journey home.


Also popular were the slides from one of our resident data scientists, Ofer Mendelevitch, who had 4 great reasons to use Hadoop for data science. He’ll be mining for more right now. Another article we liked from Stratconf explained the importance of imagination in data science.

 

Mid-week, we turned our attention to the awesomeness of HCatalog and spent a little time geeking out on the capabilities it provides as the glue for all your data. We also got a little bit excited about the HP Moonshot announcement - we love the idea of an appliance that can enable 1800 nodes in a single chassis. Wow.

But wait there’s more… Justin published the 2nd in a series of guest posts from Charles Boicey on a real-life implementation of Hadoop to improve patient monitoring in healthcare. And sneaking in at the end of last week we looked at the reality of integrating SAP and Hortonworks Data Platform.

And technically, we saw some interesting articles:

Enough to keep you going until next week? OK, one more then… Cheryle offered some great advice on things you can do in the Sandbox to boost your skills. Go on, get stuck in.

Where are Hortonworkers? Events and Meetups 8th April to 22nd April

Hortonworkers are out there – here is a rundown of events and meet ups we’ll be at in the next couple of weeks and we hope we’ll see you there. Did we miss any? Want us to attend your event? Let us know!

Big Data Innovation Summit

April 10-11, 2013, San Francisco, CA

http://theinnovationenterprise.com/summits/big-data-innovation-summit-april-2013-san-francisco

Spring into April and jump into Big Data! Be sure to meet us at Big Data Innovation Summit by the bay. We’re excited to have Alan Gates, co-founder of Hortonworks, presenting a couple of really exciting talks and we hope you can join us.

  • April 11 @ 9:30am: Coordinating the Many Tools of Big Data in Hadoop
  • April 11 @ 12:30pm: Hadoop Now, Next and Beyond
  • April 11 @ 2:00pm: Roundtable Session: Use Case Patterns: Horizontal or Vertical

As a global sponsor, we’ll also be exhibiting. Look for us in the exhibit area and meet members of the Hortonworks team, who will be happy to discuss any questions you have on Hadoop and Hortonworks.

PASS Business Analytics Conference

April 10-12, 2013, Chicago, IL

http://www.passbaconference.com – booth S5

We’re excited to participate in the first PASS BA or Business Analytics community driven event. We will be speaking at three sessions: “Why Apache Hadoop for Data Science”, “The Future of Apache Hive and Hadoop 2.0”, and “Big Data: Threat or Opportunity?”

Teradata Universe Copenhagen 2013

April 14-17, 2013, Copenhagen, Denmark

http://www.teradataemea.com/

We’re delighted to be a Platinum sponsor at Teradata Universe. The conference gathers experts from internationally recognized companies and presenters from Teradata’s customer community to deliver insights on the new trends driving the industry and on how Big Data analytics is used to drive business value.

Chris Harris, Solutions Engineer at Hortonworks, will be speaking at the Solution Showcase on “Big Data: Making Sense of it all!” on Monday April 15 at 12:40 and Tuesday April 16 at 11:20.

More on the Hortonworks / Teradata partnership can be found at www.hortonworks.com/teradata

eMetrics Summit

April 14-18 2013, San Francisco

http://www.emetrics.org/sanfrancisco/2013/

Hortonworks VP Products, Bob Page, will be speaking at two sessions at this analytics event.

OpenStack

April 15-18, 2013, Portland, Oregon

http://openstacksummitapril2013.sched.org/

We’re heading to our very first OpenStack Summit to talk about all things Apache Hadoop on OpenStack and we would love to meet you! A cloud deployment model makes perfect sense for Hadoop, which (a) allows for efficient infrastructure usage and (b) is a net new workload for most organizations (awesome…far fewer legacy considerations).  So Hadoop + OpenStack seems like a logical fit.  If your organization is interested in combining these two mega technology trends, it would be great to connect with our team who can share what others are doing!

There are many ways to meet the Hortonworks team!
We’ll be speaking:

And we’re exhibiting! Come by our Hortonworks booth, say hello, geek out on Hadoop and Big Data and pick up some awesome swag while you’re at it!

Charlotte Hadoop Users Group, 11th April 2013

http://www.meetup.com/CharlotteHUG/

Terry Padgett will present on the Stinger Initiative, Tez and Knox

Bay Area HUG, 17th April 2013

http://www.meetup.com/hadoop/events/63737062/

Owen O’Malley will present on the Stinger Initiative

Chicago HUG, 22nd April 2013

http://www.meetup.com/Chicago-area-Hadoop-User-Group-CHUG/events/106391622/

George Vetticaden will present on the Stinger Initiative, Tez and Knox.

Week in Review: Falcon, Hadoop Momentum and BFFs Forever!

More of a 2 weeks in review this time around owing to the Easter break. So what’s been happening?

Falcon bringing Data Lifecycle Management for Hadoop. The big news this week was the newly approved Apache Software Foundation incubator project – Falcon. The project was initiated by the team at InMobi and engineers from Hortonworks towers with the intent of simplifying data management through a data lifecycle management framework. Something for everyone then. More on Falcon here. Once again, it’s a great example of community driven open source driving the innovation that matters, or as Mohit Saxena of InMobi said:

[Image: quote from Mohit Saxena of InMobi]

Want to be BFFs with Hortonworks? According to this article on TechWorld, everyone does, and Neustar details why. We’re flattered by the sentiment and we’d love to be your friend. You can ‘Like’ us over here.

Market Momentum. So, with all of the innovation and buzz around Hadoop and Hortonworks, what does that mean for you, me, or anyone looking to dip a toe in the water? This post highlighted the market momentum and the surrounding skills and jobs and how you can get involved. I recommend you start by grabbing a copy of the Sandbox and take advantage of this graph…

Hadoop Summit Keynotes and Sessions. As memories of Amsterdam glow in the mind, the content from the event began to flow, and you can now view the videos and slides of keynotes and sessions on the summit site. We also announced the selectees of the community choice section of Hadoop Summit North America in San Jose, and the panels are now hard at work selecting the remaining sessions. You have registered haven’t you?

 

And finally, can you define Big Data? I guess that depends on your individual perspective. In this short piece, Russell describes Big Data in terms of transformative economics. Something to chew on until next week.

Have a great weekend!

Keynotes from Hadoop Summit Amsterdam 2013

The slides and videos from Hadoop Summit in Amsterdam have begun to flow so you can enjoy the sessions.

Whilst you’re thinking about which sessions to watch and read, we suggest taking a look at the keynotes for the event:
  • What is the point of Hadoop? (Video, Slides)
  • Matt Aslett, Research Director, Data Management and Analytics, 451 Research
  • Real-World insight into Hadoop in the Enterprise (Video)
  • Panel featuring HSBC, eBay, Neustar and More
We hope you enjoy these sessions, and the content from the tracks. Let us know in the comments! And don’t forget that there is plenty of time to register for Hadoop Summit San Jose 2013.

Hadoop Market Momentum and You

On 27th March, the Wall Street Journal published an article ‘VCs Bet Big Bucks on Hadoop’ and it seems clear that the market is going to be huge. But what does that mean to you and your personal skills investment? Here’s our view:

Hadoop is HOT

Hadoop is incredibly hot right now as the number of available jobs continues to grow enormously (hey – we even have a bunch of our own right here).

Indeed’s Job Trends shows Hadoop as the 7th hottest skill, and it’s in great company alongside app development skills such as iOS, Android and jQuery. I guess that’s to be expected of course: insights from big data are the fuel for the smartest apps of the future.

The Hadoop trend itself is fairly clear. In growth terms, that is pretty explosive!

Indeed Job Trends

 

A quick search on LinkedIn will pull back around 1200 Hadoop jobs right now (it was 1281 when I checked). And you can also look at the Skills page to see the associated set of component technologies and their relative growth.

Hortonworks is HOT

Apart from the WSJ, just last week, MomentumIndex called out Hortonworks as the 2011 Startup with the most Momentum from a pool of 900 startups being tracked from that year.

We also know when we talk to customers that they’re excited about our approach to pure, community-driven, open source Hadoop. We know developers are excited to get hands on with Hadoop via the Sandbox. And we see from great public responses, like those we saw at Hadoop Summit Amsterdam, that our approach is the right one.

Hadoop, Hortonworks and YOU are HOT

Hortonworks believes in Hadoop and we believe in the power of community-driven open source. We know that this is just the beginning for Hadoop and we back everyone investing their skills in Hadoop, and taking this journey with us. All the way.

Get Started: You can get started by downloading our Sandbox - it’s a VM package containing everything you need to run a single node cluster (I love that expression!) and is packed with tutorials and demos.

Get Connected: Stay in touch. When we say community we mean it – come follow us on Twitter, Facebook and LinkedIn. We want to hear from you as to how we’re doing in providing you with the tools and capabilities to do what your business is demanding. Find a Hadoop User Group (HUG), and come along to the Hadoop Summit.

Get Certified: If you want to differentiate yourself and grab one of those jobs, then you can train and certify with us too. All of the details on that are here.

Dive in and enjoy.

Hadoop Summit North America 2013: Community Choice Results

And the voting is over and the results are in for the Community Choice program of the Hadoop Summit San Jose 2013.

With over 300 sessions, and around 6000 users casting more than 15000 votes there was a lot of excitement to participate and influence the results - thanks to everyone for your contribution. At the end of the process, the selectees are:

  • Application and Data Science Track: Watching Pigs Fly with the Netflix Hadoop Toolkit (Netflix)
  • Deployment and Operations Track: Continuous Integration for the Applications on top of Hadoop (Yahoo!)
  • Enterprise Data Architecture Track: Next Generation Analytics: A Reference Architecture (Mu Sigma)
  • Future of Apache Hadoop Track: Jubatus: Real-time and Highly-scalable Machine Learning Platform (Preferred Infrastructure, Inc.)
  • Hadoop (Disruptive) Economics Track: Move to Hadoop, Go Fast and Save Millions: Mainframe Legacy Modernization (Sears Holding Corp.)
  • Hadoop-driven Business / BI Track: Big Data, Easy BI (Yahoo!)
  • Reference Architecture Track: Genie – Hadoop Platformed as a Service at Netflix (Netflix)

Congratulations to the selectees for each track, and a further honorable mention to Sears for winning the ‘Longest Session Title So Far’ award, which was a surprisingly hard-fought contest!

The content selection committee will now be working hard to select the remaining sessions for the tracks, and we’ll cover those participants in more depth later.

With the Community Choice program complete we’re one step closer to a great event! Thanks again to everyone for taking part and stand by for more updates.

Week in Review: Sandboxes, HDP 2.0 Alpha 2, Hive Performance and Summits

It’s almost time for that final drive home of the week, and what a week it has been with a few new releases, a summit, and a little bit of technical fun. Here’s what happened:

New Sandbox Release. Yes, your favorite Hadoop VM image just got even better. Cheryle took us through the new features which included Ambari integration and Russell followed up with a quick tour of Ambari. There’s still plenty of time to download Sandbox for a weekend of data crunching fun.

HDP 2.0 Alpha 2 was released. This preview release demonstrates some of the performance improvements in store for the final HDP 2.0 release via YARN, enhancements to Hive per the Stinger Initiative, and Apache Tez. Just before the release, we posted some early test results which showed a 45X (yes, that’s forty five) performance improvement for Hive interactive queries. But that’s just the beginning as we push to 100X, and Microsoft also talked about their contributions to the Stinger Initiative with the same aim in mind.

If you’ve downloaded Sandbox and are looking for some inspiration for a little fun, then Russell also posted a two part series on extracting, loading, querying and analyzing your own Twitter archive with Hive. Part 1 is here, and Part 2 is here.

And finally, there was just the small matter of the Hadoop Summit in Amsterdam. We had a great time and hope you did too. Thank you for attending, contributing to the conversation and supporting Hadoop. If you’re now really excited to learn Hadoop, we posted about available training we have in Europe and Palo Alto.

And that was the week that was. Has your Sandbox downloaded yet?
