The Hortonworks Blog

In the last Hoya article, we talked about its Application Architecture. Now let’s talk persistence. A key use case for Hoya is supporting long-lived clusters that can be started and stopped on demand. This lets a user start and stop an HBase cluster when they want, only using CPU and memory resources when they actually need them. For example, a specific MR job could use a private HBase instance as part of its join operations, or for an intermediate store of results in a workflow.…

At Hadoop Summit in June, we introduced a little project we’re working on: Hoya: HBase on YARN. Since then the code has been reworked and is now up on GitHub. It’s still very raw, and requires some local builds of bits of Hadoop and HBase – but it is there for the interested.

In this article we’re going to look at the architecture, and a bit of the implementation.

We’re not going to look at YARN in this article; for that we have a dedicated section of the Hortonworks site, including sample chapters of Arun Murthy’s forthcoming book.…

If you’re considering the WHY, the HOW and the WHAT of Hadoop and Big Data in your business, then this collection of papers and ebooks is your friend.

  • WHY does Hadoop matter? Our eBook “Disruptive Possibilities of Big Data” paints a picture of the future of the data-driven business and how it changes everything.
  • HOW does Hadoop work in my data architecture? As part of a modern data architecture, Hadoop sits alongside existing infrastructure and augments its capabilities through Refining and Exploring big datasets and ultimately enriching the application and customer experiences for your business.

I’d like to share some thoughts on the recent news that Eric Baldeschwieler has decided to leave Hortonworks. I’d like to start off first by thanking Eric for his contributions to the Hadoop community since its inception over 7 years ago, and I’d like to express my personal appreciation for his help in getting Hortonworks off the ground.

It’s hard to believe it’s been over two years since Hortonworks was founded by over 20 engineers from the original Yahoo!…

After the break in the glorious hot weather we want to banish the rain and thunderstorms and bring back a lazy sunny London, so a few of us decided that it was time to hold the first “Big Data Lunch in the Park” summertime meet-up.

Register here at http://bigdatalunchinthepark.eventbrite.com 

Grab your lunch, divert your phone to your mobile and join us on the 8th August at noon at Green Park and hang out with some of your fellow Big Data enthusiasts.  …

We continue to make strong headway towards the general availability of Hadoop 2.0. A release candidate for Hadoop 2.1.0-beta is currently under consideration by the Apache community. This critical milestone signifies both the outstanding progress being made by the community and, equally important, the stabilization of Hadoop 2.0 APIs.

A defining characteristic of Hadoop 2.0 is its next generation resource management framework called YARN.  YARN enables Hadoop to grow beyond its MapReduce origins to embrace multiple workloads spanning interactive queries, batch processing, streaming & more.…

Last week, we published a blog about the Hadoop job market and evolving your SQL skills to Hadoop. To help you with that evolution, we’re delighted to offer you some special pricing on training. What better way to stay cool this summer than to be in a nice air-conditioned classroom?

  • Looking for a class in the US? Here’s 25% off any class offered by Hortonworks in August. Use this discount code: TCAMSUMMR25
  • Do you want to take a class in the UK?

My work on adding data types to HBase has come along far enough that ambiguities in the conversation are finally starting to shake out. These were issues I’d hoped to address through initial design documentation and a draft specification. Unfortunately, it’s not until there’s real code implemented that the finer points are addressed in concrete. I’d like to take a step back from the code for a moment to initiate the conversation again and hopefully clarify some points about how I’ve approached this new feature.…

One of the big opportunities that Hadoop provides is the processing power to unlock value in big datasets of varying types from the ‘old’ such as web clickstream and server logs, to the new such as sensor data and geolocation data.

The explosion of smart phones in the consumer space (and smart devices of all kinds more generally) has continued to accelerate the next generation of apps such as Foursquare and Uber which depend on the processing of and insight from huge volumes of incoming data.…

Hadoop jobs have grown 200,000%. No, that’s not a typo. According to Indeed.com, Hadoop is one of the top 10 job trends right now.

When you look at LinkedIn, the growth in profiles that have SQL in them is on the downswing (about -4%), but the growth of profiles that have Hadoop in them is up 37%. Hadoop is becoming a clear resume differentiator. Updating and maintaining technical skills has always been part of the job and is part of ensuring a long and healthy career.…

Whether only beginning or well underway with Big Data initiatives, organizations need data protection to mitigate risk of breach, assure global regulatory compliance and deliver the performance and scale to adapt to the fast-changing ecosystem of Apache Hadoop tools and technology.

Business insights from big data analytics promise major benefits to enterprises – but launching these initiatives also presents potential risks. New architectures, including Hadoop, can aggregate different types of data in structured, semi-structured and unstructured forms, perform parallel computations on large datasets, and continuously feed the data lake that enables data scientists to see patterns and trends.…

Airline pricing has always been a mystery to me, a combination of art and science allowing the airline to make as much money as possible on each flight while providing the customer the options and flexibility they want. Under the covers I know there are complex models the airlines use to determine how many seats have been sold and how much they can get for the remaining seats. I didn’t realize how seriously complex the models were or, more importantly, how large an opportunity the travel industry has to become more customer-centric while staying competitive by harnessing the data now available to it.…

Today was our last day at the Worldwide Partner Conference (WPC), where 15,000+ people joined up for business sessions, networking, exhibits, heat, humidity, Lenny Kravitz and fantastic Houston, Texas hospitality. As a first-time sponsor we thought we would share our views from the conference.

Steve Ballmer opened the conference talking about the Microsoft transformation to a devices-and-services company and the four trends underpinning that transformation – cloud, mobility, big data and enterprise social.…

By now, you’re probably well aware of what Hadoop does:  low-cost processing of huge amounts of data. But more importantly, what can Hadoop do for you?

We work with many customers across many industries with many different specific data challenges, but in talking to so many customers, we are also able to see patterns emerge in certain types of data and the value they could bring to a business.

We love to share these kinds of insights, so we built a series of video tutorials covering some of those scenarios:

Some more detailed discussion of these types of data is in our ‘Business Value of Hadoop’ whitepaper.…

BAM! What a week for Hadoop as we all spent time with around 2500 of our closest friends to spin some YARNs (I saw it over here first). Like me, you’re probably still digesting everything you heard but in the meantime here are some highlights from us.

Modern Data Architecture. Integrating Hadoop into existing data center investments is a hot topic for any enterprise thinking about Big Data. In support of that need there were some announcements with key data center partners:
