Hadoop Insights

News about Hadoop in the wild; how Hadoop is being used; how Hadoop can be used.

This article originally appeared at Opensource.com and is reproduced here.

There are rapidly growing feature set, high commit rates, and code contributions happening across the globe to Apache Hadoop and related Apache Software Foundation projects. However, the number of woman developerscommitters, and Project Management Committee (PMC) members in this vast and diversified ecosystem are really diminutive. For the Hadoop project alone, only 5% out of 84 committers are women; and this has been the case for over the past 2 years.…

This is the third in our series on modern data architectures across industry verticals. Others in the series are:

Many of the world’s largest telecommunications companies use Hortonworks Data Platform (HDP) to manage their data. Through partnership with these companies, we have learned how our customers use HDP to improve customer satisfaction, make better infrastructure investments and develop new products.…

The year is coming to its end. Maybe you’re reading this as you race to check a few more 2013 items off of your to-do list (at work or at home). Or maybe you’ve already got a hot toddy in your hand and your feet kicked up, with slippers warming your toes.

In 2013, I have been fortunate enough to spend the year speaking with our customers and I learned about how so many important organizations are using Apache Hadoop and Hortonworks Data Platform (HDP) to solve real problems.…

There is a lot of information available on the benefits of Apache YARN but how do you get started building applications? On December 18 at 9am Pacific Time, Hortonworks will host a webinar and go over just that:  what independent software vendors (ISVs) and developers need to do to take the first steps towards developing applications or integrating existing applications on YARN.

Register for the webinar here.

Why YARN?

As Hadoop gains momentum it’s important to recognize the benefits to customers and the competitive advantage software vendors will have if their application is integrated with YARN like elasticity, reliability and efficiency.…

In God we trust, all others must bring data. Dr. W. Edwards Deming Dr. W. Edwards Deming was a statistician and manufacturing consultant who worked on Japanese reconstruction after WWII. His quality control methods influenced innovative Japanese manufacturing processes that simultaneously increased volume, reduced cost, and improved quality. Near the end of his career, Deming taught the same lessons to U.S. automakers.

To this day, the “Deming Prize” is one of the highest rewards for Total Quality Management in the world.…

2013 was certainly a revealing year for the Enterprise Hadoop market. We witnessed the emergence of the YARN-based architecture of Hadoop 2 and a strong ecosystem embracement that will fuel its next big wave of innovation. The analyst community accurately predicted Hadoop’s market momentum would greatly accelerate, but none predicted a pure play vendor would publicly declare its intent to pivot away from the Enterprise Hadoop market. Interesting times indeed!

Join us on Tuesday January 21st where we’ll be covering the Enterprise Hadoop State of the Union in more detail.…

We have heard plenty in the news lately about healthcare challenges and the difficult choices faced by hospital administrators, technology and pharmaceutical providers, researchers, and clinicians. At the same time, consumers are experiencing increased costs without a corresponding increase in health security or in the reliability of clinical outcomes.

One key obstacle in the healthcare market is data liquidity (for patients, practitioners and payers) and some are using Apache Hadoop to overcome this challenge, as part of a modern data architecture.…

I teach for Hortonworks and in class just this week I was asked to provide an example of using the R statistics language with Hadoop and Hive. The good news was that it can easily be done. The even better news is that it is actually possible to use a variety of tools: Python, Ruby, shell scripts and R to perform distributed fault tolerant processing of your data on a Hadoop cluster.…

Using Hadoop as an enterprise data platform means great integration with other technologies in the data center.

To that end, the Hortonworks Sandbox Partner Gallery showcases how our partners’ solutions integrate with Hadoop and provide you with easy access to learn how to use those solutions with the Hortonworks Data Platform via the Sandbox.

Don’t have the Sandbox? Get your free download of this single node Hadoop environment that’s delivered as a Virtual Machine that you can run on your laptop.…

Now that Hortonworks Data Platform 2.0 is GA, you may be looking to migrate your Hadoop stack from another version to take advantage of Hadoop 2’s YARN-based architecture. Fortunately, our Professional Services & Support teams are getting a lot of practice at migration from other distributions as more and more customers turn to 100% enterprise-hardened Apache Hadoop for their big data platform.

While any specific migration may have a few gotchas from a vendor lock-in, or business integration perspective, this high-level process overview is battle tested on large-scale production clusters and we hope it helps you plan for your own migration.…

Behind all the Big Data hype, there is one common thread: Apache Hadoop and its associated components ARE the technology platform of choice. And here at Hortonworks, that’s what we do: Hadoop.

That is also why we are so excited about the incredible growth in customers who have chosen to work with us to ensure their implementation of Hadoop and realize their vision of a modern data architecture.

Here are the key reasons we believe that we can best help your enterprise with Apache Hadoop.…

You did it! Last Sunday we challenged you to “Learn Hadoop in 7 days”. We hope that you have risen to the test and kept up with the tutorials we’ve posted each day through Twitter and Facebook. These tutorials should have helped you delve into:

By now, you should feel comfortable with Hadoop clickstream analysis, Hortonworks ODBC driver configuration, and many other important components of Hadoop.…

Apache Storm and YARN extend Hadoop to handle real time processing of data and provides the ability to process and respond events as they happen. Our customers have told us many use cases for this technology combination and below we present a demo example complete with code so you can try it yourself.

For the demo below, we used our Sandbox VM which is a full implementation of the Hortonworks Data Platform.…

A lot of people ask me: how do I become a data scientist? I think the short answer is: as with any technical role, it isn’t necessarily easy or quick, but if you’re smart, committed and willing to invest in learning and experimentation, then of course you can do it.

In a previous post, I described my view on “What is a data scientist?”: it’s a hybrid role that combines the “applied scientist” with the “data engineer”. …

How big is big anyway? What sort of size and shape does a Hadoop cluster take?

These are great questions as you begin to plan a Hadoop implementation. Designing and sizing a cluster is complex and something our technical teams spend a lot of time working with customers on: from storage size to growth rates, from compression rates to cooling then there are many factors to take into account.

To make that a little more fun, we’ve built a cluster-size-o-tron which performs a more simplistic calculation based on some assumptions on node sizes and data payloads to give an indication of how big your particular big is.…

Go to page:12345