Hadoop Insights

News about Hadoop in the wild; how Hadoop is being used; how Hadoop can be used.

Big Data Shopping Bag

With big data basking in the limelight, it is no surprise that large retailers have been closely watching its development… and more power to them! By learning to use big data effectively, retailers can shape the market to their advantage, becoming more competitive and more likely to come out on top. Now that open source analytical platforms like Hadoop allow unstructured data to be transformed and organized, large retailers can make smart business decisions using the information they collect about customers’ habits, preferences, and needs.…

Big data. The world has been hearing these two words a lot lately, in connection with a wide array of use cases: social media, government regulation, auto insurance, retail targeting, and more. The list goes on. However, one very important area that deserves the same (if not more) recognition is the role of big data in human genome research.

Three billion base pairs make up the DNA present in humans.…

Series Introduction

This is part one of a series of blog posts covering new developments in the Hadoop pantheon that enable productivity throughout the lifecycle of big data. In this series, we’re going to explore the full lifecycle of data in the enterprise: introducing new data sources to the Hadoop filesystem via ETL, processing this data in data-flows with Pig and Python to expose new and interesting properties, consuming this data as an analyst in Hive, and discovering and accessing these resources as analysts and application developers using HCatalog and Templeton.…

Since joining Hortonworks at the beginning of the year, a question I’ve heard over and over again is “What is Apache Hadoop and what is it used for?”

There’s clearly a lot of hype [and confusion] in this emerging Big Data market, and it feels as if every technology, new and existing alike, is pushing the meme of “all your data are belong to us”. It is great to see the wave of innovation occurring across the landscape of SQL, NoSQL, NewSQL, EDW, MPP DBMS, Data Marts, and Apache Hadoop (to name just a few), but enterprises and the market in general could use a healthy dose of clarity on just how to use and interconnect these various technologies in ways that benefit the business.…

I attended the Goldman Sachs Cloud Conference and participated in a panel focused on “Data: The New Competitive Advantage”. The panel covered a wide range of topics, but kicked off with two basic questions:

“What is Big Data?” and “What are the drivers behind the Big Data market?”

While most definitions of Big Data focus on the new forms of unstructured data flowing through businesses with new levels of “volume, velocity, variety, and complexity”, I tend to answer the question using a simple equation:

Big Data = Transactions + Interactions + Observations

The following graphic illustrates what I mean:

I recently joined Hortonworks as VP of Corporate Strategy, and I wanted to share my thoughts as to what attracted me to Hortonworks.

For me, it’s important to 1) work with a top-notch team and 2) focus on unique market-changing business opportunities.

Hortonworks has a strong team of technical founders (Eric14, Alan, Arun, Devaraj, Mahadev, Owen, Sanjay, and Suresh) doing impressive work within the Apache Hadoop community. Hortonworks also has an impressive Board of Directors that includes folks like Peter Fenton, Mike Volpi, Jay Rossiter, Rob Bearden, as well as our most recent board member Paul Cormier (Red Hat’s President of Products and Technology).…

Hi Folks,

I’m talking at a storage conference this month and I’d like to see if crowdsourcing will generate interesting examples and studies that I can include in my presentation.

What I’d like are interesting cases where HDFS has been compared to other storage technologies. I’m especially interested in cases where the decision was made to deploy HDFS rather than to buy an alternative technology. I’m also interested in any large deployments where HDFS is being used for interesting things beyond being the serving layer for MapReduce and HBase.  …