The Hortonworks Blog

If you haven’t yet noticed, we have made Hortonworks Data Platform v1.0 available for download from our website. Previously, Hortonworks Data Platform was only available for evaluation for members of the Technology Preview Program or via our Virtual Sandbox (hosted on Amazon Web Services). Moving forward and effective immediately, Hortonworks Data Platform is available to the general public.

Hortonworks Data Platform is a 100% open source data management platform, built on Apache Hadoop.…

By any measure, last week’s Hadoop Summit was a tremendous success. It brought together more than 2,200 people from throughout the Apache Hadoop ecosystem to share Hadoop knowledge, ideas, best practices, and interesting use cases. It was also a great chance for big data vendors to make announcements and demonstrate new and exciting solutions.

For those of you that missed the conference, or missed a particularly interesting presentation, we have some good news.…

The fifth annual Hadoop Summit drew to a close last week, with over 2200 Hadoopniks in attendance. While there were many innovations demonstrated, for me the best action was about Pig, HCatalog and Hive from Hortonworks and Twitter.

At the Hadoop Summit Pig Meetup, Twitter announced Ambrose, which now includes an excellent graph layout of Pig EXPLAIN data. This visualization can be used to debug and better understand your Pig scripts.…

I wanted to draw your attention to a Webinar taking place this Thursday at 1pm EDT, 10am PDT. “Back to the Future – MapReduce, Hadoop and the Data Scientist” will highlight the benefits of Apache Hadoop and the role that data scientists are playing in big data. The speakers include:

  • Colin White – Founder of BI Research, a leading research, education and consulting firm helping companies understand and benefit from evolving and leading edge technologies in the areas of business intelligence and data management.

I wanted to take this opportunity to share some important news. Today, Hortonworks announced version 1.0 of the Hortonworks Data Platform, a 100% open source data management platform based on Apache Hadoop. We believe strongly that Apache Hadoop, and therefore, Hortonworks Data Platform, will become the foundation for the next generation enterprise data architecture, helping companies to load, store, process, manage and ultimately benefit from the growing volume and variety of data entering into, and flowing throughout their organizations.…

The following press release was issued by Hortonworks today.

Hortonworks Announces General Availability of Hortonworks Data Platform

Industry’s First Apache Hadoop-based Platform to Include Management, Monitoring and Comprehensive Data Services, Making Hadoop Easy to Consume and Use in Enterprise Environments

Hadoop Summit is just around the corner and by that, I mean next week! There is still time to register for the conference but please do it soon as the conference is filling up quickly. Today is also the last day in which online registration will remain open. After today, you will need to register on-site at the conference itself.

This year’s Hadoop Summit conference, now in its fifth year, promises to be the biggest and best yet.…

Having worked at JBoss and Red Hat from 2004 to 2008 and SpringSource and VMware from 2008 to 2011, I’ve been focused on the world of open source software for a long while. I’ve been blessed to be able to serve enterprise customer needs with high quality open source software such as JBoss Application Server, Hibernate, Drools, Apache Web Server, Apache Tomcat, Spring … and now Apache Hadoop.

As specific open source technologies mature and their use becomes mainstream, it becomes increasingly important to understand and communicate the balancing act that needs to happen between community innovation and enterprise stability.…

Series Introduction

This is part two of a series of blog posts covering new developments in the Hadoop pantheon that enable productivity throughout the lifecycle of big data.  In a series of posts, we’re going to explore the full lifecycle of data in the enterprise: Introducing new data sources to the Hadoop filesystem via ETL, processing this data in data-flows with Pig and Python to expose new and interesting properties, consuming this data as an analyst in HIVE, and discovering and accessing these resources as analysts and application developers using HCatalog and Templeton.…

As the release manager for the Apache Hadoop 2.0 release, it gives me great pleasure to share that the Apache Hadoop community has just released Apache Hadoop 2.0.0 (alpha)! While only an alpha release (read: not ready to run in production), it is still an important step forward as it represents the very first release that delivers new and important capabilities, including:

Series Introduction

This is part one of a series of blog posts covering new developments in the Hadoop pantheon that enable productivity throughout the lifecycle of big data.  In a series of posts, we’re going to explore the full lifecycle of data in the enterprise: Introducing new data sources to the Hadoop filesystem via ETL, processing this data in data-flows with Pig and Python to expose new and interesting properties, consuming this data as an analyst in HIVE, and discovering and accessing these resources as analysts and application developers using HCatalog and Templeton.…

The latest video in the Hortonworks Executive Video Series features Sanjay Radia, Hortonworks co-founder and Apache Hadoop PMC member. Sanjay is well known in the HDFS circles, having contributed to Hadoop for the past 4+ years. In this video, Sanjay gives a good overview of HDFS, the primary storage system for Hadoop, and provides some insight into both the 0.23 release as well as what can be expected from HDFS over the rest of 2012.…

In case you didn’t see the news, I wanted to share the announcement that HCatalog 0.4.0 is now available.

For those of you that are new to the project, HCatalog provides a metadata and table management system that simplifies data sharing between Apache Hadoop and other enterprise data systems. You can learn more about the project on the Apache project site.

The highlights of the 0.4.0 release include:

- Full support for reading from and writing to Hive.…

Since joining Hortonworks at the beginning of the year, a question I’ve heard over and over again is “What is Apache Hadoop and what is it used for?”

There’s clearly a lot of hype [and confusion] in this emerging Big Data market, and it feels as if each new technology, as well as existing technologies, are pushing the meme of “all your data are belong to us”. It is great to see the wave of innovation occurring across the landscape of SQL, NoSQL, NewSQL, EDW, MPP DBMS, Data Marts, and Apache Hadoop (to name just a few), but enterprises and the market in general can use a healthy dose of clarity on just how to use and interconnect these various technologies in ways that benefit the business.…

I attended the Goldman Sachs Cloud Conference and participated on a panel focused on “Data: The New Competitive Advantage”. The panel covered a wide range of questions, but kicked off covering two basic questions:

“What is Big Data?” and “What are the drivers behind the Big Data market?”

While most definitions of Big Data focus on the new forms of unstructured data flowing through businesses with new levels of “volume, velocity, variety, and complexity”, I tend to answer the question using a simple equation:

Big Data = Transactions + Interactions + Observations

The following graphic illustrates what I mean:

Go to page:« First...102030...3738394041...Last »