The Hortonworks Blog

Please join Hortonworks and Appnovation for a webinar titled “Bigger Data on Your Budget” taking place on Wednesday, February 13th at 2pm EST, 11am PST.

Appnovation is a new Hortonworks Systems Integrator partner focused on cutting-edge open source technologies. They are experts in Drupal, Alfresco, SproutCore and now Apache Hadoop.

In advance of this webinar, I interviewed Dave Porter, Appnovation & SproutCore Lead Developer, about the technologies they support and how Appnovation and Hortonworks are working together to provide big insights without breaking the bank.…

The recently introduced Hortonworks Sandbox has garnered an incredibly positive response and feedback. We are as excited as you are, and gratified that our goal of providing the fastest on-ramp to Apache Hadoop has come to fruition. By providing a free, integrated learning environment along with a personal Hadoop environment, we are helping you gain those big data skills faster. Because of your feedback and demand for new tutorials, we are accelerating the release schedule for upcoming tutorials.…

For this post, we take a technical deep-dive into one of the core areas of HBase. Specifically, we will look at how Apache HBase distributes load through regions, and manages region splitting. HBase stores rows of data in tables. Tables are split into chunks of rows called “regions”. Those regions are distributed across the cluster, hosted and made available to client processes by the RegionServer process. A region is a continuous range within the key space, meaning all rows in the table that sort between the region’s start key and end key are stored in the same region.…
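To make the key-range idea concrete, here is a minimal sketch in Python of how a row key maps to a region. It is an illustration only, not HBase client code: the start keys, the server names, and the helpers `find_region` and `region_range` are all made up for this example. It assumes each region covers the half-open range from its start key up to the next region's start key, with the first region starting at the empty key.

```python
from bisect import bisect_right

# Hypothetical region boundaries, sorted by start key. Region i covers
# [REGION_START_KEYS[i], REGION_START_KEYS[i + 1]); the last region is open-ended.
REGION_START_KEYS = [b"", b"g", b"n", b"t"]

# Made-up RegionServer names hosting each region (one server may host several).
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def find_region(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    # bisect_right counts the start keys that sort at or before row_key;
    # the last of those start keys belongs to the owning region.
    return bisect_right(REGION_START_KEYS, row_key) - 1


def region_range(index: int):
    """Return (start_key, end_key) for a region; end_key is None if open-ended."""
    start = REGION_START_KEYS[index]
    if index + 1 < len(REGION_START_KEYS):
        return start, REGION_START_KEYS[index + 1]
    return start, None


if __name__ == "__main__":
    for key in (b"apple", b"g", b"mango", b"zebra"):
        idx = find_region(key)
        print(key, "-> region", idx, region_range(idx), "on", REGION_SERVERS[idx])
```

Because the start keys are kept sorted, locating the region for any row is a binary search over the boundaries, and all rows that sort between two adjacent boundaries land in the same region.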

The customer data that companies collect from websites, social media, blogs, digital advertising and mobile is exploding. And as big data gets bigger, the number of untapped insights available from analyzing that data is also growing exponentially. Marketers covet those insights as a way to better understand and engage with their customers and ultimately drive revenue, but how do they get to them?

According to Gartner, organizations that successfully integrate high-value, diverse new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20 percent.* Fortunately, a new solution that combines Hortonworks Data Platform (HDP) with the expertise of eSage Group allows marketing professionals to extract value from Big Data, quickly and with relative ease.…

Today we announced Hortonworks Data Platform certification for Rackspace Private Cloud. In fact, we are the only Apache Hadoop distribution certified with Rackspace Private Cloud. Combining the power of enterprise-class Apache Hadoop in Hortonworks Data Platform (HDP) with Rackspace Private Cloud gives organizations a secure, scalable environment to refine, explore and enrich their data using Hadoop in the cloud. With HDP, data can be processed from applications that are hosted on Rackspace Private Cloud environments, allowing you to quickly and easily obtain additional business insights from this information.…

By contributing to the OpenStack ecosystem, Hortonworks is supporting the open source community and facilitating adoption of 100-percent open source Apache Hadoop-based solutions in the cloud. Now customers will be able to access an enterprise-ready Hortonworks Data Platform built for the cloud that reduces the time and complexity of manually deploying a big data solution.…

I recently delivered a webinar entitled “Hortonworks State of the Union”. For those new to Apache Hadoop, I covered a brief history of Hadoop and Hortonworks’ role within the open source community. We also covered how the platform services, data services, and operational services required to enable Hadoop as an enterprise-viable platform evolved in 2012.

Finally, we discussed the important progress made on deeply integrating Hadoop within next-generation data architectures in a way that makes sense for the enterprise.…

If Pig is the “duct tape for big data”, then DataFu is the WD-40. Or something.

No, seriously, DataFu is a collection of Pig UDFs for data analysis on Hadoop. DataFu includes routines for common statistics tasks (e.g., median, variance), PageRank, set operations, and bag operations.

It’s helpful to understand the history of the library. Over the years, we developed several routines that were used across LinkedIn and were thrown together into an internal package we affectionately called “littlepiggy.” The unfortunate part, and this is true of many such efforts, is that the UDFs were ill-documented, ill-organized, and easily got broken when someone made a change.…

Go from Zero to Big Data in 15 Minutes!

Today Hortonworks announced the availability of the Hortonworks Sandbox, an easy-to-use, flexible and comprehensive learning environment that will provide you with the fastest on-ramp to learning and exploring enterprise Apache Hadoop.

The Hortonworks Sandbox is:

  • A free download
  • A complete, self-contained virtual machine with Apache Hadoop pre-configured
  • A personal, portable and standalone Hadoop environment
  • A set of hands-on, step-by-step tutorials that allow you to learn and explore Hadoop on your own

The Hortonworks Sandbox is designed to help close the gap between people wanting to learn and evaluate Hadoop, and the complexities of spinning up an evaluation cluster of Hadoop.…

Happy New Year, everyone!

I’m excited to kick off our first webinar series for 2013: The True Value of Apache Hadoop.

Get all your friends and co-workers together and be prepared to geek out on Hadoop!

This 4-part series will feature a mix of amazing guest speakers covering topics such as the Hortonworks 2013 vision and roadmaps for Apache Hadoop and Big Data, what’s new with Hortonworks Data Platform v1.2, how Luminar (an Entravision company) adopted Apache Hadoop, and a use case on Hadoop, R and GoogleVis.…

When the term scientific computing comes up in a conversation it’s usually just the occasional science geek who shows signs of recognition. But although most people have little or no knowledge of the field’s existence, it has been around since the second half of the twentieth century and has played an increasingly important role in many technological and scientific developments. Internet search engines, DNA analysis, weather forecasting, seismic analysis, renewable energy, and aircraft modeling are just a few examples where scientific computing is nowadays indispensable.…

What: “Hortonworks State of the Union and Vision for Apache Hadoop in 2013” webinar

Who: Shaun Connolly, Vice President of Corporate Strategy, Hortonworks

When: Tuesday, January 22, 2013 at 1:00 p.m. ET / 10:00 a.m. PT

Where: http://info.hortonworks.com/Winterwebinarseries_TheTrueValueofHadoop.html

The “State of the Union” webinar is the first in a four-part Hortonworks webinar series titled, “The True Value of Apache Hadoop,” designed to inform attendees of key trends, future roadmaps, best practices and the tools necessary for the successful enterprise adoption of Apache Hadoop.…

Hortonworks Data Platform 1.2 is now available for download at: http://hortonworks.com/products/hortonworksdataplatform/.

Hortonworks Data Platform (HDP) 1.2, the industry’s only complete 100-percent open source platform powered by Apache Hadoop, is available today. The enterprise-grade Hortonworks Data Platform includes the latest version of Apache Ambari for comprehensive management, monitoring and provisioning of Apache Hadoop clusters. By also introducing additional new capabilities for improving security and ease of use, HDP delivers an enterprise-class distribution of Apache Hadoop that is endorsed and adopted by some of the largest vendors in the IT ecosystem.…

We are pleased to announce the release of Apache Hive version 0.10.0. More than 350 JIRA issues have been fixed with this release. A few of the most important changes include:

Cube and Rollup: Hive now has support for creating cubes with rollups. Thanks to Namit!

List Bucketing: This is an optimization that lets you better handle skew in your tables. Thanks to Gang!

Better Windows Support: Several Hive 0.10.0 fixes support running Hive natively on Windows.…

We are pleased to announce that Apache Pig 0.10.1 was recently released. This is primarily a maintenance release focused on stability and bug fixes. In fact, Pig 0.10.1 includes 42 new JIRA fixes since the Pig 0.10.0 release.

Some of the notable changes include:

  • Source code-only distribution

In the download section for Pig 0.10.1, you will now find a source-only tarball (pig-0.10.1-src.tar.gz) alongside the traditional full tarball, rpm and deb distributions.…
