By contributing to the OpenStack ecosystem, Hortonworks is supporting the open source community and facilitating adoption of 100-percent open source Apache Hadoop-based solutions in the cloud. Now customers will be able to access an enterprise-ready Hortonworks Data Platform built for the cloud that alleviates the time and complexities of manually deploying a big data solution.…
The Hortonworks Blog
I recently delivered a webinar entitled “Hortonworks State of the Union”. For those new to Apache Hadoop, I covered a brief history of Hadoop and Hortonworks’ role within the open source community. We also covered how the platform services, data services, and operational services required to enable Hadoop as an enterprise-viable platform evolved in 2012.
Finally, we discussed the important progress made on deeply integrating Hadoop within next-generation data architectures in a way that makes sense for the enterprise.…
If Pig is the “duct tape for big data”, then DataFu is the WD-40. Or something.
No, seriously, DataFu is a collection of Pig UDFs for data analysis on Hadoop. DataFu includes routines for common statistics tasks (e.g., median, variance), PageRank, set operations, and bag operations.
It’s helpful to understand the history of the library. Over the years, we developed several routines that were used across LinkedIn and were thrown together into an internal package we affectionately called “littlepiggy.” The unfortunate part, and this is true of many such efforts, is that the UDFs were ill-documented, ill-organized, and easily got broken when someone made a change.…
Today Hortonworks announced the availability of the Hortonworks Sandbox, an easy-to-use, flexible and comprehensive learning environment that will provide you with the fastest on-ramp to learning and exploring enterprise Apache Hadoop.
The Hortonworks Sandbox is:
- A free download
- A complete, self-contained virtual machine with Apache Hadoop pre-configured
- A personal, portable and standalone Hadoop environment
- A set of hands-on, step-by-step tutorials that allow you to learn and explore Hadoop on your own
The Hortonworks Sandbox is designed to help close the gap between people wanting to learn and evaluate Hadoop, and the complexities of spinning up an evaluation cluster of Hadoop.…
Happy New Year, everyone!
I’m excited to kick off our first webinar series for 2013: The True Value of Apache Hadoop.
Get all your friends and co-workers together and be prepared to geek out to Hadoop!
This 4-part series will feature a mix of amazing guest speakers covering topics such as the Hortonworks 2013 vision and roadmap for Apache Hadoop and Big Data, what’s new with Hortonworks Data Platform v1.2, how Luminar (an Entravision company) adopted Apache Hadoop, and a use case on Hadoop, R and GoogleVis.…
When the term scientific computing comes up in a conversation it’s usually just the occasional science geek who shows signs of recognition. But although most people have little or no knowledge of the field’s existence, it has been around since the second half of the twentieth century and has played an increasingly important role in many technological and scientific developments. Internet search engines, DNA analysis, weather forecasting, seismic analysis, renewable energy, and aircraft modeling are just a small number of examples where scientific computing is nowadays indispensable.…
What: “Hortonworks State of the Union and Vision for Apache Hadoop in 2013” webinar
Who: Shaun Connolly, Vice President of Corporate Strategy, Hortonworks
When: Tuesday, January 22, 2013 at 1:00 p.m. ET/10:00 a.m. PT
Click to Tweet: #Hortonworks hosting “State of the Union” webinar to discuss 2013 vision for #Hadoop, 1/22 at 1 pm ET. Register here: http://bit.ly/VYJxKX
The “State of the Union” webinar is the first in a four-part Hortonworks webinar series titled, “The True Value of Apache Hadoop,” designed to inform attendees of key trends, future roadmaps, best practices and the tools necessary for the successful enterprise adoption of Apache Hadoop.…
Hortonworks Data Platform 1.2 is now available for download at: http://hortonworks.com/products/hortonworksdataplatform/.
Hortonworks Data Platform (HDP) 1.2, the industry’s only complete 100-percent open source platform powered by Apache Hadoop, is available today. The enterprise-grade Hortonworks Data Platform includes the latest version of Apache Ambari for comprehensive management, monitoring and provisioning of Apache Hadoop clusters. By also introducing additional new capabilities for improving security and ease of use, HDP delivers an enterprise-class distribution of Apache Hadoop that is endorsed and adopted by some of the largest vendors in the IT ecosystem.…
We are pleased to announce the release of Apache Hive version 0.10.0. More than 350 JIRA issues have been fixed with this release. A few of the most important fixes include:
Cube and Rollup: Hive now has support for creating cubes with rollups. Thanks to Namit!
List Bucketing: This is an optimization that lets you better handle skew in your tables. Thanks to Gang!
Better Windows Support: Several Hive 0.10.0 fixes support running Hive natively on Windows.…
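For readers new to the idea, here is a rough sketch of what cube and rollup aggregation computes. It is plain Python for illustration, not Hive's implementation or syntax, and the function names (`rollup_sets`, `cube_sets`, `aggregate`) and sample rows are made up for this example. A rollup over a list of columns aggregates over each prefix of that list, while a cube aggregates over every subset.

```python
from itertools import combinations


def rollup_sets(columns):
    """Grouping sets for a rollup: every prefix of the column list, longest first."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube_sets(columns):
    """Grouping sets for a cube: every subset of the column list, largest first."""
    return [
        combo
        for size in range(len(columns), -1, -1)
        for combo in combinations(columns, size)
    ]


def aggregate(rows, columns, grouping_sets, measure):
    """Sum `measure` once per grouping set.

    Columns that are not part of the current grouping set show up as None
    in the result key (this sketch assumes the data itself contains no None).
    """
    totals = {}
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in columns)
            totals[key] = totals.get(key, 0) + row[measure]
    return totals


# Hypothetical sample data
rows = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 5},
    {"region": "west", "product": "a", "sales": 7},
]
columns = ["region", "product"]

rollup_totals = aggregate(rows, columns, rollup_sets(columns), "sales")
cube_totals = aggregate(rows, columns, cube_sets(columns), "sales")
```

With two columns, the rollup yields per-(region, product) totals, per-region subtotals, and a grand total. The cube adds per-product subtotals on top of that.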
We are pleased to announce that Apache Pig 0.10.1 was recently released. This is primarily a maintenance release focused on stability and bug fixes. In fact, Pig 0.10.1 includes 42 new JIRA fixes since the Pig 0.10.0 release.
Some of the notable changes include:
- Source code-only distribution
In the download section for Pig 0.10.1, you will now find a source-only tarball (pig-0.10.1-src.tar.gz) alongside the traditional full tarball, rpm and deb distributions.…
When Russell Jurney and I first teamed up to write these posts we wanted to do something that no one had done before to demonstrate the power of Big Data, the simplicity of Pig and the kind of Big Data Security Analytics we perform at Packetloop.…
In a recent blog post, Hortonworks’ Steve Loughran discussed Apache Hadoop’s preference for JBOD-configured storage vs. the allure of RAID-0. As more enterprises move beyond the science experiment stage and begin deploying Hadoop into their production environments, they are learning that Hadoop is quite different from other services in their data centers, such as web, mail, and database servers. They are learning that to achieve optimal performance, you need to pay particular attention to configuring the underlying hardware.…
Hadoop Summit North America 2013, the premier Apache Hadoop community event, will take place at the San Jose Convention Center, June 26-27, 2013. Hosted by Hortonworks, a leading contributor to Apache Hadoop, and Yahoo!, Hadoop Summit brings together the community of developers, architects, administrators, data analysts, data scientists and vendors interested in advancing, extending and implementing Apache Hadoop as the next-generation enterprise data platform.
This 6th Annual Hadoop Summit North America will feature seven tracks and more than 80 sessions focused on building, managing and operating Apache Hadoop from some of the most influential speakers in the industry.…
At Thanksgiving we took a moment to reflect on the past and give thanks for all that has happened to Hortonworks the past year. With the New Year approaching we now take time to look forward and provide our predictions for the Hadoop community in 2013. To compile this list, we queried and collected big data from our team of Hadoop committers and members of the community.
We asked a few luminaries as well and surfaced many expert opinions. While we had our hearts set on five predictions, we ended up with SEVEN. …
This blog is a follow-up to our previous blog “Snapshots for HDFS”.
In June we posted an early prototype of snapshots that allowed us to experiment with a few ideas in HDFS-2802. Since then we have added more details to the design document and made significant progress on a brand new implementation (over 40 subtasks in HDFS-2802).
Some of the highlights of this new design include:
- Read-only Copy-on-Write (COW) snapshots (can be extended to read-write later)
- Snapshots for entire namespace or sub directories
- Snapshots are managed by Admin, but users are allowed to take snapshots
- Snapshots are efficient
- Creation is instantaneous with O(1) cost.
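
To make the O(1) creation claim concrete, here is a minimal sketch of one way read-only copy-on-write snapshots can work. It is not the HDFS-2802 design: it assumes a toy in-memory namespace built from immutable directory nodes, and all names (`Node`, `Namespace`, `create_snapshot`) are hypothetical. Taking a snapshot only stores a reference to the current root, and later writes copy just the directories along the modified path.

```python
class Node:
    """A directory node that is never modified after construction."""

    __slots__ = ("children",)

    def __init__(self, children=None):
        # Maps a name to either a child Node (directory) or file data (str).
        self.children = dict(children or {})


def _write(node, parts, data):
    """Return a new node with `data` stored at `parts`, copying only that path."""
    name, rest = parts[0], parts[1:]
    children = dict(node.children)  # shallow copy of this one directory
    if rest:
        child = node.children.get(name)
        if not isinstance(child, Node):
            child = Node()
        children[name] = _write(child, rest, data)
    else:
        children[name] = data
    return Node(children)


def _split(path):
    return tuple(path.strip("/").split("/"))


class Namespace:
    """Toy namespace with read-only snapshots (illustrative only)."""

    def __init__(self):
        self.root = Node()
        self.snapshots = {}

    def write(self, path, data):
        # Copy-on-write: build a new root, leaving any snapshotted root untouched.
        self.root = _write(self.root, _split(path), data)

    def read(self, path, snapshot=None):
        node = self.snapshots[snapshot] if snapshot is not None else self.root
        for name in _split(path):
            node = node.children[name]
        return node

    def create_snapshot(self, name):
        # O(1): remember the current root; nothing is copied.
        self.snapshots[name] = self.root


ns = Namespace()
ns.write("/logs/a.txt", "v1")
ns.write("/data/x", "1")
ns.create_snapshot("s1")
ns.write("/logs/a.txt", "v2")
```

After the last write, `ns.read("/logs/a.txt")` sees the new contents while `ns.read("/logs/a.txt", "s1")` still sees the old ones, and the untouched `/data` directory is shared between the live tree and the snapshot.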