The Hortonworks Blog

We are excited to announce that the call for abstracts for Hadoop Summit Europe 2014 (April 2-3, 2014) is now open and closes on October 31st. One of the new things for this year is a set of updated tracks that give attendees new options. Last year was a wildly successful event, and we received a lot of feedback on how to make things better… and we listened.

Providing high value content is what the conference is all about and we received some great suggestions from the community on how to improve the sessions.   …

Syncsort, a technology partner with Hortonworks, helps organizations propel Hadoop projects with a tool that makes it easy to “Collect, Process and Distribute” data with Hadoop. This process, often called ETL (Extract, Transform, Load), is one of the key drivers for Hadoop initiatives; but why is this technology a key enabler of Hadoop? To find out, we talked with Syncsort’s Director of Strategy, Steve Totman, a 15-year veteran of data integration and warehousing, who provided his perspective on Data Warehouse Staging Areas.…

He loves me, he loves me not… using daisies to figure out someone’s feelings is so last century. A much better way to determine whether someone likes you, your product or your company is to do some analysis on Twitter feeds to get better data on what the public is saying. But how do you take thousands of tweets and process them? Our video, Understand your customers’ sentiments with Social Media Data, shows how you can capture a Twitter stream to do sentiment analysis.…

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week Venkat Ranganathan discusses using Apache Sqoop for bulk data movement between Hadoop and enterprise data stores. Sqoop can also move data the other way, from Hadoop into an EDW.

Venkat is a Hortonworks engineer and Apache Sqoop committer who wrote the connector between Sqoop and the Netezza data warehousing platform. He also worked with colleagues at Hortonworks and in the Apache community to improve integration between Sqoop and Apache HCatalog, delivered in Sqoop 1.4.4.…

If you are an enterprise, chances are you use SAP.  And you are also more than likely using – or planning to use – Hadoop in your data architecture.

Today, we are delighted to announce the next step in our strategic relationship with SAP as they announce a reseller agreement with Hortonworks.  Under this agreement, SAP will resell Hortonworks Data Platform and provide enterprise support for their global customer base.  This will enable SAP customers to implement a data architecture that includes SAP HANA and the Hortonworks Data Platform and in so doing leverage existing skills to take advantage of the massive scalability and performance offered by Apache Hadoop.…

This post is the first in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

In this post we introduce the motivation behind Apache Tez (http://incubator.apache.org/projects/tez.html) and provide some background around the basic design principles for the project.…

As part of HDP 2.0 Beta, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines. This also streamlines MapReduce to do what it does best: process data. With YARN, you can now run multiple applications in Hadoop, all sharing common resource management.

In this blog post we’ll walk through how to plan for and configure processing capacity in your enterprise HDP 2.0 cluster deployment.…

Building a modern data architecture with Hadoop that delivers high-scale, low-cost data processing means integrating Hadoop effectively inside the data center. For this post, we asked Yves de Montcheuil, VP of Marketing at Talend, about his customers’ experiences with Hadoop integration. Here’s what he had to say:

Most organizations are still in the early stages of big data adoption, and few have thought beyond the technology angle of how big data will profoundly impact their processes and their information architecture.…

Think Big Analytics, a Hortonworks systems integration partner, has been helping customers navigate the complex world of Hadoop successfully for the past three years. Over the years they have seen it all and have developed one of the most mature Hadoop implementation methodologies known. Recently, we asked Ron Bodkin, Founder and CEO of Think Big Analytics, to share some insight.

What are the “Must-Dos” Before Starting a Big Data Project?…

The upcoming Hive 0.12 is set to bring some great new advancements in the storage layer in the form of higher compression and better query performance.

Higher Compression

ORCFile was introduced in Hive 0.11 and offered excellent compression, delivered through a number of techniques including run-length encoding, dictionary encoding for strings and bitmap encoding.

This focus on efficiency leads to some impressive compression ratios. This picture shows the sizes of the TPC-DS dataset at Scale 500 in various encodings.…

The Stinger Initiative is Hortonworks’ community-facing roadmap laying out the investments Hortonworks is making to improve Hive performance 100x and evolve Hive to SQL compliance to simplify migrating SQL workloads to Hive.

We launched the Stinger Initiative along with Apache Tez to evolve Hadoop beyond its MapReduce roots into a data processing platform that satisfies the need for both interactive query AND petabyte scale processing. We believe it’s more feasible to evolve Hadoop to cover interactive needs rather than move traditional architectures into the era of big data.…

We hosted a webinar on YARN a couple of weeks ago (see the slides and playback here). As you might expect, there were a lot of great questions, and here is a set of answers to those questions.

Our next YARN-oriented Office Hours is online on Sept 11th at 2pm PST. Join us on Meetup!

Who is using YARN and what benefits have they received from it?

One great public example of YARN in production use is at Yahoo!.…

This guest post is from John Haddad, Director of Product Marketing at Informatica Corporation. He has over 25 years’ experience designing, building, integrating and marketing enterprise applications. His current focus is helping organizations get the most business value from Big Data by delivering timely, trusted, and relevant data across the extended enterprise.

Why is it so important for companies today to adopt a modern data architecture and why is next generation data integration on Apache Hadoop such a critical component?…

Another week, another release…  Following the release of Apache Hadoop 2.0 beta last week, we are excited to release the beta of Hortonworks Data Platform 2.0, the first commercial release of the stable YARN API and protocols on which new applications can now be built.

For our customers this is a great opportunity to ensure the release meets expectations and provides a vehicle to voice feedback that will work to improve Hadoop and shape its roadmap. …

If you’re heading back to work today after a long hot summer, then here are some notes on last week here at Hortonworks.

Building a modern data architecture. We kicked off the week with some discussion on what it means to implement Hadoop alongside existing data architecture components. Jim covered 3 essential requirements: integration with existing systems, reuse of existing skills, and enterprise requirements such as reliability and availability. We also held the first webinar in our series on implementing Hadoop in the enterprise: this one was with Teradata.…
