The Hortonworks Blog

Think Big Analytics, a Hortonworks systems integration partner, has been helping customers navigate the complex world of Hadoop successfully for the past three years.  Over the years they have seen it all and have developed one of the most mature Hadoop implementation methodologies known.  Recently, we asked Ron Bodkin, Founder and CEO of Think Big Analytics, to share some insight.…

What are the “Must-Dos” Before Starting a Big Data Project?

The upcoming Hive 0.12 is set to bring some great new advancements in the storage layer in the form of higher compression and better query performance.

Higher Compression

ORCFile was introduced in Hive 0.11 and offered excellent compression, delivered through a number of techniques including run-length encoding, dictionary encoding for strings and bitmap encoding.

This focus on efficiency leads to some impressive compression ratios. This picture shows the sizes of the TPC-DS dataset at Scale 500 in various encodings.…
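
As a rough illustration of two of the techniques named above, run-length encoding and dictionary encoding, the Python sketch below encodes a small string column. It is not ORCFile's actual implementation; the function names and the sample column are hypothetical and chosen only to show why repetitive columns shrink well.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def dictionary_encode(strings):
    """Replace each string with an index into a table of distinct values."""
    dictionary = []   # distinct strings, in order of first appearance
    index_of = {}     # string -> position in `dictionary`
    codes = []        # one small integer per input row
    for s in strings:
        if s not in index_of:
            index_of[s] = len(dictionary)
            dictionary.append(s)
        codes.append(index_of[s])
    return dictionary, codes


# Hypothetical column with few distinct values and long runs.
column = ["CA", "CA", "CA", "NY", "NY", "CA"]
dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)
```

Here the six strings reduce to a two-entry dictionary plus three (code, count) runs, which is the kind of saving that columns with repeated values tend to allow.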

The Stinger Initiative is Hortonworks’ community-facing roadmap laying out the investments Hortonworks is making to improve Hive performance 100x and evolve Hive to SQL compliance to simplify migrating SQL workloads to Hive.

We launched the Stinger Initiative along with Apache Tez to evolve Hadoop beyond its MapReduce roots into a data processing platform that satisfies the need for both interactive query AND petabyte scale processing. We believe it’s more feasible to evolve Hadoop to cover interactive needs rather than move traditional architectures into the era of big data.…

We hosted a webinar on YARN a couple of weeks ago (see the slides and playback here). As you might expect, there were a lot of great questions, and here is a set of answers to them.

Our next YARN-oriented Office Hours will be held online on Sept 11th at 2pm PST. Join us on Meetup!

Who is using YARN and what benefits have they received from it?

One great public example of production use of YARN is at Yahoo!.…

This guest post is from John Haddad, Director of Product Marketing at Informatica Corporation. He has over 25 years’ experience designing, building, integrating and marketing enterprise applications. His current focus is helping organizations get the most business value from Big Data by delivering timely, trusted, and relevant data across the extended enterprise.

Why is it so important for companies today to adopt a modern data architecture and why is next generation data integration on Apache Hadoop such a critical component?…

Another week, another release…  Following the release of Apache Hadoop 2.0 beta last week, we are excited to release the beta of Hortonworks Data Platform 2.0, the first commercial release of the stable YARN API and protocols on which new applications can now be built.

For our customers this is a great opportunity to ensure the release meets expectations and provides a vehicle to voice feedback that will work to improve Hadoop and shape its roadmap. …

If you’re heading back to work today after a long hot summer, here are some notes on last week at Hortonworks.

Building a modern data architecture. We kicked off the week with some discussion on what it means to implement Hadoop alongside existing data architecture components. Jim covered 3 essential requirements: integration with existing systems, reuse of existing skills, and enterprise requirements such as reliability and availability. We also held the first webinar in our series on implementing Hadoop in the enterprise: this one was with Teradata.…

This post is authored by Jian He with Vinod Kumar Vavilapalli and is the seventh post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:

Introduction

Apache Hadoop 2 is in beta now.…

In the last 60 seconds there were 1,300 new mobile users and 100,000 new tweets. While you contemplate what happens in an internet minute, Amazon brought in $83,000 worth of sales. What would be the impact if you could identify:

  • What is the most efficient path for a site visitor to research a product, and then buy it?
  • What products do visitors tend to buy together, and what are they most likely to buy in the future?

Historical data is now an essential tool for businesses as they struggle to meet increasingly stringent regulatory requirements, manage risk and perform predictive analytics that help improve business decisions. And while recent data may be available from an enterprise data warehouse, the traditional practice of archiving old data offsite on tape makes business analytics challenging, if not impossible, because the historical information needed is simply unavailable.

Fortunately, the modern approach to data storage and business analytics utilizes technologies like virtualization and big data Hadoop clusters to enable partitioned access to historical data.…

Continuing our series of quick interviews with Apache Hadoop project committers and contributors at Hortonworks.

To follow on from yesterday’s Server Log processing with Apache Flume tutorial, we talk with Roshan Naik, Hortonworks engineer and Apache Flume contributor, about what Flume is, how it works and where it’s going.

Learn more about Flume here or at the Apache Hadoop project site.

The best architecture diagrams are those that impart the intended knowledge with maximum efficiency and minimum ambiguity. But sometimes there’s a need to add a little pizazz, and maybe even draw a picture or two for those PowerPoint moments.

Download stencils for OmniGraffle and Visio, and the high-res PNG and EPS files here.

We’ve built a small set of Hadoop-related icons that might help you next time you need that picture focusing on the intended function of various components.…

When they’re not planning to overthrow their human overlords, most servers can be found spewing out vast amounts of data in the form of server logs. As we showed in our video - Deliver responsive IT from events in Server Logs - these logs contain a lot of value.

So if you fire up the Hortonworks Sandbox today, you’ll be delighted to find Tutorial 12: Refining and Visualizing Server Log Data as a step-by-step guide to the video. …

The shift to a data-oriented business is happening. The inherent value in established and emerging big datasets is becoming clear. Enterprises are building big data strategies to take advantage of these new opportunities and Hadoop is the platform to realize those strategies.

Hadoop is enabling a modern data architecture where it plays a central role: built to tackle big data sets with efficiency while integrating with existing data systems. As champions of Hadoop, our aim is to ensure the success of every Hadoop implementation and improve our own understanding of how and why enterprises tackle big data initiatives. …

Our Systems Integrator partner, Knowledgent, is hosting a Big Data Immersion Class geared towards technologists who are tasked with launching Big Data programs that must have tangible real-time benefits to their organizations.

“When and how do I use these new big data technologies?” “How do I operationalize them in my environment?” These are some of the fundamental questions that Knowledgent prospects and customers are asking, and they are why the 3-day immersion class was developed.…
