The Hortonworks Blog

This post is authored by Jian He with Vinod Kumar Vavilapalli and is the seventh post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:

Introduction

Apache Hadoop 2 is in beta now.…

In the last 60 seconds there were 1,300 new mobile users and 100,000 new tweets. While you contemplate what happens in an internet minute, Amazon brought in $83,000 worth of sales. What would be the impact of being able to identify:

  • What is the most efficient path for a site visitor to research a product, and then buy it?
  • What products do visitors tend to buy together, and what are they most likely to buy in the future?

Historical data is now an essential tool for businesses as they struggle to meet increasingly stringent regulatory requirements, manage risk and perform predictive analytics that help improve business decisions. And while recent data may be available from an enterprise data warehouse, the traditional practice of archiving old data offsite on tape makes business analytics challenging, if not impossible, because the historical information needed is simply unavailable.

Fortunately, the modern approach to data storage and business analytics uses technologies like virtualization and big data Hadoop clusters to enable partitioned access to historical data.…

Continuing our series of quick interviews with Apache Hadoop project committers and contributors at Hortonworks.

To follow on from yesterday’s Server Log processing with Apache Flume tutorial we talk with Roshan Naik, Hortonworks engineer and Apache Flume contributor, about what Flume is, how it works and where it’s going.

Learn more about Flume here or at the Apache Hadoop project site.

The best architecture diagrams are those that impart the intended knowledge with maximum efficiency and minimum ambiguity. But sometimes there’s a need to add a little pizazz, and maybe even draw a picture or two for those PowerPoint moments.

Download stencils for OmniGraffle and Visio, and the high-resolution PNG and EPS files here.

We’ve built a small set of Hadoop-related icons that might help you next time you need that picture focusing on the intended function of various components.…

When they’re not planning to overthrow their human overlords, most servers can be found spewing out vast amounts of data in the form of server logs. As we showed in our video, “Deliver responsive IT from events in Server Logs”, these logs contain a lot of value.

So if you fire up the Hortonworks Sandbox today, you’ll be delighted to find Tutorial 12: Refining and Visualizing Server Log Data as a step-by-step guide to the video. …

The shift to a data-oriented business is happening. The inherent value in established and emerging big datasets is becoming clear. Enterprises are building big data strategies to take advantage of these new opportunities and Hadoop is the platform to realize those strategies.

Hadoop is enabling a modern data architecture where it plays a central role: built to tackle big data sets with efficiency while integrating with existing data systems. As champions of Hadoop, our aim is to ensure the success of every Hadoop implementation and improve our own understanding of how and why enterprises tackle big data initiatives. …

Our Systems Integrator partner, Knowledgent, is hosting a Big Data Immersion Class geared towards technologists who are tasked with launching Big Data programs that must have tangible real-time benefits to their organizations.

“When and how do I use these new big data technologies?” “How do I operationalize them in my environment?” These are some of the fundamental questions that Knowledgent prospects and customers are asking, and they are why the 3-day immersion class was developed.…

This post is authored by Zhijie Shen with Vinod Kumar Vavilapalli.

This is the sixth blog in the multi-part series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:

Introducing Apache Hadoop YARN
Apache Hadoop YARN – Background and an Overview
Apache Hadoop YARN – Concepts and Applications
Apache Hadoop YARN – ResourceManager
Apache Hadoop YARN – NodeManager

Introduction

The beta release of Apache Hadoop 2.x has finally arrived and we are striving hard to make the release easy to adopt, with minimal or no pain for our existing users.…

Chances are you’ve already used Tableau Software if you’ve been involved with data analysis and visualization solutions for any length of time. Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Hadoop with Hortonworks Data Platform via Hive and the Hortonworks Hive ODBC driver.

If you want to get hands-on with Tableau as quickly as possible, we recommend using the Hortonworks Sandbox and the ‘Visualize Data with Tableau’ tutorial.…

It’s my great pleasure to announce that the Apache Hadoop community has declared Hadoop 2.x as Beta with the vote closing over the weekend for the hadoop-2.1.0-beta release.

As noted in the announcement to the mailing lists, this is a significant milestone across multiple dimensions: not only is the release chock-full of significant features (see below), it also represents a very stable set of APIs and protocols on which we can continue to build for the future.…

As summer comes to a close, we bid a fond farewell (again!) to our excellent marketing intern, Tanya Maslyanko. Tanya has been a terrific help to us with her can-do attitude and marketing intuition so the tears we shed are because we’ll miss our friend and because we’ll have to start doing our own work again. Over to Tanya…

A few years ago, I sat in a freshman-filled auditorium at my university’s orientation listening to successful graduates talk about how important it was to get involved with your career early on.…

There are myriad use cases for Big Data applications across industries. For example, financial companies want to analyze governance to assess levels of risk and compliance. Transportation companies want to analyze overall logistics for optimization. Oil and gas companies supplying energy want to predict machine failures to reduce the risk of outages. Insurance companies will need to analyze actuarial information in order to calculate individual policy premiums – yes, the impending Affordable Care Act.…

The next in our series of quick interviews with Apache Hadoop project committers at Hortonworks.

In this video, we talk with Sanjay Radia, Hortonworks co-founder and Apache Hadoop committer, about the initiation of HDFS, the cost benefits it brings to data storage and future directions for the project.

Learn more about HDFS here or at the Apache Hadoop project site.

Before I was a developer of Hadoop, I was a user of Hadoop. I was responsible for operation and maintenance of multiple Hadoop clusters, so it’s very satisfying when I get the opportunity to implement features that make life easier for operations staff.

Have you ever wondered what’s happening during a NameNode restart? A new feature coming in HDP 2.0 will give operators greater visibility into this critical process. This is a feature that would have been very useful to me in my prior role.…
