The Hortonworks Blog

Posts categorized by: Apache Hadoop

Before I was a developer of Hadoop, I was a user of Hadoop.  I was responsible for operation and maintenance of multiple Hadoop clusters, so it’s very satisfying when I get the opportunity to implement features that make life easier for operations staff.

Have you ever wondered what’s happening during a namenode restart?  A new feature coming in HDP 2.0 will give operators greater visibility into this critical process.  This is a feature that would have been very useful to me in my prior role.…

UPDATE: This cheat sheet was so popular, we’ve created a PDF of the content below so you can print it and use it more easily. Download here.

 

If you’re already familiar with SQL then you may well be thinking about how to add Hadoop skills to your toolbelt as an option for data processing.

From a querying perspective, using Apache Hive provides a familiar interface to data held in a Hadoop cluster and is a great way to get started.…

If you want to understand the thinking in the various projects in the Hadoop ecosystem, then who better to talk to than key members of those projects – the committers.

In this video, we talk with Owen O’Malley, Hortonworks co-founder and Apache Hive committer, about the initiation of Hive, why it matters and future directions for the project.

Learn more about Hive here, or at the Apache Hive project site.…

A busy week at Hortonworks Towers means a quick recap on what’s been happening.

Hadoop on Windows. On Tuesday we announced the GA of HDP 1.3 for Windows. Apart from being the only native Windows distribution for Hadoop, the updates and innovation in this release bring it to parity with our Linux distribution which means Hadoop Everywhere! Later on, we talked about getting started with HDP 1.3 for Windows, and also pointed at some great resources and tutorials.…

This guest post is from Sofia Parfenovich, Data Scientist at Altoros Systems, a big data specialist and a Hortonworks System Integrator partner. Sofia explains how she optimized a customer’s trading solution by using Hadoop (Hortonworks Data Platform) and by clustering stock data.

Automated trading solutions are widely used by investors, banks, funds, and other stock market players. These systems are based on complex mathematical algorithms and can take into account hundreds of factors.…

In this blog we’ll set up NFS for HDFS access with the Hortonworks Sandbox 1.3. This allows files to be read from and written to Hadoop using methods familiar to desktop users. Sandbox is a great way to understand this particular type of access.

If you don’t have it already, then download the sandbox here. Got the download? Then let’s get started.

Start the Sandbox. Get to this screen.

We will now enable Ambari so that we can edit the configuration to enable NFS.…

Today we released the Hortonworks Data Platform 1.3 for Windows, supporting Windows Server 2008 R2 and 2012. This is an exciting major update to the only Enterprise Hadoop distribution on Windows. In this blog post, I will discuss what’s new and how to get started.

 Enabling new data applications

This release brings component parity to the HDP Stack across all operating systems by adding the following components:

  • Apache HBase (0.94.6.1) is a non-relational (NoSQL) database that runs on top of the Hadoop® Distributed File System (HDFS).

Thanks to all who joined us for last week’s webinar on Apache Hadoop YARN: Enabling Next Generation Data Applications. You can listen to the full webinar replay here, and the slides are embedded below.

If you’re already diving into YARN, then we will be hosting the first ‘Office Hours’ sessions at Hortonworks HQ. Join us on August 15th for a Deep Dive on Hoya (HBase on YARN).

Office hours will give you a chance to talk with the Hortonworks developers deeply involved with the YARN and Hoya projects, as well as with peers who are just launching their YARN projects.…

The Hortonworks Sandbox is a great tool not only for learning Hadoop, but also for experimentation and application development. Deployment in a type 2 hypervisor such as Oracle VirtualBox or VMware Workstation is straightforward and serves the needs of a single user. Sandbox can also be deployed to IaaS environments, and in this article we walk through the steps of deploying Hortonworks Sandbox on OpenStack. For the purposes of this article, the author has used the OpenStack Grizzly release running QEMU-KVM as the underlying hypervisor.…

In the last Hoya article, we talked about its application architecture. Now let’s talk persistence. A key use case for Hoya is to support long-lived clusters that can be started and stopped on demand. This lets a user start and stop an HBase cluster when they want, only using CPU and memory resources when they actually need them. For example, a specific MR job could use a private HBase instance as part of its join operations, or for an intermediate store of results in a workflow.…

At Hadoop Summit in June, we introduced a little project we’re working on: Hoya: HBase on YARN. Since then the code has been reworked and is now up on GitHub. It’s still very raw, and requires some local builds of bits of Hadoop and HBase, but it is there for the interested.

In this article we’re going to look at the architecture, and a bit of the implementation.

We’re not going to look at YARN in this article; for that we have a dedicated section of the Hortonworks site, including sample chapters of Arun Murthy’s forthcoming book.…

If you’re considering the WHY, the HOW and the WHAT of Hadoop and Big Data in your business, then this collection of papers and ebooks is your friend.

  • WHY does Hadoop matter? Our eBook “Disruptive Possibilities of Big Data” paints a picture of the future of the data-driven business and how it changes everything.
  • HOW does Hadoop work in my data architecture? As part of a modern data architecture, Hadoop sits alongside existing infrastructure and augments its capabilities through Refining and Exploring big datasets and ultimately enriching the application and customer experiences for your business.

We continue to make strong headway towards the general availability of Hadoop 2.0. A release candidate for Hadoop 2.1.0-beta is currently under consideration by the Apache community. This critical milestone signifies both the outstanding progress being made by the community and, equally important, the stabilization of Hadoop 2.0 APIs.

A defining characteristic of Hadoop 2.0 is its next-generation resource management framework, called YARN. YARN enables Hadoop to grow beyond its MapReduce origins to embrace multiple workloads spanning interactive queries, batch processing, streaming and more.…

My work on adding data types to HBase has come along far enough that ambiguities in the conversation are finally starting to shake out. These were issues I’d hoped to address through initial design documentation and a draft specification. Unfortunately, it’s not until there’s real code implemented that the finer points are addressed in concrete. I’d like to take a step back from the code for a moment to initiate the conversation again and hopefully clarify some points about how I’ve approached this new feature.…

One of the big opportunities that Hadoop provides is the processing power to unlock value in big datasets of varying types, from the ‘old’, such as web clickstream and server logs, to the new, such as sensor data and geolocation data.

The explosion of smartphones in the consumer space (and smart devices of all kinds more generally) has continued to accelerate the next generation of apps such as Foursquare and Uber, which depend on processing, and drawing insight from, huge volumes of incoming data.…
