The Windows Explorer experience for HDFS

How we use HDFS Explorer to manage files and clean up after ourselves.

This is a guest post from Simon Elliston Ball, Head of Big Data at Red Gate and all-round top bloke.

Hadoop is a great place to keep a lot of data. The data lake, the data hub, the data platform: it's all about the data. So how do you manage that data? How do you get data in? How do you get results out? How do you get at the logs buried somewhere deep in HDFS?

At Red Gate we have been working on query tools for Hadoop for a while, and while testing we found ourselves endlessly typing hadoop fs commands. Getting data sets from our Windows desktops to the cluster, or inspecting job output files, was just taking too many steps. It should be as easy to access files on HDFS as files on our local drives. So we created HDFS Explorer, which works just like Windows Explorer but connects to the WebHDFS API, so we can browse files on our clusters.
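WebHDFS is just a REST API over HTTP, which is what makes a desktop tool like this possible. As a minimal sketch, here is how a directory listing can be fetched with nothing but the Python standard library. The NameNode address, port, and user name below are assumptions for illustration; substitute the values for your own cluster (the Hortonworks Sandbox exposes WebHDFS on the NameNode's HTTP port, 50070 by default in Hadoop 2.x).

```python
# Sketch: listing an HDFS directory over the WebHDFS REST API.
# NAMENODE and USER are hypothetical values -- replace with your cluster's.
import json
import urllib.request

NAMENODE = "http://sandbox.example.com:50070"  # hypothetical NameNode address
USER = "hdfs"                                  # hypothetical HDFS user

def webhdfs_url(path, op):
    """Build a WebHDFS URL, e.g. /webhdfs/v1/tmp?op=LISTSTATUS&user.name=hdfs"""
    return f"{NAMENODE}/webhdfs/v1{path}?op={op}&user.name={USER}"

def list_status(path):
    """Return the FileStatus entries for a directory (needs a live cluster)."""
    with urllib.request.urlopen(webhdfs_url(path, "LISTSTATUS")) as resp:
        return json.load(resp)["FileStatuses"]["FileStatus"]
```

A tool like HDFS Explorer builds on the same operations (LISTSTATUS, OPEN, CREATE, DELETE), which is why it needs no agent on the cluster itself.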

HDFS Explorer helps if you're shunting smaller data sets or results to and from your desktop, but we also found it works great for cleaning up after all those test queries. Every job submission leaves behind a trail of metadata, logs, job files, and output, which can quickly add up to a decent amount of disk space. If you're experimenting with a sandbox installation this can be a real issue. On a proper cluster, even if disk space is not a problem, the mess left behind can make it hard to find the right job's diagnostics.
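The same cleanup can be scripted over WebHDFS: a recursive DELETE on a leftover output directory is the REST equivalent of hadoop fs -rm -r. This is a hedged sketch, not HDFS Explorer's implementation; the NameNode address, user, and path are hypothetical, and the call is destructive, so point it only at directories you really mean to remove.

```python
# Sketch: removing a leftover job output directory via WebHDFS DELETE.
# NAMENODE and USER are hypothetical -- replace with your cluster's values.
import urllib.request

NAMENODE = "http://sandbox.example.com:50070"  # hypothetical NameNode address
USER = "hdfs"                                  # hypothetical HDFS user

def delete_url(path):
    """Build the WebHDFS URL for a recursive delete of `path`."""
    return (f"{NAMENODE}/webhdfs/v1{path}"
            f"?op=DELETE&recursive=true&user.name={USER}")

def delete_recursive(path):
    """Recursively delete a path over WebHDFS (destructive; needs a cluster)."""
    req = urllib.request.Request(delete_url(path), method="DELETE")
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200

# e.g. delete_recursive("/user/hdfs/test-output")  # requires a live cluster
```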

This video shows how to get up and running with HDFS Explorer and Hortonworks Sandbox so you can manage files and even clean up after yourself.

From its beginnings as a humble little test tool, HDFS Explorer has opened up the Hadoop file system for us and made it much easier to implement proper data file management. This week we're also happy to announce that we've added Kerberos support, making it ready for use on clusters in an enterprise authentication environment.

HDFS Explorer is available for free from the Red Gate Big Data site.

Download the Hortonworks Sandbox and get started with these great tutorials.



Robert Vecchione
July 2, 2014 at 2:25 pm

How do I browse HDFS with the WINDOWS version of HDP?

December 2, 2015 at 7:00 am

What happened to HDFS Explorer? I can’t find it anywhere on Redgate’s site.

January 4, 2016 at 5:21 am

The URL link is not working anymore! Can you update?

January 25, 2016 at 9:20 pm

Please update the url

