The Windows Explorer experience for HDFS

How we use HDFS Explorer to manage files, and clean up after ourselves.

This is a guest post from Simon Elliston Ball, Head of Big Data at Red Gate and all-round top bloke.

Hadoop is a great place to keep a lot of data. The data lake, the data hub, the data platform: it's all about the data. So how do you manage that data? How do you get data in? How do you get results out? How do you get at the logs buried somewhere deep in HDFS?

At Red Gate we have been working on query tools for Hadoop for a while, and while testing we found ourselves endlessly typing `hadoop fs`. Getting data sets from our Windows desktops to the cluster, or inspecting job output files, was just taking too many steps. It should be as easy to access files on HDFS as files on a local drive. So we created HDFS Explorer, which works just like Windows Explorer but connects to the WebHDFS API, so we can browse files on our clusters.
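Under the hood, WebHDFS requests are just HTTP calls against the NameNode, which is what makes a desktop file browser like this possible. A minimal sketch of how those request URLs are shaped (the hostname, port, and `user.name` value here are assumptions for illustration; 50070 was the usual NameNode HTTP port at the time):

```shell
# Assumed NameNode web address -- substitute your own cluster's host and port.
NAMENODE="sandbox.hortonworks.com:50070"

# Build a WebHDFS request URL: $1 = HDFS path, $2 = WebHDFS operation.
webhdfs_url() {
  echo "http://${NAMENODE}/webhdfs/v1${1}?op=${2}&user.name=hdfs"
}

# List a directory -- what a file browser does for each folder you open:
webhdfs_url /user/hdfs LISTSTATUS

# Read a file's contents:
webhdfs_url /user/hdfs/results/part-r-00000 OPEN
```

In practice you would pass these URLs to `curl` (with `-L`, since OPEN redirects to a DataNode), but the point is that plain HTTP is all a client needs.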

HDFS Explorer helps if you're shunting smaller data sets or results to and from your desktop, but we also found it works great for cleaning up after all those test queries. Every job submission leaves a trail of metadata, logs, job files, and output, which can quickly add up to a decent amount of disk space. If you're experimenting with a sandbox installation this can be a real issue. On a proper cluster, even if disk space is not a problem, the mess left behind can make it hard to get to the right job's diagnostics.
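The same tidy-up can of course be done from the command line. A hedged sketch of the pattern, using real `hadoop fs` options but made-up paths (`/user/hdfs/output` and the job directory name are illustrative assumptions; the `DRYRUN=echo` default just prints each command so you can preview before deleting anything):

```shell
# Set DRYRUN to empty ("DRYRUN= ./cleanup.sh") to actually run the commands;
# by default each hadoop fs invocation is only echoed.
DRYRUN=${DRYRUN:-echo}

# See how much space accumulated job output is using (summary, human-readable):
$DRYRUN hadoop fs -du -s -h /user/hdfs/output

# Remove a finished test job's output directory;
# -skipTrash frees the space immediately instead of moving it to .Trash:
$DRYRUN hadoop fs -rm -r -skipTrash /user/hdfs/output/test-run-42
```

Doing this by hand is exactly the kind of repetitive chore a graphical browser takes the friction out of.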

This video shows how to get up and running with HDFS Explorer and Hortonworks Sandbox so you can manage files and even clean up after yourself.

From its beginnings as a humble little test tool, HDFS Explorer has opened up the Hadoop file system and made it much easier for us to implement proper data file management. This week we're also happy to announce that we've added Kerberos support, making it ready for use on clusters in an enterprise authentication environment.

HDFS Explorer is available for free from the Red Gate Big Data site.

Download the Hortonworks Sandbox and get started with these great tutorials.
