This guest post is from Simon Elliston Ball, Head of Big Data at Red Gate and all-round top bloke.
Hadoop is a great place to keep a lot of data. The data lake, the data hub, the data platform: it’s all about the data. So how do you manage that data? How do you get data in? How do you get results out? How do you get at the logs buried somewhere deep in HDFS?
At Red Gate we have been working on some query tools for Hadoop for a while, and while testing we found ourselves endlessly typing hadoop fs commands. Getting data sets from our Windows desktops to the cluster, or inspecting job output files, was just taking too many steps. It should be as easy to access files on HDFS as files on a local drive. So we created HDFS Explorer, which works just like Windows Explorer but connects to the WebHDFS API, so we can browse files on our clusters.
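Under the hood, WebHDFS is a plain REST API, so the kind of browsing HDFS Explorer does can be sketched in a few lines of Python. This is just an illustration of the API, not HDFS Explorer’s code; the sandbox hostname, port, and user below are assumptions for a typical Hortonworks Sandbox setup:

```python
# Minimal sketch of talking to the WebHDFS REST API.
# NAMENODE is an assumption for a default sandbox; change it for your cluster.
import json
from urllib.request import urlopen

NAMENODE = "http://sandbox.hortonworks.com:50070"  # assumed sandbox address

def webhdfs_url(path, op, params=None):
    """Build a WebHDFS v1 URL, e.g. .../webhdfs/v1/tmp?op=LISTSTATUS."""
    pairs = [f"op={op}"] + [f"{k}={v}" for k, v in (params or {}).items()]
    return f"{NAMENODE}/webhdfs/v1{path}?" + "&".join(pairs)

def list_directory(path, user="hue"):
    """List the entries in an HDFS directory (needs a reachable cluster)."""
    url = webhdfs_url(path, "LISTSTATUS", {"user.name": user})
    with urlopen(url) as resp:
        statuses = json.load(resp)["FileStatuses"]["FileStatus"]
    return [s["pathSuffix"] for s in statuses]
```

The same URL scheme covers reads (op=OPEN) and writes (op=CREATE), which is what makes a desktop file browser over HTTP practical in the first place.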
HDFS Explorer helps if you’re shunting smaller data sets or results to and from your desktop, but we also found it works great for cleaning up after all those test queries. Every job submission leaves a trail of metadata, logs, job files, and output, which can quickly add up to a fair amount of disk space. If you’re experimenting with a sandbox implementation this can be a real issue. On a proper cluster, even if disk space is not a problem, the mess left behind can make it hard to get to the right job’s diagnostics.
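That clean-up is also just a WebHDFS call: a recursive DELETE removes a leftover job output directory. A hypothetical sketch, again assuming a default sandbox namenode and the hue user:

```python
# Sketch of clearing leftover job output via WebHDFS DELETE.
# NAMENODE and the default user are assumptions for a sandbox setup.
import json
from urllib.request import Request, urlopen

NAMENODE = "http://sandbox.hortonworks.com:50070"  # assumed sandbox address

def delete_url(path, user="hue", recursive=True):
    """Build the WebHDFS DELETE URL for a path, recursive by default."""
    return (f"{NAMENODE}/webhdfs/v1{path}?op=DELETE"
            f"&recursive={str(recursive).lower()}&user.name={user}")

def clean_up(path, user="hue"):
    """Issue the HTTP DELETE (needs a reachable cluster)."""
    with urlopen(Request(delete_url(path, user), method="DELETE")) as resp:
        return json.load(resp).get("boolean", False)
```

The equivalent from the command line is hadoop fs -rm -r on the output path; doing it with a right-click in a file browser is rather less typing.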
This video shows how to get up and running with HDFS Explorer and Hortonworks Sandbox so you can manage files and even clean up after yourself.
From its beginnings as a humble little test tool, HDFS Explorer has opened up the Hadoop file system and made it much easier for us to implement proper data file management. This week we’re also happy to announce that we’ve added Kerberos support, making it ready for use on clusters with enterprise authentication.
HDFS Explorer is available for free from the Red Gate Big Data site.