Using Hive to interact with HBase, Part 1

This is the first of two posts examining the use of Hive for interaction with HBase tables. The second post is here.

One of the things I’m frequently asked about is how to use HBase from Apache Hive. Not just how to do it, but what works, how well it works, and how to make good use of it. I’ve done a bit of research in this area, so hopefully this will be useful to someone besides myself. This is a topic we did not get to cover in HBase in Action; perhaps these notes will become the basis for the 2nd edition ;) These notes are applicable to Hive 0.11.x used in conjunction with HBase 0.94.x. They should be largely applicable to 0.12.x + 0.96.x, though I haven’t tested everything yet.

The Hive project includes an optional library for interacting with HBase. This is where the bridge layer between the two systems is implemented. The primary interface you use when accessing HBase from Hive queries is called the HBaseStorageHandler. You can also interact with HBase tables directly via input and output formats, but the handler is simpler and works for most uses.

HBase tables from Hive

Use the HBaseStorageHandler to register HBase tables with the Hive metastore. You can optionally specify the HBase table as EXTERNAL, in which case Hive will not create or drop that table directly – you’ll have to use the HBase shell to do so.

CREATE [EXTERNAL] TABLE foo(...)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
TBLPROPERTIES ('hbase.table.name' = 'bar');

The above statement registers the HBase table named bar in the Hive metastore, accessible from Hive by the name foo.
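
If you declare the table EXTERNAL, the HBase table must already exist before you register it. A minimal sketch of creating it from the HBase shell (assuming a single column family named f, matching the examples below):

$ hbase shell
hbase> create 'bar', 'f'

Without EXTERNAL, Hive creates the underlying HBase table for you and deletes it again when you DROP TABLE.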

Under the hood, HBaseStorageHandler is delegating interaction with the HBase table to
HiveHBaseTableInputFormat and HiveHBaseTableOutputFormat. You can register your HBase table in Hive using those classes directly if you desire. The above statement is roughly equivalent to:

CREATE TABLE foo(...)
STORED AS
   INPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat'
TBLPROPERTIES ('hbase.table.name' = 'bar');

Also provided is the HiveHFileOutputFormat which means it should be possible to generate HFiles for bulkloading from Hive as well. In practice, I haven’t gotten this to work end-to-end (see HIVE-4627).

Schema mapping

Registering the table is only the first step. As part of that registration, you also need to specify a column mapping. This is how you link Hive column names to the HBase table’s rowkey and columns. Do so using the hbase.columns.mapping SerDe property.

CREATE TABLE foo(rowkey STRING, a STRING, b STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1,f:c2')
TBLPROPERTIES ('hbase.table.name' = 'bar');

The values provided in the mapping property correspond one-for-one with column names of the Hive table. HBase column names are fully qualified by column family, and you use the special token :key to represent the rowkey. The above example makes rows from the HBase table bar available via the Hive table foo. The foo column rowkey maps to the HBase table’s rowkey, a to c1 in the f column family, and b to c2, also in the f family.
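
To make the mapping concrete, here is a sketch of a round trip, assuming the table definitions above: write a couple of cells from the HBase shell, then read them back through Hive.

hbase> put 'bar', 'row1', 'f:c1', 'hello'
hbase> put 'bar', 'row1', 'f:c2', 'world'

hive> SELECT rowkey, a, b FROM foo;
row1    hello    world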

You can also associate Hive’s MAP data structures with HBase column families. In this case, the map values use the STRING Hive type; the only other Hive type currently supported is BINARY. See the wiki page for more examples.
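
For instance, mapping an entire column family onto a Hive MAP looks like the following sketch (the table name foo_map is illustrative). A family name followed by a bare colon in the mapping pulls in every qualifier of that family, with qualifiers becoming map keys:

CREATE TABLE foo_map(rowkey STRING, f MAP<STRING, STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:')
TBLPROPERTIES ('hbase.table.name' = 'bar');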

Interacting with data

With the column mappings defined, you can now access HBase data just like you would any other Hive data. Only simple query predicates are currently supported.

SELECT * FROM foo WHERE ...;

You can also populate an HBase table using Hive. This works with both INTO and OVERWRITE clauses.

FROM source_hive_table INSERT INTO TABLE my_hbase_table
SELECT source_hive_table.* WHERE ...;
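
For instance, populating the foo table registered earlier from a hypothetical native Hive table might look like the following sketch. Keep in mind that HBase upserts by rowkey, so writing a row whose key already exists overwrites the prior cell values rather than creating a duplicate row.

FROM source_hive_table INSERT INTO TABLE foo
SELECT source_hive_table.id, source_hive_table.c1, source_hive_table.c2
WHERE source_hive_table.id IS NOT NULL;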

Be advised that there is a regression in Hive 0.12.0 which breaks this feature, see HIVE-5515.

In practice

There’s still a little finesse required to get everything wired up properly at runtime. The HBase interaction module is completely optional, so you have to make sure it and its HBase dependencies are available on Hive’s classpath.

$ export HADOOP_CLASSPATH=...
$ hive -e "CREATE TABLE ... STORED BY 'org.apache...HBaseStorageHandler'"

The installation environment could do a better job of handling this for users, but for the time being you must manage it yourself. Ideally the hive bin script would detect the presence of HBase and make the necessary CLASSPATH adjustments automatically; that enhancement appears to be tracked in HIVE-2055. The last mile is provided by the distribution itself, which ensures the environment variables are set for hive. That functionality is provided by BIGTOP-955.

You also need to make sure the necessary jars are shipped out to the MapReduce jobs when you execute your Hive statements. Hive provides a mechanism for shipping additional job dependencies via the auxjars feature.

$ export HIVE_AUX_JARS_PATH=...
$ hive -e "SELECT * FROM ..."
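
A concrete invocation might look like the following sketch; the jar paths are illustrative and version-specific, so adjust them for your installation. HIVE_AUX_JARS_PATH takes a comma-separated list of jar files, which Hive ships to the cluster with each job.

$ export HIVE_AUX_JARS_PATH=/usr/lib/hive/lib/hive-hbase-handler-0.11.0.jar,/usr/lib/hbase/hbase-0.94.6.jar,/usr/lib/zookeeper/zookeeper-3.4.5.jar
$ hive -e "SELECT * FROM foo LIMIT 10"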

I did discover a small bug in HDP-1.3 builds which masks user-specified values of HIVE_AUX_JARS_PATH. With administrative rights, this is easily fixed by correcting the line in hive-env.sh to respect an existing value. The
work-around in user scripts is to use the SET statement to provide a value once you’ve launched the Hive CLI.

SET hive.aux.jars.path = ...

Hive should be able to detect which jars are necessary and add them itself. HBase provides the TableMapReduceUtil#addDependencyJars method for this purpose. It appears that this is done in Hive 0.12.0, at least according to HIVE-2379.

Future work

Much has been said about proper support for predicate pushdown (HIVE-1643, HIVE-2854, HIVE-3617,
HIVE-3684) and data type awareness (HIVE-1245, HIVE-2599). These go hand-in-hand, as predicate semantics are defined in terms of the types upon which they operate. More could be done to map Hive’s complex data types like Maps and Structs onto HBase column families as well (HIVE-3211). Support for HBase timestamps is a bit of a mess; they’re not made available to Hive applications with any level of granularity (HIVE-2828, HIVE-2306). The only interaction a user has is via a storage handler setting for writing a custom timestamp with all operations.

From a performance perspective, there are things Hive can do today (i.e., not dependent on data types) to take advantage of HBase. There’s also the possibility of an HBase-aware Hive making use of HBase tables as intermediate storage locations (HIVE-3565), facilitating map-side joins against dimension tables loaded into HBase. Hive could make use of HBase’s natural indexed structure (HIVE-3634, HIVE-3727), potentially saving huge scans. Currently, the user doesn’t have (any?) control over the scans which are executed. Configuration on a per-job, or at least per-table, basis should be enabled (HIVE-1233). That would enable an HBase-savvy user to provide Hive with hints regarding how it should interact with HBase. Support for simple split sampling of HBase tables (HIVE-3399) could also be easily done because HBase manages table partitions already.

Other access channels

Everything discussed thus far has required Hive to interact with online HBase RegionServers. Applications may stand to gain significant throughput and enjoy greater flexibility by interacting directly with HBase data persisted to HDFS. This also has the benefit of preventing Hive workloads from interfering with online SLA-bound HBase applications (at least, until we see HBase improvements in QoS isolation between tasks, HBASE-4441).

As mentioned earlier, there is the HiveHFileOutputFormat. Resolving HIVE-4627 should make Hive a straightforward way to generate HFiles for bulk loading. Once you’ve created the HFiles using Hive, there’s still the last step of running the
LoadIncrementalHFiles utility to copy and register them in the regions. For this, the HiveStorageHandler interface will need some kind of hook to influence the query plan as it’s created, allowing it to append steps. Once in place, it should be possible to SET a runtime flag, switching an INSERT operation to use bulkload.

HBase recently introduced the table snapshot feature. This allows a user to create a persisted point-in-time view of a table, persisted to HDFS. HBase is able to restore a table from a snapshot to a previous state, and to create an entirely new table from an existing snapshot. Hive does not currently support reading from an HBase snapshot. For that matter, HBase doesn’t yet support MapReduce jobs over snapshots, though the feature is a work in progress (HBASE-8369).

Conclusions

The interface between HBase and Hive is young, but has nice potential. There’s a lot of low-hanging fruit that can be picked up to make things easier and faster. The most glaring issue barring real application development is the impedance mismatch between Hive’s typed, dense schema and HBase’s untyped, sparse schema. This is as much a cognitive problem as a technical issue. Solutions here would allow a number of improvements to fall out, including much in the way of performance improvements. I’m hopeful that continuing work to add data types to HBase (HBASE-8089) can help bridge this gap.

Basic operations mostly work, at least in a rudimentary way. You can read data out of and write data back into HBase using Hive. Configuring the environment is an opaque and manual process, one which likely stymies novices from adopting the tools. There’s also the question of bulk operations – support for writing HFiles and reading HBase snapshots using Hive is entirely lacking at this point. And of course, there are bugs sprinkled throughout. The biggest recent improvement is the deprecation of HCatalog’s interface, removing the need to decide upfront which interface to use.

Hive provides a very usable SQL interface on top of HBase, one which integrates easily into many existing ETL workflows. That interface requires simplifying some of the BigTable semantics HBase provides, but the result will be to open up HBase to a much broader audience of users. The Hive interop complements extremely well the experience provided by Phoenix. Hive has the benefit of not requiring the deployment complexities currently required by that system. Hopefully the common definition of types will allow a complementary future.


Comments

lee
|
March 29, 2014 at 3:46 am
|

help me~~~~~~~~~~~~~~~`

hive> select url from short_urls;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there’s no reduce operator
Starting Job = job_1396086453028_0003, Tracking URL = http://master:8088/proxy/application_1396086453028_0003/
Kill Command = /home/hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1396086453028_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-03-29 19:40:29,702 Stage-1 map = 0%, reduce = 0%
2014-03-29 19:41:06,617 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1396086453028_0003 with errors
Error during job, obtaining debugging information…
Job Tracking URL: http://master:8088/proxy/application_1396086453028_0003/
Examining task ID: task_1396086453028_0003_m_000000 (and more) from job job_1396086453028_0003

Task with the most failures(4):
—–
Task ID:
task_1396086453028_0003_m_000000

URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1396086453028_0003&tipid=task_1396086453028_0003_m_000000
—–
Diagnostic Messages for this Task:
Error: java.lang.NullPointerException
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.close(TableRecordReaderImpl.java:157)
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.close(TableRecordReader.java:80)
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.close(HiveHBaseTableInputFormat.java:198)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doClose(HiveRecordReader.java:50)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:96)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:208)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1957)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

    |
    April 8, 2014 at 10:31 am
    |

    Hi Lee,

    Your stacktrace looks a lot like HIVE-6117. The fix will be included in Apache Hive 0.13 and as a part of HDP-2.1.

    Thanks,
    Nick

Pankaj Kumar
|
February 6, 2014 at 6:07 am
|

Hi,

I have been trying to setup hive to query over HBase tables. I am using hadoop-2.2.0,hbase-0.96.0-hadoop2,hive-0.12.0-bin. And am able to create table using the following query:
CREATE TABLE User1( rowkey String,UserID BIGINT) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,D:UserID') TBLPROPERTIES ('hbase.table.name' = 'User')
But when I try to query it using:
Select count(*) from User1;
I get the following exception:
Error: java.lang.ClassCastException: org.apache.hadoop.hbase.client.Result cannot be cast to org.apache.hadoop.io.Writable
at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:58)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:33)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createValue(MapTask.java:178)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Here is what my hive-site.xml looks like:

hive.aux.jars.path
file://antlr…[all hive/lib jars],[all hbase/lib jars]

hbase.zookeeper.quorum
localhost

and I have added hbase/lib/hbase*.jar,hbase/lib/zookeeper*.jar into hive/lib.
I have also tried the same with different versions of hive/hadoop/hbase with same exception coming (hive 11, hbase 0.96.1, hadoop 1.2.1).
Any help in this regard will be much appreciated.

Regards
Pankaj

    |
    February 11, 2014 at 4:49 pm
    |

    It sounds like your Hive build hasn’t been updated to use HBase 0.96. Can you try hive trunk, or a distribution that packages both Hive and HBase (such as HDP ;) )?

Vikas Parashar
|
December 27, 2013 at 2:04 am
|

hi,
I am using hbase 0.96 and hive 0.12. But at the time of map reduce job; i am getting below error.

Caused by: java.lang.NoSuchMethodError: org.cloudera.htrace.Trace.startSpan(Ljava/lang/String;)Lorg/cloudera/htrace/TraceScope;
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:196)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:479)
at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:83)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:794)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.(HConnectionManager.java:627)
… 20 more

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143

    dimmacro
    |
    December 30, 2013 at 8:08 pm
    |

    Hi Vikas,
    I hope your problem has been figured out. If not, maybe I can share some experience, since I also faced your problem when interacting Hive 0.12 with HBase 0.96 on Hadoop 2.2.0, and also hit the missing TraceScope error.
    First, you should confirm which Hadoop version your Hive 0.12 and HBase 0.96 are built against. Are the Hadoop versions exactly the same? You should also make sure you have configured the HADOOP_HOME environment variable. PS: my Hive 0.12 and HBase 0.96 are both based on Hadoop 2.2.
    Second, you should replace all the hbase*.jar files under hive-0.12/lib with the HBase jars under hbase-0.96/lib. Pay special attention to protobuf*.jar and zookeeper*.jar, and make sure the jar versions are exactly the same.
    Third, if your Hadoop version is higher than 2.0, you should download the Hive source and rebuild it with the command: mvn clean package -DskipTests -Phadoop-2, then copy all the newest *SNAPSHOT.jar files to hive-0.12/lib.
    If you have any problems, you can leave a message for me.

      Vikas Parashar
      |
      December 30, 2013 at 9:59 pm
      |

      Thanks everyone!

      Yup!! I got success after 2 nights. As I mentioned, the problem is related to map-reduce. So, after long debugging, I found there was something missing in the map-reduce library. I have copied the below jars into map-reduce.

      guava-12.0.1.jar
      hbase-protocol-0.96.0.2.0.6.0-76-hadoop2.jar
      htrace-core-2.01.jar
      hbase-client-0.96.0.2.0.6.0-76-hadoop2.jar
      hbase-server-0.96.0.2.0.6.0-76-hadoop2.jar
      zookeeper-3.4.5.2.0.6.0-76.jar
      hbase-common-0.96.0.2.0.6.0-76-hadoop2.jar
      hive-hbase-handler-0.12.0.2.0.6.0-76.jar

      It’s a patch like thing but work for me.

        |
        January 2, 2014 at 10:31 am
        |

        You may also find it useful to know that as of 0.96.1 (HBASE-8438), you can use the command `hbase mapredcp` to get this list of jars automatically. I’ve been using it with Hive as follows:


        $ HADOOP_CLASSPATH=/etc/hbase/conf:$(hbase mapredcp) hive ...

    |
    December 30, 2013 at 11:46 am
    |

    Hi Vikas,

    It looks like you’re having some version mismatch issues. Please bear in mind that these instructions are for using HDP-1.3, not HDP-2.0. Would you mind taking your question over to the HBase section of our Community Forums?

    Thanks,
    Nick

Vikas Parashar
|
December 27, 2013 at 1:52 am
|

Hey, I am using hbase 0.96 and hive 0.12. But when i run map reduce job. I got below error

Error: java.io.IOException: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:244)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:538)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:167)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:383)
at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:360)
at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:244)
at org.apache.hadoop.hbase.client.HTable.(HTable.java:187)
at org.apache.hadoop.hbase.client.HTable.(HTable.java:164)
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getRecordReader(HiveHBaseTableInputFormat.java:91)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:241)
… 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:381)
… 15 more
Caused by: java.lang.NoSuchMethodError: org.cloudera.htrace.Trace.startSpan(Ljava/lang/String;)Lorg/cloudera/htrace/TraceScope;
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:196)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:479)
at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:83)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:794)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.(HConnectionManager.java:627)
… 20 more

dimmacro
|
December 26, 2013 at 3:43 am
|

Hi Nick, I have used Hive 0.12 to interact with HBase 0.96, but when I create a table in the hive command line using "create table test3(time string,amount string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;" it causes errors, the details of which are:
create table test3(time string,amount string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.thrift.TApplicationException: Internal error processing create_table_with_environment_context.

And I have also read the issue you reported, so I just want to know exactly which Hive version and HBase version you use to combine them and how you configure them, thanks.

    |
    December 26, 2013 at 11:10 am
    |

    The above examples were run on an HDP-1.3 deployment, so Hive 0.11.0 and HBase 0.94.6.1. The only specific configurations necessary are outlined in the above post — I point out where you need to set environment variables and so on before running a command. I have not returned to examine an updated post for Hive 0.12.x + HBase 0.96.x.

Thomas
|
November 13, 2013 at 2:50 am
|

Hi,

Thanks for this post, it’s a nice coincidence because we’re just struggling to integrate Hive and HBase.

First, it took us some time to figure out how to pass the HBase JARs to the Hive client, through HADOOP_CLASSPATH. Basically, append /usr/lib/hbase/lib/hbase-*.jar there (we’re on HDP 2 and HBase 0.95.2.2.0.5.0-67 and Hive 0.11.0.2.0.5.0-67).

But table creation is still failing.
Hive connects to ZK ok, freezes for a minute or so then fails with “java.io.IOException: Failed to find location, tableName=hbase:meta, …” on the client. Apparently, it’s happening when Hive is reading table metadata from HBase. We have set hbase.zookeeper.quorum and hbase.master on the command line through –hiveconf, to no avail…

Any idea what might be causing that?

    Nick Dimiduk
    |
    November 13, 2013 at 4:53 pm
    |

    Hi Thomas,

    I’m glad you find the post useful. Would you mind bringing your question over to our community forums? I can ask you a couple more questions and try to get things worked out.

    You may also find my next post to be of use. It’ll include an end-to-end example.

    Thanks,
    Nick

      Thomas Vial
      |
      November 14, 2013 at 8:39 am
      |

      Thanks Nick! You will find the question in the “Hive+HBase table creation freezing and failing” topic under the Hive / HCatalog forum.
      FYI, I’ve just tried with Hive version 12 (from HW), without success.

      Regards,
      Thomas
