I installed Hive on HDP 1.3 on a single node, using the default Derby database for the metastore. The underlying Hadoop installation has 8 data nodes. Currently the Hive client is installed on only one machine. I have a few questions:
1. When I run a command in Hive, how do I know whether it is being executed on all 8 nodes?
2. Do I have to install the Hive client on every node for the data to be distributed across all of them? I ask because the node where Hive is installed has only about 600 GB of disk space. If the data were distributed across all 8 nodes, I could load more directories.
3. Sometimes while loading a Hive table I get errors such as "could only be replicated to 0 nodes, instead of 1", followed by a Java stack trace. This happens on random files. It suggests to me that the Hive data is being stored on only 1 node. Could that node be running out of disk space?