I am planning to create a Hadoop pre-production cluster of 12 nodes (10 slaves and 2 masters), and I have a number of open questions:
1) Should the slave nodes use local storage, or can they use network storage too? What performance impact should I expect?
2) On which server (from a hardware point of view) should the secondary NameNode be deployed? I am going to configure the NameNodes as an HA cluster.
3) Is it possible to install a NameNode HA cluster using Ambari?
4) How can I decide whether HBase needs to be installed? I need to make this decision in order to define the hardware requirements.
5) On which node (master or slave) should the following be installed: Hive, ZooKeeper, Nagios, Ganglia, Pig, Oozie, Sqoop? Or should they go on an additional machine outside the Hadoop cluster? If so, what are the hardware requirements?
6) Should the Hive metastore be local or remote? How can I estimate the future capacity of the Hive metastore?
Thank you in advance.