I’ll answer your questions as best I can:
1) Though slave nodes can be configured to use network storage, it is better to use local storage. With network storage there is a performance hit during processing, because the data has to be transferred to the node that processes it. With local storage (which is actually a piece of HDFS), data transfer is minimized, because Hadoop tries to process the data on the node where it is stored.
2) The Secondary NameNode daemon should run on a separate machine from both of the HA NameNode boxes.
3) Though it is possible to install Ambari on an HA cluster, doing so is not currently supported.
4) If you need real-time updates to the data, or need a columnar database, then you should install HBase.
5) The server daemons for each of the services you list should be installed on one of the slave nodes, with the exception of ZooKeeper, which should be installed on several (an odd number) of the slave nodes. The clients for these services should also be installed on one of the slave nodes. The basic picture is that the master nodes should run only the NameNode and JobTracker daemons; everything else should ideally be on the slaves, taking care that you don’t run too many services on a single node and overload it.
6) The metastore for Hive is ideally installed locally on the Hive server. How you estimate future capacity depends on which database you decide to use for it.
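
On the odd number of ZooKeeper nodes mentioned in point 5: ZooKeeper needs a strict majority of its servers to be up in order to keep working, so an even-sized ensemble tolerates no more failures than the odd size just below it. Here is a rough Python sketch of that arithmetic. The function names are my own, for illustration only, and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare odd and even ensemble sizes side by side.
    for n in range(1, 8):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that 3 and 4 servers both tolerate one failure, and 5 and 6 both tolerate two, so the extra even-numbered server adds load without adding fault tolerance.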