Stop right here if you are well-versed in the Hadoop development environment, tarballs, Maven, and all those shenanigans. Otherwise, keep on reading...
I will be describing a Hadoop cluster installation that uses standard Unix packages like .deb or .rpm, produced by the great Hadoop stack platform called Bigtop. If you aren't familiar with Bigtop yet, read about its history and conceptual ideas.
Let's assume you have installed the Bigtop 0.5.0 release (or a part of it). Or you might go ahead - shameless plug warning - and use a free offspring of Bigtop just introduced by WANdisco. Either way, you'll end up with the following structure:
    /etc/hadoop/conf
    /etc/init.d/hadoop*
    /usr/lib/hadoop
    /usr/lib/hadoop-hdfs
    /usr/lib/hadoop-yarn

Your mileage might vary if you install more components besides Hadoop. The normal bootstrap process will start a Namenode, a Datanode, perhaps a SecondaryNamenode, and some YARN jazz like the resource manager, node manager, etc. My example will cover only the HDFS specifics, because the YARN daemons would be copy-cats, and I leave them as an exercise for the readers.
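To make the bootstrap concrete, here's a minimal sketch of starting the daemons through their init scripts. The `start_hadoop` helper is hypothetical (not part of any package), the service names assume Bigtop's standard init scripts, and `DRY_RUN=1` just prints the commands instead of running them:

```shell
# Hypothetical helper: bootstrap the packaged daemons, assuming Bigtop's
# standard init-script names. Set DRY_RUN=1 to print commands instead of
# actually starting services.
start_hadoop() {
  for svc in hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode \
             hadoop-hdfs-datanode \
             hadoop-yarn-resourcemanager hadoop-yarn-nodemanager; do
    # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is set, so the
    # service invocation is printed rather than executed
    ${DRY_RUN:+echo} sudo service "$svc" start
  done
}
```

Trim the list to the components you actually installed; a single-node setup may not need the SecondaryNamenode at all.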
Now, the trick is to add more Datanodes. With a dev setup using tarballs and such, you would just clone and change some configuration parameters, and then run a bunch of Java processes like:
hadoop-daemon.sh --config <cloned config dir> start datanode
This won't work with a packaged installation because of the extra complexity involved. Here is what needs to be done:
1. Clone the config directory:

        cp -r /etc/hadoop/conf /etc/hadoop/conf.dn2

2. In the cloned copy of hdfs-site.xml, change or add new values for:

        dfs.datanode.data.dir
        dfs.datanode.address
        dfs.datanode.http.address
        dfs.datanode.ipc.address

    (An easy way to mod the port numbers is to add 1000*<node number>.)

3. Go to /etc/init.d and clone the Datanode init script:

        cp /etc/init.d/hadoop-hdfs-datanode /etc/init.d/hadoop-hdfs-datanode.dn2

4. In the cloned init script, add

        export HADOOP_PID_DIR="/var/run/hadoop-hdfs.dn2"

    and modify

        CONF_DIR="/etc/hadoop/conf.dn2"
        PIDFILE="/var/run/hadoop-hdfs.dn2/hadoop-hdfs-datanode.pid"
        LOCKFILE="$LOCKDIR/hadoop-datanode.dn2"

5. Set hdfs:hdfs to be the owner of the new directories from the steps above (the data directories and /var/run/hadoop-hdfs.dn2).

6. Run

        /etc/init.d/hadoop-hdfs-datanode.dn2 start

    to fire up the second Datanode.

7. Repeat steps 1 through 6 if you need more nodes running.

8. If you need to do this on a regular basis - spare yourself the carpal tunnel and learn Puppet.
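Before you reach for Puppet, steps 1 through 4 can be sketched as a small script. Everything here is an assumption-laden illustration: `clone_datanode` is a hypothetical helper, the port numbers 50010/50075/50020 are the stock Hadoop 2 defaults for `dfs.datanode.address`, `dfs.datanode.http.address`, and `dfs.datanode.ipc.address`, and `ROOT` (empty by default) lets you rehearse against a scratch tree instead of the live `/etc`:

```shell
# Hypothetical helper: clone config + init script for datanode number $1.
# Assumes the Bigtop layout described above. ROOT (default empty) prefixes
# all paths, so you can dry-run against a scratch directory.
clone_datanode() {
  n=$1
  conf_dst="$ROOT/etc/hadoop/conf.dn$n"
  init_src="$ROOT/etc/init.d/hadoop-hdfs-datanode"
  init_dst="$ROOT/etc/init.d/hadoop-hdfs-datanode.dn$n"

  # Step 1: clone the config directory
  cp -r "$ROOT/etc/hadoop/conf" "$conf_dst"

  # Step 2: bump the datanode ports by 1000*n. 50010/50075/50020 are the
  # stock defaults for address/http.address/ipc.address; a blind textual
  # replace is crude but fine for a sketch. You still have to point
  # dfs.datanode.data.dir at a fresh directory yourself.
  for port in 50010 50075 50020; do
    sed -i "s/$port/$((port + 1000 * n))/g" "$conf_dst/hdfs-site.xml"
  done

  # Steps 3-4: clone the init script, pointing CONF_DIR, PIDFILE and
  # LOCKFILE at the .dn$n locations
  sed -e "s|/etc/hadoop/conf|/etc/hadoop/conf.dn$n|g" \
      -e "s|/var/run/hadoop-hdfs|/var/run/hadoop-hdfs.dn$n|g" \
      "$init_src" > "$init_dst"
  chmod +x "$init_dst"
}
```

After running it, finish steps 5 and 6 by hand: chown the new directories to hdfs:hdfs and start the cloned init script.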