Configure Java Environment
With the Raspbian Jessie image, Java comes pre-installed. Verify by typing:
java -version

java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) Client VM (build 25.0-b70, mixed mode)
Prepare Hadoop User Account and Group
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
Configure SSH
Create an SSH RSA key pair with a blank password so that the Hadoop nodes can talk to each other without prompting for a password.
su hduser
mkdir ~/.ssh
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
Verify that hduser can log in via SSH:
su hduser
ssh localhost
Go back to the previous shell (pi/root).
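For example, if the su and ssh commands above left you in nested sessions, backing out twice returns you to the original shell (just a sketch; the number of exits depends on how many shells you opened):

exit   # leave the "ssh localhost" session
exit   # leave the hduser shell, back to the pi/root shell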
Install Hadoop
Download and install
cd ~/
wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
sudo mkdir /opt
sudo tar -xvzf hadoop-2.6.4.tar.gz -C /opt/
cd /opt
sudo mv hadoop-2.6.4 hadoop
sudo chown -R hduser:hadoop hadoop
Configure Environment Variables
This configuration assumes that you are using the pre-installed version of Java in Raspbian Jessie.
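If you want to see which directory the JAVA_HOME expression used below will resolve to on your system (the exact path depends on which Java package is installed), you can run the same pipeline on its own:

readlink -f /usr/bin/java | sed "s:bin/java::"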
Add hadoop to environment variables by adding the following lines to the end of /etc/bash.bashrc:
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export HADOOP_INSTALL=/opt/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
Alternatively, you can add the configuration above to ~/.bashrc in the home directory of hduser.
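As a minimal sketch (assuming you are logged in as hduser), the same three lines can be appended to ~/.bashrc like this:

cat >> ~/.bashrc << 'EOF'
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export HADOOP_INSTALL=/opt/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
EOF

Quoting EOF keeps the $(...) expression from being expanded while writing the file, so it is evaluated each time the shell starts instead.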
Exit and reopen the hduser shell to verify that the hadoop executable is accessible outside the /opt/hadoop/bin folder:
exit
su hduser
hadoop version

hduser@node1 /home/hduser $ hadoop version
Hadoop 2.6.4

The first line of the output should report Hadoop 2.6.4; the remaining lines show the build details (source revision, compile date, checksum) and the jar the command was run from.
Configure Hadoop environment variables
As root/sudo edit /opt/hadoop/etc/hadoop/hadoop-env.sh, uncomment and change the following lines:
# The java implementation to use. Required.
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=250

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS -client"
Also, we need to edit yarn-env.sh. Uncomment
#export YARN_NODEMANAGER_OPTS
and write:
export YARN_NODEMANAGER_OPTS="-client"
Note 1: If you forget to add the -client option to HADOOP_DATANODE_OPTS and/or YARN_NODEMANAGER_OPTS you will get the following error message in hadoop-hduser-datanode-node1.out:
Error occurred during initialization of VM
Server VM is only supported on ARMv7+ VFP
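If you want to confirm up front that the JVM on your Pi can run in client mode, a quick sanity check (output wording varies between Java versions) is:

java -client -version

If the output mentions "Client VM", as in the java -version output earlier in this guide, the -client flag is being honored.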
Note 2: If you run SSH on a different port than 22 then you need to change the following parameter:
# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
export HADOOP_SSH_OPTS="-p <YOUR_PORT>"
Or you will get the error:
connect to host localhost port 22: Address family not supported by protocol
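To confirm that the port you put in HADOOP_SSH_OPTS matches your sshd configuration, you can try an interactive connection on the same port (replace <YOUR_PORT> as before):

ssh -p <YOUR_PORT> localhost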
Configure Hadoop
In /opt/hadoop/etc/hadoop edit the following configuration files:
core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hdfs/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Create HDFS file system
sudo mkdir -p /hdfs/tmp
sudo chown hduser:hadoop /hdfs/tmp
sudo chmod 750 /hdfs/tmp
hadoop namenode -format
Start services
Log in as hduser. Run:
/opt/hadoop/sbin/start-dfs.sh
/opt/hadoop/sbin/start-yarn.sh
Run the jps command to check that all services started as expected:
jps

16640 ResourceManager
16832 Jps
16307 NameNode
16550 SecondaryNameNode
16761 NodeManager
16426 DataNode
If you cannot see all of the processes above, review the log files in /opt/hadoop/logs to find the source of the problem.
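For example (a minimal sketch; the exact file names depend on your hostname, node1 is used here as in the prompt shown earlier), list the log directory and tail the log of the daemon that is missing:

ls /opt/hadoop/logs
tail -n 50 /opt/hadoop/logs/hadoop-hduser-datanode-node1.log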