Configure Java Environment
With the Raspbian Jessie image, Java comes pre-installed. Verify by typing:
java -version
java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) Client VM (build 25.0-b70, mixed mode)
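If the java command is missing (for example on a stripped-down image), you should be able to install Oracle Java 8 from the Raspbian repositories. The package name below is the one Raspbian Jessie normally provides; adjust it if your repository differs:
sudo apt-get update
sudo apt-get install oracle-java8-jdk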
Prepare Hadoop User Account and Group
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
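To double-check that the account was created correctly, list hduser's groups; both hadoop and sudo should appear (the numeric ids will differ on your system):
id hduser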
Configure SSH
Create an SSH RSA key pair with an empty passphrase so that the Hadoop nodes can talk to each other without prompting for a password.
su hduser
mkdir ~/.ssh
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Verify that hduser can log in via SSH:
su hduser
ssh localhost
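If SSH still prompts for a password, it is usually because OpenSSH refuses keys stored with overly permissive file modes; tightening the permissions normally fixes it:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys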
Go back to the previous shell (pi/root).
Install Hadoop
Download and install
cd ~/
wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
sudo mkdir -p /opt
sudo tar -xvzf hadoop-2.6.4.tar.gz -C /opt/
cd /opt
sudo mv hadoop-2.6.4 hadoop
sudo chown -R hduser:hadoop hadoop
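You can verify that the extraction and ownership change worked, for example:
ls -ld /opt/hadoop
The listing should show hduser and hadoop as owner and group of the directory.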
Configure Environment Variables
This configuration assumes that you are using the pre-installed version of Java in Raspbian Jessie.
Add hadoop to environment variables by adding the following lines to the end of /etc/bash.bashrc:
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export HADOOP_INSTALL=/opt/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
Alternatively, you can add the configuration above to ~/.bashrc in hduser's home directory.
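If you also want the Hadoop start/stop scripts available without typing their full paths, you can optionally append the sbin folder to PATH as well. The rest of this guide uses the full /opt/hadoop/sbin paths, so this is purely a convenience:
export PATH=$PATH:$HADOOP_INSTALL/sbin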
Exit and reopen the hduser shell to verify that the hadoop executable is accessible outside the /opt/hadoop/bin folder:
exit
su hduser
hadoop version
hduser@node1 /home/hduser $ hadoop version
Hadoop 2.6.4
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r <release revision>
Compiled by jenkins on <build date>
From source with checksum <build checksum>
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-2.6.4.jar
Configure Hadoop environment variables
As root/sudo edit /opt/hadoop/etc/hadoop/hadoop-env.sh (in Hadoop 2.x the configuration files live under etc/hadoop), then uncomment and change the following lines:
# The java implementation to use. Required.
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=250
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS -client"
Also edit yarn-env.sh in the same directory. Uncomment
#export YARN_NODEMANAGER_OPTS
and change it to:
export YARN_NODEMANAGER_OPTS="-client"
Note 1: If you forget to add the -client option to HADOOP_DATANODE_OPTS and/or YARN_NODEMANAGER_OPTS, you will get the following error message in hadoop-hduser-datanode-node1.out:
Error occurred during initialization of VM
Server VM is only supported on ARMv7+ VFP
Note 2: If you run SSH on a port other than 22, you need to change the following parameter:
# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
export HADOOP_SSH_OPTS="-p <YOUR_PORT>"
Or you will get the error:
connect to host localhost port 22: Address family not supported by protocol
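To double-check the non-standard port setup, you can try connecting manually with the same port (replace <YOUR_PORT> with your actual port, as above):
ssh -p <YOUR_PORT> localhost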
Configure Hadoop
In /opt/hadoop/etc/hadoop edit the following configuration files:
core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hdfs/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
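Note that the 2.6.4 tarball ships only a template for the MapReduce configuration; if mapred-site.xml does not exist yet on your system, create it from the template first:
cd /opt/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml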
mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
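fs.default.name and mapred.job.tracker are the old Hadoop 1.x property names; they still work in 2.6.4 but are reported as deprecated. If you prefer the 2.x names, the core-site.xml equivalent is fs.defaultFS, and to have MapReduce jobs run on YARN you would instead put the following in mapred-site.xml (optional, shown here as an alternative):
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>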
Create HDFS file system
sudo mkdir -p /hdfs/tmp
sudo chown hduser:hadoop /hdfs/tmp
sudo chmod 750 /hdfs/tmp
hdfs namenode -format
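The format command prints a fair amount of log output; near the end you should see a line similar to the following (the exact prefix may vary between versions), confirming that the name directory under /hdfs/tmp was created:
INFO common.Storage: Storage directory /hdfs/tmp/dfs/name has been successfully formatted.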
Start services
Log in as hduser and run:
/opt/hadoop/sbin/start-dfs.sh
/opt/hadoop/sbin/start-yarn.sh
Run the jps command to check that all services started as expected:
jps
16640 ResourceManager
16832 Jps
16307 NameNode
16550 SecondaryNameNode
16761 NodeManager
16426 DataNode
If you cannot see all of the processes above, review the log files in /opt/hadoop/logs to find the source of the problem.
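As a quick smoke test you can create a directory in HDFS and list the root of the file system, for example:
hdfs dfs -mkdir /test
hdfs dfs -ls /
You can also browse the NameNode web interface at http://localhost:50070 and the ResourceManager (YARN) web interface at http://localhost:8088 to check the status of the node.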