Installing Hadoop 2.6.x on a Raspberry Pi B (Raspbian Jessie)

Configure Java Environment

With the Raspbian Jessie image, Java comes pre-installed. Verify by typing:

java -version

java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) Client VM (build 25.0-b70, mixed mode)
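
If Java is missing from your image, install a JDK first. A minimal sketch, assuming Raspbian's repositories (oracle-java8-jdk was the stock Java 8 package on Jessie):

sudo apt-get update
sudo apt-get install oracle-java8-jdk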

Prepare Hadoop User Account and Group

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
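
To confirm that the account was created and is in the right groups, check it with id (the exact uid/gid numbers will vary; the groups list should include hadoop and sudo):

id hduser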

Configure SSH

Create an SSH RSA key pair with an empty passphrase so that Hadoop nodes can talk to each other without prompting for a password.

su hduser
mkdir ~/.ssh
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
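
If ssh still prompts for a password after this, it is usually a permissions problem: OpenSSH refuses keys that are group- or world-writable. Tightening the permissions is harmless if they are already correct:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys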

Verify that hduser can log in via SSH (accept the host key when prompted the first time):

su hduser
ssh localhost

Go back to the previous shell (pi/root).

Install Hadoop

Download and install

cd ~/
wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
sudo mkdir -p /opt
sudo tar -xvzf hadoop-2.6.4.tar.gz -C /opt/
cd /opt
sudo mv hadoop-2.6.4 hadoop
sudo chown -R hduser:hadoop hadoop
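
As a quick sanity check, verify that the ownership change took effect; the listing should show hduser and hadoop as owner and group:

ls -ld /opt/hadoop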

Configure Environment Variables

This configuration assumes that you are using the pre-installed version of Java in Raspbian Jessie.

Add Hadoop to your environment by appending the following lines to the end of /etc/bash.bashrc:

export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export HADOOP_INSTALL=/opt/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin

Alternatively, you can add the configuration above to ~/.bashrc in the home directory of hduser.
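
To apply the new variables in the current shell without logging out, source the file you edited (use ~/.bashrc instead if that is where you added the lines) and echo one of the variables to confirm:

source /etc/bash.bashrc
echo $HADOOP_INSTALL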

Exit and reopen the hduser shell to verify that the hadoop executable is accessible outside the /opt/hadoop/bin folder:

exit
su hduser
hadoop version

hduser@node1 /home/hduser $ hadoop version
Hadoop 2.6.4
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r ...
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-2.6.4.jar

Configure Hadoop environment variables

As root/sudo edit /opt/hadoop/etc/hadoop/hadoop-env.sh, then uncomment and change the following lines:

# The java implementation to use. Required.
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=250

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS -client"

Also edit yarn-env.sh in the same directory. Uncomment

#export YARN_NODEMANAGER_OPTS

and change it to:

export YARN_NODEMANAGER_OPTS="-client"

Note 1: If you forget to add the -client option to HADOOP_DATANODE_OPTS and/or YARN_NODEMANAGER_OPTS, you will get the following error message in hadoop-hduser-datanode-node1.out:

Error occurred during initialization of VM
Server VM is only supported on ARMv7+ VFP

Note 2: If you run SSH on a port other than 22, you need to change the following parameter in hadoop-env.sh:

# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
export HADOOP_SSH_OPTS="-p <YOUR_PORT>"

Otherwise you will get the error:

connect to host localhost port 22: Address family not supported by protocol

Configure Hadoop

In /opt/hadoop/etc/hadoop edit the following configuration files:

core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hdfs/tmp</value>
  </property>
  <property>
<name>fs.defaultFS</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

mapred-site.xml (this file does not exist by default; create it by copying mapred-site.xml.template in the same directory)

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
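
With all three files in place, you can ask Hadoop to echo a configuration key back, which confirms the configuration directory is being read (hdfs getconf is included in the 2.x distribution):

hdfs getconf -confKey fs.defaultFS

It should print hdfs://localhost:54310.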

Create HDFS file system

sudo mkdir -p /hdfs/tmp
sudo chown hduser:hadoop /hdfs/tmp
sudo chmod 750 /hdfs/tmp

Then format the namenode as hduser:

su hduser
hdfs namenode -format

Start services

Log in as hduser and run:

/opt/hadoop/sbin/start-dfs.sh
/opt/hadoop/sbin/start-yarn.sh

Run the jps command to check that all services started as expected:

jps

16640 ResourceManager
16832 Jps
16307 NameNode
16550 SecondaryNameNode
16761 NodeManager
16426 DataNode

If you cannot see all of the processes above, review the log files in /opt/hadoop/logs to find the source of the problem.
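
With all daemons up, a short smoke test confirms that HDFS accepts writes and that YARN can run a MapReduce job. A minimal sketch; the example jar path below is the standard location inside the 2.6.4 tarball:

hadoop fs -mkdir -p /user/hduser
hadoop fs -put /opt/hadoop/LICENSE.txt /user/hduser/
hadoop fs -ls /user/hduser
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar pi 2 10

When you are done, stop the services in reverse order:

/opt/hadoop/sbin/stop-yarn.sh
/opt/hadoop/sbin/stop-dfs.sh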