Introduction

The following article (fair warning: it gets very 'techy'!) provides a step-by-step guide to installing Hadoop 2.6 on CentOS 7, using rpm packages built for the 64-bit version of the OS.

Prerequisites
  • Time (a few hours of server time)
  • Internet connection (in some capacity)
  • A VM (VMware Player or Oracle VirtualBox) or access to a cloud machine
  • Root access (you can install Hadoop without root access to the system, but it is a bit more complicated. Remember, root access is required only during the installation phase, not for running the applications/services!)
How to:

1. Download VMware Player* or Oracle VirtualBox.
2. Download the CentOS 7 ISO image** or any other distro based on RHEL.
3. Install the VM software.
4. Install the OS from the ISO image.
5. Launch the installed VM.
6. Open a terminal.
7. Switch to the root user.
8. Execute the following:
          # sudo su -
          # sudo yum update
9. Install all updates and remove the existing Java:
          # sudo yum remove java
10. Download Oracle Java***
          a. Download the 64-bit .rpm package.
          b. Execute:
          # yum localinstall <java_package_name>.rpm
11. Set JAVA_HOME****
          # vi /etc/profile.d/java.sh
12. Add the following lines:
          #!/bin/bash
          JAVA_HOME=/usr/java/default
          PATH=$JAVA_HOME/bin:$PATH
          export PATH JAVA_HOME
    Then make the script executable and load it:
          # chmod +x /etc/profile.d/java.sh
          # source /etc/profile.d/java.sh
13. Check Java:
          # java -version
    This should return the Java version.
          # echo $JAVA_HOME
    This in turn should return the Java home dir path.
14. Download Maven and extract it into /opt:
          # tar -zxvf <maven_package_name>.tar.gz -C /opt/
15. Set M3_HOME:
          # vi /etc/profile.d/maven.sh
16. Add the following lines:
          #!/bin/bash
          M3_HOME=/opt/<maven_dir_name>
          PATH=$M3_HOME/bin:$PATH
          export PATH M3_HOME
    Then make the script executable and load it:
          # chmod +x /etc/profile.d/maven.sh
          # source /etc/profile.d/maven.sh
17. Check Maven:
          # mvn -version
    This should return the Maven version.
          # echo $M3_HOME
    This in turn should return the Maven home dir path.
18. Install the following tools for Hadoop native code compilation:
          # yum group install "Development Tools"
          # yum install openssl-devel zlib-devel
19. Download Protocol Buffers 2.5.0 and install the packages:*****
          # wget http://cbs.centos.org/kojifiles/packages/protobuf/2.5.0/10.el7.centos/x86_64/protobuf-2.5.0-10.el7.centos.x86_64.rpm
          # wget http://cbs.centos.org/kojifiles/packages/protobuf/2.5.0/10.el7.centos/x86_64/protobuf-devel-2.5.0-10.el7.centos.x86_64.rpm
          # wget http://cbs.centos.org/kojifiles/packages/protobuf/2.5.0/10.el7.centos/x86_64/protobuf-compiler-2.5.0-10.el7.centos.x86_64.rpm
          # yum -y install protobuf-*.rpm
20. Prep for Hadoop: execute the following commands:
          # groupadd hadoop
          # useradd -g hadoop yarn
          (Note: The yarn user is going to be used for the node manager.)
          # useradd -g hadoop hdfs
          (Note: The hdfs user is for things related to the HDFS file system.)
          # useradd -g hadoop mapred
          (Note: The mapred user is related to MapReduce jobs.)
          (Note: You can add passwords to the users if you like.)
21. Log in as hdfs and set up key-based ssh:
    (Note: This step is required because Hadoop needs an ssh connection without a passphrase.)
          # su - hdfs
          # ssh-keygen -t rsa -P ""
          # cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
          # chmod 0600 ~/.ssh/authorized_keys
22. Test ssh:
          # ssh localhost date
    (Type "yes" when asked to accept the host key.)
23. Exit the hdfs user:
          # exit
24. Download Apache Hadoop (source).
25. Extract the tar file into the /opt dir:
          # tar -zxvf <hadoop_package_name>.tar.gz -C /opt/
26. Navigate to the new Hadoop dir:
          # cd /opt/<hadoop_dir_name>/
27. Edit the pom.xml file and add <additionalparam>-Xdoclint:none</additionalparam> to the properties section. For example:
          …
          <!-- platform encoding override -->
          <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
          <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
          <additionalparam>-Xdoclint:none</additionalparam>
          </properties>
          …
    (Note: This step is only required if you decided to use Java 8.)
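Before moving on to the build, it is worth a quick sanity check that the compiler toolchain from steps 18-19 is in place:

          # gcc --version
          # protoc --version

gcc should print the compiler installed by the "Development Tools" group, and protoc should report libprotoc 2.5.0. Hadoop 2.6 expects exactly Protocol Buffers 2.5.0; if protoc is missing or reports a different version, the native build below will fail.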
28. Execute the following commands:
          # cd ..
          # chown hdfs:hadoop <hadoop_dir_name> -R
    (Note: Make sure nothing is blocked by permissions.)
29. Build the native Hadoop library:
          # su - hdfs
          # cd /opt/<hadoop_dir_name>
          # mvn package -Pdist,native -DskipTests -Dtar
    Go grab some coffee/tea... (Note: Building the native library is not mandatory, but it is recommended.) Here is what you should see by the end of the process:

  [INFO] ------------------------------------------------------------------------
  [INFO] Reactor Summary:
  [INFO]
  [INFO] Apache Hadoop Main ................................. SUCCESS [ 16.389 s]
  [INFO] Apache Hadoop Project POM .......................... SUCCESS [  6.905 s]
  [INFO] Apache Hadoop Annotations .......................... SUCCESS [  8.923 s]
  [INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.340 s]
  [INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  5.277 s]
  [INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  8.378 s]
  [INFO] Apache Hadoop MiniKDC .............................. SUCCESS [02:25 min]
  [INFO] Apache Hadoop Auth ................................. SUCCESS [01:47 min]
  [INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  4.060 s]
  [INFO] Apache Hadoop Common ............................... SUCCESS [03:10 min]
  [INFO] Apache Hadoop NFS .................................. SUCCESS [  7.413 s]
  [INFO] Apache Hadoop KMS .................................. SUCCESS [ 45.635 s]
  [INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.046 s]
  [INFO] Apache Hadoop HDFS ................................. SUCCESS [02:32 min]
  [INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 21.490 s]
  [INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [ 17.206 s]
  [INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [  4.122 s]
  [INFO] Apache Hadoop HDFS Project ......................... SUCCESS [  0.044 s]
  [INFO] hadoop-yarn ........................................ SUCCESS [  0.054 s]
  [INFO] hadoop-yarn-api .................................... SUCCESS [ 37.593 s]
  [INFO] hadoop-yarn-common ................................. SUCCESS [01:36 min]
  [INFO] hadoop-yarn-server ................................. SUCCESS [  0.036 s]
  [INFO] hadoop-yarn-server-common .......................... SUCCESS [ 15.557 s]
  [INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 42.800 s]
  [INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [  2.961 s]
  [INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [  6.280 s]
  [INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 20.282 s]
  [INFO] hadoop-yarn-server-tests ........................... SUCCESS [  5.231 s]
  [INFO] hadoop-yarn-client ................................. SUCCESS [  7.769 s]
  [INFO] hadoop-yarn-applications ........................... SUCCESS [  0.031 s]
  [INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [  3.625 s]
  [INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [  2.082 s]
  [INFO] hadoop-yarn-site ................................... SUCCESS [  0.038 s]
  [INFO] hadoop-yarn-registry ............................... SUCCESS [  5.406 s]
  [INFO] hadoop-yarn-project ................................ SUCCESS [  6.252 s]
  [INFO] hadoop-mapreduce-client ............................ SUCCESS [  0.080 s]
  [INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 22.981 s]
  [INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 17.918 s]
  [INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [  4.349 s]
  [INFO] hadoop-mapreduce-client-app ........................ SUCCESS [ 10.538 s]
  [INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [  8.806 s]
  [INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [  9.771 s]
  [INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [  1.889 s]
  [INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  5.765 s]
  [INFO] hadoop-mapreduce ................................... SUCCESS [  4.789 s]
  [INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [  8.040 s]
  [INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [  9.787 s]
  [INFO] Apache Hadoop Archives ............................. SUCCESS [  2.165 s]
  [INFO] Apache Hadoop Rumen ................................ SUCCESS [  6.321 s]
  [INFO] Apache Hadoop Gridmix .............................. SUCCESS [  4.502 s]
  [INFO] Apache Hadoop Data Join ............................ SUCCESS [  2.613 s]
  [INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [  2.081 s]
  [INFO] Apache Hadoop Extras ............................... SUCCESS [  3.048 s]
  [INFO] Apache Hadoop Pipes ................................ SUCCESS [  7.640 s]
  [INFO] Apache Hadoop OpenStack support .................... SUCCESS [  4.934 s]
  [INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [ 24.968 s]
  [INFO] Apache Hadoop Client ............................... SUCCESS [  8.046 s]
  [INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  0.084 s]
  [INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [  5.169 s]
  [INFO] Apache Hadoop Tools Dist ........................... SUCCESS [  9.050 s]
  [INFO] Apache Hadoop Tools ................................ SUCCESS [  0.025 s]
  [INFO] Apache Hadoop Distribution ......................... SUCCESS [ 36.246 s]
  [INFO] ------------------------------------------------------------------------
  [INFO] BUILD SUCCESS
  [INFO] ------------------------------------------------------------------------
  [INFO] Total time: 20:28 min
  [INFO] Finished at: 2015-11-23T07:50:32-08:00
  [INFO] Final Memory: 215M/847M
  [INFO] ------------------------------------------------------------------------
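If the build succeeded, the binary distribution (including the native libraries) sits under hadoop-dist/target. An optional quick check that the native code was actually produced (assuming the default build layout):

          # ls /opt/<hadoop_dir_name>/hadoop-dist/target/<hadoop_version>/lib/native/

You should see files such as libhadoop.so and libhdfs.so listed there.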

Configuration

1. Switch back to root:
          # exit
2. Move the freshly built Hadoop distribution to /opt:
          # mv /opt/<hadoop_dir_name>/hadoop-dist/target/<hadoop_version> /opt/
3. Create the data dirs:
          # mkdir -p /var/data/hadoop/hdfs/nn
          # mkdir -p /var/data/hadoop/hdfs/snn
          # mkdir -p /var/data/hadoop/hdfs/dn
          # chown hdfs:hadoop /var/data/hadoop/hdfs -R
4. Create the log dir:
          # cd /opt/<hadoop_version>
    (Note: This is the new dir we moved a few steps before.)
          # mkdir logs
          # chmod g+w logs
          # chown -R yarn:hadoop .
5. Set HADOOP_HOME:
          # vi /etc/profile.d/hadoop.sh
6. Add the following lines (pointing at the directory moved in step 2):
          #!/bin/bash
          HADOOP_HOME=/opt/<hadoop_version>
          PATH=$HADOOP_HOME/bin:$PATH
          export PATH HADOOP_HOME
    Then make the script executable and load it:
          # chmod +x /etc/profile.d/hadoop.sh
          # source /etc/profile.d/hadoop.sh
7. Check Hadoop:
          # echo $HADOOP_HOME
    This should return the Hadoop home dir path.
8. Configure Hadoop:
          # cd /opt/hadoop-2.6.2/etc/hadoop/
          # vim core-site.xml
9. Add the following code inside the configuration element:
          <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>
          </property>
          <property>
            <name>hadoop.http.staticuser.user</name>
            <value>hdfs</value>
          </property>
    Then open the next file:
          # vim hdfs-site.xml
10. Add the following code inside the configuration element:
          <property>
            <name>dfs.replication</name>
            <value>1</value>
          </property>
          <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/var/data/hadoop/hdfs/nn</value>
          </property>
          <property>
            <name>fs.checkpoint.dir</name>
            <value>file:/var/data/hadoop/hdfs/snn</value>
          </property>
          <property>
            <name>fs.checkpoint.edits.dir</name>
            <value>file:/var/data/hadoop/hdfs/snn</value>
          </property>
          <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/var/data/hadoop/hdfs/dn</value>
          </property>
    Then open the next file:
          # vim mapred-site.xml
11. Add the following code inside the configuration element:
          <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
          </property>
          <property>
            <name>mapreduce.jobhistory.intermediate-done-dir</name>
            <value>/mr-history/tmp</value>
          </property>
          <property>
            <name>mapreduce.jobhistory.done-dir</name>
            <value>/mr-history/done</value>
          </property>
    Then open the next file:
          # vim yarn-site.xml
12. Add the following code inside the configuration element:
          <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
          </property>
          <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
          </property>
13. Switch to the hdfs user, format the namenode and start the HDFS daemons:
          # su - hdfs
          # cd /opt/<hadoop_dir>/bin
          # ./hdfs namenode -format
          # cd /opt/<hadoop_dir>/sbin
          # ./hadoop-daemon.sh start namenode
          # ./hadoop-daemon.sh start secondarynamenode
          # ./hadoop-daemon.sh start datanode
14. Create /mr-history in the HDFS file system for the job history:
          # hdfs dfs -mkdir -p /mr-history/tmp
          # hdfs dfs -mkdir -p /mr-history/done
          # hdfs dfs -chown -R yarn:hadoop /mr-history
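At this point HDFS should be up. An optional sanity check before starting YARN (run as the hdfs user):

          # hdfs dfs -ls /mr-history
          # hdfs dfsadmin -report

The first command should list the tmp and done directories owned by yarn:hadoop, and the report should show one live datanode.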
15. Start the YARN services:
          # su - yarn
          # cd /opt/<hadoop_dir>/sbin
          # ./yarn-daemon.sh start resourcemanager
          # ./yarn-daemon.sh start nodemanager
          # ./mr-jobhistory-daemon.sh start historyserver
    (Note: The node manager also needs to be started, otherwise jobs will have nowhere to run.)
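With all the daemons started, a quick way to confirm they are actually running is jps, which ships with the JDK and lists the Java processes on the machine (run it as root, or as each service user, since jps only reliably shows the invoking user's JVMs):

          # jps

You should see NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager and JobHistoryServer among the listed processes.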

Check the following:
  • Check that the services are up and running (e.g. with jps, as shown above).
  • Run a sample job to test that Hadoop is working:

                   # su - hdfs
                   # export YARN_EXAMPLES=/opt/<hadoop_dir>/share/hadoop/mapreduce
                   # yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-<hadoop_version>.jar pi 8 100000
                   (This estimates pi using 8 map tasks with 100,000 samples each.)

  • You should start to see the job executing in the terminal.
  • You can also follow the job's progress in the browser via the ResourceManager web UI (see the addresses below).
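If everything is wired up, the standard Hadoop 2.x web interfaces should also be reachable from a browser inside the VM (default ports, assuming you have not overridden them in the configs above):

          NameNode:          http://localhost:50070
          ResourceManager:   http://localhost:8088
          JobHistoryServer:  http://localhost:19888

The ResourceManager UI is where you can watch the sample job's progress and final status.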

You are all set. Your test environment is ready. Of course, you can change the configurations that I have provided here to suit your own needs.

*For the purpose of this demonstration, VMware Player 12.0.1 was used.
**The CentOS 7 Full DVD ISO image was used. Ubuntu and other Debian-based Linux distros will also work, but some installation steps may differ.
***You can install the latest version of Java or use the recommended version. A list of recommended versions can be found online.
****There are several ways to set JAVA_HOME; I find this the easiest, and it guarantees that the Java path stays the same after a reboot.
*****You can download the latest version of Protocol Buffers from "https://developers.google.com/protocol-buffers/", but you will need to run a couple of extra commands. The above