Category Archives: cluster

How to build Hadoop 2.3.0 for Cubietruck with Lubuntu server?

Hi,

After building Hadoop on my Cubietruck many times, I put together a simple tutorial to help you build your own clusters :)

 

So, to begin, you need the build tools:

$ sudo apt-get install build-essential

$ sudo apt-get install g++ autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev

$ sudo apt-get install maven

(if you run into a libwagon-java problem, do: sudo dpkg -i --force-all /var/cache/apt/archives/libwagon2-java_2.2-3+nmu1_all.deb)

Another prerequisite is Protocol Buffers (protobuf) version 2.5, which can be downloaded from https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz

$ tar xzvf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ ./configure --prefix=/usr
$ make
$ make check
$ sudo make install
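
You can check that the expected version is now on your PATH (Hadoop 2.3.0 needs protoc 2.5.0):

$ protoc --version
libprotoc 2.5.0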

 

Having all the tools in place, we can now build Hadoop with its native libraries.

$ wget http://mirror.catn.com/pub/apache/hadoop/core/hadoop-2.3.0/hadoop-2.3.0-src.tar.gz

$ tar xzvf hadoop-2.3.0-src.tar.gz

$ cd hadoop-2.3.0-src

For the Cubietruck you have to apply a patch, https://issues.apache.org/jira/browse/HADOOP-9320,
and a second patch:

Index: hadoop-common-project/hadoop-auth/pom.xml
===================================================================
--- hadoop-common-project/hadoop-auth/pom.xml (revision 1543124)
+++ hadoop-common-project/hadoop-auth/pom.xml (working copy)
@@ -54,6 +54,11 @@
</dependency>
<dependency>
<groupId>org.mortbay.jetty</groupId>
+ <artifactId>jetty-util</artifactId>
+ <scope>test</scope>
+ </dependency>
+ <dependency>
+ <groupId>org.mortbay.jetty</groupId>
<artifactId>jetty</artifactId>
<scope>test</scope>
</dependency>
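
To apply them, save each patch to a file in the source root and run patch from there (the file names below are just placeholders; depending on how a patch was generated you may need -p1 instead of -p0):

$ patch -p0 < HADOOP-9320.patch
$ patch -p0 < hadoop-auth-jetty-util.patch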

Then you can build with JDK 1.8.0 hard float (from Oracle).
If you need to know what kind of ARM you have, run dpkg --print-architecture.
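
As a quick sanity check before the build (armhf is what I would expect on a Cubietruck; the exact Java version string depends on your JDK install):

$ dpkg --print-architecture
armhf
$ java -version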

So now let's begin the build; expect it to run for about one night:

$ mvn package -Pdist,native -DskipTests -Dtar
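
Once the build finishes, you can check that the native library really came out as an ARM binary (the path assumes the standard dist layout of the 2.3.0 build; adjust if yours differs). The file command should report an ELF 32-bit ARM shared object.

$ file hadoop-dist/target/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0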

 

#INSTALLATION

In the target directory of your build:

$ mv hadoop-2.3.0 hadoop
$ sudo mv hadoop /usr/local
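
This step is optional, but if you build as a regular user it can help to give that user ownership of the install (adjust the user to whoever will run Hadoop):

$ sudo chown -R $(whoami):$(whoami) /usr/local/hadoop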

 

After that you can generate an SSH key:

$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
$ exit

Now we have to modify some shell scripts and the .bashrc.

$ cd ~
$ nano .bashrc

# Add those exports for Hadoop
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
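
Reload the file so the exports take effect in the current shell, then check that the hadoop command resolves:

$ source ~/.bashrc
$ hadoop version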

 

$ nano $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh

#add this
export HADOOP_HEAPSIZE=1024
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"

 

$ nano $HADOOP_PREFIX/etc/hadoop/yarn-env.sh

#add this
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"

 

$ nano $HADOOP_PREFIX/etc/hadoop/core-site.xml

#add this property in configuration
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation.
The URI’s scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The URI’s authority is used to
determine the host, port, etc. for a filesystem.
</description>
</property>

 

$ nano $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml

#add this property in configuration
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/root/workspace/hadoop/dfs/name</value>
<final>true</final>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/root/workspace/hadoop/dfs/data</value>
<final>true</final>
</property>

<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
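
If you keep these paths, make sure the directories exist and are writable by the user that will run Hadoop (paths taken from the config above; adapt them if you change the values):

$ sudo mkdir -p /root/workspace/hadoop/dfs/name
$ sudo mkdir -p /root/workspace/hadoop/dfs/data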

 

$ nano $HADOOP_PREFIX/etc/hadoop/mapred-site.xml

#add this property in configuration
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<property>
<name>mapred.system.dir</name>
<value>file:/root/workspace/hadoop/mapred/system</value>
<final>true</final>
</property>

<property>
<name>mapred.local.dir</name>
<value>file:/root/workspace/hadoop/mapred/local</value>
<final>true</final>
</property>
</configuration>
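
Note that the 2.3.0 distribution may ship only a template for this file, so if mapred-site.xml did not exist yet you can create it from the template first:

$ cp $HADOOP_PREFIX/etc/hadoop/mapred-site.xml.template $HADOOP_PREFIX/etc/hadoop/mapred-site.xml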

 

$ nano $HADOOP_PREFIX/etc/hadoop/yarn-site.xml

#add this property in configuration
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

Then format the NameNode:

$ hdfs namenode -format

Now your Hadoop server is ready to start.

$ start-all.sh // starts all Hadoop processes

$ jps // lists the running Java processes (sample output below)

$ stop-all.sh // stops all Hadoop processes
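
On a single-node setup like this one, jps should list roughly the following daemons once start-all.sh has finished (the PIDs are only illustrative):

$ jps
1234 NameNode
1345 DataNode
1456 SecondaryNameNode
1567 ResourceManager
1678 NodeManager
1789 Jps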

To check that everything is running, and to conclude this tutorial, use your browser:

DFSHealth

http://xxx.xxx.xxx.xxx:50070/dfshealth.jsp

http://xxx.xxx.xxx.xxx:50070/dfshealth.html

Status

http://xxx.xxx.xxx.xxx:50090/status.jsp

 

$ netstat -netpl | grep java
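
For example, to confirm the NameNode is listening on the port configured in core-site.xml (9000 here):

$ netstat -netpl | grep 9000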

 

That’s all for today

So enjoy hadoop!