5. Apache Hadoop

Back

Note: These commands run in /home/{YOURUSERNAME} (the current working directory; to check it, run the pwd command in the terminal) by default. If you want to change the Hadoop installation path, move to the directory you want with the cd command and use that path in place of /home/{YOURUSERNAME} below.

  1. Required additional steps before the Hadoop installation:

    • Install Java (minimum version 8), OpenJDK or Oracle JDK
    • Set the JAVA_HOME environment variable
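
    For example, on Debian/Ubuntu with OpenJDK 8 (the package name and JAVA_HOME path below are typical for that distribution and are assumptions; adjust them for your system):

    $ sudo apt install openjdk-8-jdk
    $ echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> ~/.bashrc
    $ source ~/.bashrc
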
  2. Download Hadoop

    $ wget https://www-eu.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
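
    Note: if this mirror no longer hosts 3.2.1, older releases are available from the Apache archive at https://archive.apache.org/dist/hadoop/common/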
    
  3. Extract the Hadoop archive

    $ tar -xzf hadoop-3.2.1.tar.gz
    
  4. Rename the Hadoop directory name

    $ mv hadoop-3.2.1 hadoop
    
  5. (optional) Remove the downloaded archive to save space

    $ rm -f hadoop-3.2.1.tar.gz
    
  6. Add Hadoop to the user PATH. Edit /home/{YOURUSERNAME}/.bashrc with any text editor (e.g. nano .bashrc) and add this line at the bottom:

    export PATH=$PATH:/home/{YOURUSERNAME}/hadoop/bin:/home/{YOURUSERNAME}/hadoop/sbin
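
    Here bin holds the client commands (hadoop, hdfs) and sbin holds the daemon control scripts (start-dfs.sh, start-yarn.sh, etc.) used later in this guide.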
    
  7. Load the new environment:

    $ source ~/.bashrc
    
  8. Check the Hadoop version using this command:

    $ hadoop version
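
    If the PATH is set correctly, the first line of the output should be:

    Hadoop 3.2.1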
    

Single Node Cluster Installation

Note: Follow this installation if you only want to install Hadoop on a single machine.

  1. Make sure you can connect to localhost using ssh without a password. If ssh still asks for a password, run these commands and then check again:

    $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    $ chmod 0600 ~/.ssh/authorized_keys
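
    Afterwards, ssh localhost should log you in without asking for a password (type exit to close the test session):

    $ ssh localhost
    $ exit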
    
  2. Configure Hadoop. Edit the /home/{YOURUSERNAME}/hadoop/etc/hadoop/core-site.xml file and replace its content with this:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
    

    Note: if you have Portainer installed on port 9000, change the Hadoop port to something other than 9000 (e.g. hdfs://localhost:9001, which must then also be used in the hdfs:// paths below) or change the Portainer port. Don't forget to update the KaspaCoreSystem application.conf too.

  3. Edit the /home/{YOURUSERNAME}/hadoop/etc/hadoop/hdfs-site.xml file and replace its content with this:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/{YOURUSERNAME}/hadoop/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/{YOURUSERNAME}/hadoop/dfs/data</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-bind-host</name>
            <value>0.0.0.0</value>
        </property>
    </configuration>
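
    Note: dfs.replication is set to 1 because a single-node cluster has only one DataNode to store each block. The dfs/name and dfs/data directories are created automatically when HDFS is formatted and started.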
    
  4. Edit the /home/{YOURUSERNAME}/hadoop/etc/hadoop/mapred-site.xml file and replace its content with this:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>localhost:8032</value>
        </property>
        <property>
            <name>mapreduce.application.classpath</name>
            <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
        </property>
    </configuration>
    
  5. Edit the /home/{YOURUSERNAME}/hadoop/etc/hadoop/yarn-site.xml file and replace its content with this:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
    </configuration>
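
    The yarn.nodemanager.env-whitelist entry lets the NodeManager pass these environment variables (including the HADOOP_MAPRED_HOME referenced in mapred-site.xml above) through to launched containers.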
    
  6. Format the Hadoop filesystem (HDFS)

    $ hdfs namenode -format
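
    If formatting succeeds, the output should contain a line similar to "Storage directory /home/{YOURUSERNAME}/hadoop/dfs/name has been successfully formatted."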
    
  7. Start HDFS and YARN

    # To start:
    $ start-dfs.sh
    $ start-yarn.sh
    
    # To stop them, run:
    $ stop-dfs.sh
    $ stop-yarn.sh
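
    To verify that everything is running after the start scripts, the jps command (bundled with the JDK) should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

    $ jps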
    
  8. Create the required directories

    $ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/job
    $ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/kaspa
    $ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/kafka-checkpoint
    $ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/kaspa-checkpoint
    $ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/schema/raw_kaspa
    $ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/file/maxmind
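
    To verify, list the created directories:

    $ hdfs dfs -ls -R /user/{YOURUSERNAME}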
    
  9. Download the MaxMind database (GeoLite2-City) from here: https://www.maxmind.com/en/accounts/current/geoip/downloads
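
    Note: MaxMind requires a (free) account to download the GeoLite2 databases.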

  10. Put the GeoLite2-City.mmdb file into HDFS

    $ hdfs dfs -put /path/to/GeoLite2-City.mmdb /user/{YOURUSERNAME}/file/maxmind/
    

Back