5. Apache Hadoop
Note: These commands run in /home/{YOURUSERNAME} (your current working directory; check it by running the pwd command in the terminal) by default. If you want to change the Hadoop installation path, simply change to the directory you want with the cd command first.
- Required additional steps before the Hadoop installation:
- Java installation (minimum version 8) OpenJDK/Oracle
- Set the JAVA_HOME environment variable
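A minimal sketch of the Java prerequisite on Debian/Ubuntu (the package name and JDK path are assumptions; they differ per distribution and Java version):
$ sudo apt-get install -y openjdk-8-jdk
# Add the export to ~/.bashrc so it persists; adjust the path to your actual JDK location
$ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
$ java -version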
- Download Hadoop:
$ wget https://www-eu.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
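Optionally verify the download; Apache publishes SHA-512 checksums alongside each release, so you can compute the local digest and compare it by eye with the published value:
$ sha512sum hadoop-3.2.1.tar.gz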
- Extract the Hadoop archive:
$ tar -xzf hadoop-3.2.1.tar.gz
- Rename the Hadoop directory:
$ mv hadoop-3.2.1 hadoop
- (optional) Remove the downloaded archive to save space:
$ rm -f hadoop-3.2.1.tar.gz
- Add Hadoop to the user PATH. Edit /home/{YOURUSERNAME}/.bashrc with any text editor (e.g. nano .bashrc) and add this line at the bottom:
PATH=$PATH:/home/{YOURUSERNAME}/hadoop/bin:/home/{YOURUSERNAME}/hadoop/sbin
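The Hadoop daemons also look for JAVA_HOME in Hadoop's own environment file; if you later see a "JAVA_HOME is not set" error, set it explicitly in /home/{YOURUSERNAME}/hadoop/etc/hadoop/hadoop-env.sh (the JDK path below is an example, adjust it to your system):
# in hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64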
- Load the new environment:
$ source ~/.bashrc
- Check the Hadoop version using this command:
$ hadoop version
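If the PATH is set correctly, the output should start with a line like this (build details will differ):
Hadoop 3.2.1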
Note: This guide covers a single-node installation only, i.e. Hadoop running on one PC.
- Make sure you can connect to localhost via ssh without a password. If ssh still asks for a password, run these commands and then check again:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
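A quick way to confirm the passwordless login works (assuming an ssh server such as openssh-server is installed and running):
$ ssh localhost 'echo ssh OK'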
- Configure Hadoop. Edit the /home/{YOURUSERNAME}/hadoop/etc/hadoop/core-site.xml file and replace its content with this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Note: if you have Portainer installed on port 9000, change the Hadoop port to something other than 9000, or change the Portainer port. Don't forget to update the KaspaCoreSystem application.conf accordingly.
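To check whether something (e.g. Portainer) is already listening on port 9000, a quick probe with ss (from iproute2; netstat works too):
$ ss -tln | grep 9000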
- Edit the /home/{YOURUSERNAME}/hadoop/etc/hadoop/hdfs-site.xml file and replace its content with this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/{YOURUSERNAME}/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/{YOURUSERNAME}/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
</configuration>
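The name and data directories above are normally created for you by the later format and start steps, but you can create them up front if you prefer:
$ mkdir -p /home/{YOURUSERNAME}/hadoop/dfs/name /home/{YOURUSERNAME}/hadoop/dfs/data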
- Edit the /home/{YOURUSERNAME}/hadoop/etc/hadoop/mapred-site.xml file and replace its content with this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
- Edit the /home/{YOURUSERNAME}/hadoop/etc/hadoop/yarn-site.xml file and replace its content with this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
- Format the Hadoop DFS:
$ hdfs namenode -format
- Start the Hadoop NameNode and YARN:
# To start:
$ start-dfs.sh
$ start-yarn.sh
# To stop, run:
$ stop-dfs.sh
$ stop-yarn.sh
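To verify that everything is up, list the running Java daemons with jps (ships with the JDK); on a single-node setup you should see NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager:
$ jps
The NameNode web UI should also be reachable at http://localhost:9870 and the YARN UI at http://localhost:8088 (the Hadoop 3.x default ports).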
- Create the required directories:
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/job
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/kaspa
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/kafka-checkpoint
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/kaspa-checkpoint
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/schema/raw_kaspa
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/file/maxmind
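You can confirm the directories were created with a recursive listing:
$ hdfs dfs -ls -R hdfs://localhost:9000/user/{YOURUSERNAME}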
- Download the MaxMind database (GeoLite2-City) from here: https://www.maxmind.com/en/accounts/current/geoip/downloads
- Put the GeoLite2-City.mmdb file into the Hadoop FS:
$ hdfs dfs -put /path/to/GeoLite2-City.mmdb /user/{YOURUSERNAME}/file/maxmind/
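Check that the file landed in HDFS:
$ hdfs dfs -ls /user/{YOURUSERNAME}/file/maxmind/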