5. Apache Hadoop
Note: These commands run in /home/{YOURUSERNAME} by default (the current working directory; run pwd in the terminal to check it). To install Hadoop somewhere else, cd into the directory you want before running them.
Prerequisites for the Hadoop installation:
- Java (version 8 or later), OpenJDK or Oracle
- The JAVA_HOME environment variable is set
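A quick way to check both prerequisites from the terminal is sketched below. The helper name `java_major` is made up for this example, and the parsing assumes the usual `version "x.y.z"` line that `java -version` prints; treat it as a convenience check rather than an official test.

```shell
# Extract the major Java version from a version string.
# "1.8.0_292" -> 8 (legacy numbering), "11.0.2" -> 11, "17" -> 17.
java_major() {
    case "$1" in
        1.*) _v=${1#1.}; echo "${_v%%.*}" ;;
        *)   echo "${1%%[.+-]*}" ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr, e.g.: openjdk version "11.0.2" 2019-01-15
    raw=$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
    major=$(java_major "$raw")
    if [ "${major:-0}" -ge 8 ] 2>/dev/null; then
        echo "Java $raw (major $major) is new enough"
    else
        echo "Java '$raw' is older than 8 or could not be parsed" >&2
    fi
else
    echo "java not found on PATH; install OpenJDK or Oracle JDK 8 or later" >&2
fi

# Hadoop's scripts read JAVA_HOME, so warn when it is empty.
[ -n "${JAVA_HOME:-}" ] || echo "JAVA_HOME is not set" >&2
```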
Download Hadoop
$ wget https://www-eu.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
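Before extracting, it can be worth comparing the archive against the SHA-512 digest published alongside the release. The sketch below assumes GNU `sha512sum` is installed and that you paste the published digest into `EXPECTED_SHA512` yourself; both that variable and the `verify_sha512` helper are names invented for this example.

```shell
# Return 0 when the SHA-512 digest of file $1 equals the hex string $2.
# The expected value is lower-cased and stripped of spaces before comparing.
verify_sha512() {
    actual=$(sha512sum "$1" | cut -d ' ' -f 1)
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f' | tr -d ' \n')
    [ "$actual" = "$expected" ]
}

# EXPECTED_SHA512 is a placeholder: nothing happens until you set it.
if [ -f hadoop-3.2.1.tar.gz ] && [ -n "${EXPECTED_SHA512:-}" ]; then
    if verify_sha512 hadoop-3.2.1.tar.gz "$EXPECTED_SHA512"; then
        echo "checksum OK"
    else
        echo "checksum mismatch, download the archive again" >&2
    fi
fi
```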
Extract the Hadoop archive
$ tar -xzf hadoop-3.2.1.tar.gz
Rename the Hadoop directory
$ mv hadoop-3.2.1 hadoop
(optional) Remove the downloaded archive to save space
$ rm -f hadoop-3.2.1.tar.gz
Add Hadoop to the user PATH: edit /home/{YOURUSERNAME}/.bashrc with any text editor (e.g. nano .bashrc) and add this line at the bottom:
PATH=$PATH:/home/{YOURUSERNAME}/hadoop/bin:/home/{YOURUSERNAME}/hadoop/sbin
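If you prefer to script this step, a small helper can append the line only when it is not already present, so re-running the setup does not pile up duplicate PATH entries. `append_once` is a hypothetical helper written for this sketch, and the example call assumes Hadoop lives in `$HOME/hadoop`.

```shell
# Append line $2 to file $1 unless an identical line is already there.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example call (left commented out so that pasting this block changes nothing):
# append_once "$HOME/.bashrc" "PATH=\$PATH:$HOME/hadoop/bin:$HOME/hadoop/sbin"
```

The `-x` and `-F` flags make grep match the whole line as a fixed string, so the `$` in `$PATH` is not treated as a regular-expression anchor.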
Load the new environment:
$ source ~/.bashrc
Check the Hadoop version using this command:
$ hadoop version
Note: The setup below is for installing Hadoop on a single machine only.
Make sure you can connect to localhost over ssh without a password. If ssh still asks for a password, run these commands and then check again:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Configure Hadoop: edit the configuration file that defines fs.defaultFS and replace its content with this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Note: If Portainer is already using port 9000, either change the Hadoop port to something other than 9000 or change the Portainer port. If you change the Hadoop port, remember to update the KaspaCoreSystem application.conf as well.
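To see whether something is already listening on the chosen port before Hadoop is started, a check along these lines may help. It assumes a Linux system with `ss` (from iproute2); `uri_port` is a helper invented for this sketch, and the URI is the fs.defaultFS value used above.

```shell
# Print the port from a URI such as hdfs://localhost:9000 (empty if none is given).
uri_port() {
    hostport=${1#*://}          # drop the scheme
    hostport=${hostport%%/*}    # drop any path
    case "$hostport" in
        *:*) echo "${hostport##*:}" ;;
        *)   echo "" ;;
    esac
}

port=$(uri_port "hdfs://localhost:9000")

# Run this before start-dfs.sh; afterwards the NameNode itself holds the port.
if command -v ss >/dev/null 2>&1; then
    # Column 4 of `ss -ltn` is the local address:port of each listening socket.
    if ss -ltn 2>/dev/null | awk '{print $4}' | grep -q ":$port\$"; then
        echo "port $port already has a listener; pick another port"
    else
        echo "port $port looks free"
    fi
fi
```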
Edit the configuration file that holds the HDFS (dfs.*) properties and replace its content with this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/{YOURUSERNAME}/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/{YOURUSERNAME}/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value></value>
  </property>
</configuration>
Edit the configuration file that holds the MapReduce (mapreduce.*) properties and replace its content with this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
Edit the configuration file that holds the YARN NodeManager (yarn.nodemanager.*) properties and replace its content with this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
Format Hadoop DFS
$ hdfs namenode -format
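Formatting is meant to be done once; formatting again wipes the NameNode metadata. A formatted NameNode directory contains a `current/VERSION` file, so a small guard like the one below can tell you whether the step has already been done. `is_formatted` and `NAME_DIR` are names made up for this sketch, and `NAME_DIR` is assumed to match the dfs.namenode.name.dir value configured earlier.

```shell
# Return 0 if the NameNode directory $1 already contains formatted metadata.
is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Assumed to match dfs.namenode.name.dir; adjust if you installed elsewhere.
NAME_DIR="$HOME/hadoop/dfs/name"

if is_formatted "$NAME_DIR"; then
    echo "$NAME_DIR is already formatted; skip the format step"
else
    echo "$NAME_DIR is not formatted yet; run: hdfs namenode -format"
fi
```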
Start the Hadoop NameNode and YARN
# To start:
$ start-dfs.sh
$ start-yarn.sh
# To stop:
$ stop-dfs.sh
$ stop-yarn.sh
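After starting, you can confirm the daemons are up with `jps`, which ships with the JDK and lists running Java processes as `<pid> <ClassName>`. On a single-node setup like this one, start-dfs.sh normally brings up NameNode, DataNode and SecondaryNameNode, and start-yarn.sh brings up ResourceManager and NodeManager. The helper `missing_daemons` below is an illustration written for this guide, not part of Hadoop.

```shell
# Read jps-style output ("<pid> <ClassName>") on stdin and print each
# expected daemon that does not appear in it.
missing_daemons() {
    seen=$(awk '{print $2}')
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        printf '%s\n' "$seen" | grep -qx "$d" || echo "$d"
    done
}

if command -v jps >/dev/null 2>&1; then
    missing=$(jps | missing_daemons)
    if [ -z "$missing" ]; then
        echo "all expected daemons are running"
    else
        echo "not running:" $missing
    fi
fi
```

If a daemon is listed as not running, its log file under the logs directory of the Hadoop installation is usually the first place to look.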
Create the required directories
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/job
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/kaspa
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/kafka-checkpoint
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/kaspa-checkpoint
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/schema/raw_kaspa
$ hdfs dfs -mkdir -p hdfs://localhost:9000/user/{YOURUSERNAME}/file/maxmind
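The same six directories can also be created in a loop, which avoids typing the user name six times. This is only a sketch: `kaspa_dirs` and the `DRY_RUN` switch are invented for the example, and the URI assumes the default hdfs://localhost:9000 from the configuration above. With `DRY_RUN` left at its default of 1 the commands are printed, not executed.

```shell
# Print the six HDFS directories needed for user $1.
kaspa_dirs() {
    for sub in job kaspa kafka-checkpoint kaspa-checkpoint schema/raw_kaspa file/maxmind; do
        echo "hdfs://localhost:9000/user/$1/$sub"
    done
}

# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
kaspa_dirs "${USER:-$(id -un)}" | while read -r dir; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "hdfs dfs -mkdir -p $dir"
    else
        hdfs dfs -mkdir -p "$dir"
    fi
done
```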
Download the MaxMind database (GeoLite2-City) from here: https://www.maxmind.com/en/accounts/current/geoip/downloads
Put the GeoLite2-City.mmdb file into HDFS
$ hdfs dfs -put /path/to/GeoLite2-City.mmdb /user/{YOURUSERNAME}/file/maxmind/