GitHub - crazyn2/hadoop-zookeeper-hbase

Run Hadoop Cluster within Docker Containers

This project fixes several bugs in its original source and has been tested on Ubuntu 20.04 x86_64.

install docker

3 Nodes Hadoop Cluster

1. clone github repository

git clone https://github.com/crazyn2/hadoop-zookeeper-hbase.git
cd hadoop-zookeeper-hbase

2. build the docker image or pull a pre-built image

chmod +x build-image.sh
./build-image.sh

or

docker pull ctazyn/hadoop-hbase:2.3

ctazyn/hadoop-hbase:1.0 : ubuntu14.04 + hadoop2 + zookeeper3 + hbase1 + openjdk8
ctazyn/hadoop-hbase:2.0 : ubuntu18.04 + hadoop2 + zookeeper3 + hbase1 + openjdk8
ctazyn/hadoop-hbase:2.1 : ubuntu18.04 + hadoop3 + zookeeper3 + hbase1 + openjdk11 (versions 2.1 and later use openjdk11)
ctazyn/hadoop-hbase:2.2 : ubuntu20.04 + hadoop3 + zookeeper3 + hbase1 + mariadb + hive3 + openjdk11
ctazyn/hadoop-hbase:2.3 : hadoop3 + zookeeper3 + hbase1 + openjdk11 (mariadb + hive3 are installed only in the hadoop-master container to reduce the image's disk usage) (recommended)

3. create hadoop network

sudo docker network create --driver=bridge hadoop
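
Running the command above a second time fails because the network already exists. A minimal sketch of an idempotent variant follows; the helper name `ensure_network` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): create the bridge
# network only when it does not exist yet.
ensure_network() {
    net_name="${1:-hadoop}"
    # `docker network inspect` exits non-zero when the network is missing.
    if docker network inspect "$net_name" >/dev/null 2>&1; then
        echo "network $net_name already exists"
    else
        docker network create --driver=bridge "$net_name" >/dev/null
        echo "network $net_name created"
    fi
}
```

Call it as `ensure_network hadoop`. If docker requires root on your machine, as the command in this step assumes, run the script itself with sudo.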

4. start container

chmod +x ./start-container.sh
./start-container.sh

output:

start hadoop-master container...
start hadoop-slave1 container...
start hadoop-slave2 container...
root@hadoop-master:~#

This starts 3 containers: 1 master and 2 slaves.
You are dropped into the /root directory of the hadoop-master container.

5. start hadoop

./start-hadoop.sh

6. run wordcount

./run-wordcount.sh

output

input file1.txt:
Hello Hadoop

input file2.txt:
Hello Docker

wordcount output:
Docker    1
Hadoop    1
Hello    2
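
To see what the job computes without a cluster, the same count can be reproduced with a local pipeline. This is only an illustrative stand-in for the MapReduce job, not what run-wordcount.sh does; the function name `wordcount` is hypothetical.

```shell
# Local stand-in for the wordcount job: split the input on whitespace,
# count each distinct word, and print "word<TAB>count" sorted by word.
wordcount() {
    tr -s '[:space:]' '\n' | sed '/^$/d' | LC_ALL=C sort | uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Same two input lines as file1.txt and file2.txt above.
printf 'Hello Hadoop\nHello Docker\n' | wordcount
```

The pipeline mirrors the two phases of the job: `tr` plays the role of the map step that emits one word per record, and `sort | uniq -c` plays the role of the shuffle and reduce steps that group identical words and sum them.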

Arbitrary size Hadoop cluster

1. pull docker images and clone github repository

Follow steps 1-3 of the "3 Nodes Hadoop Cluster" section.

2. rebuild docker image

sudo ./resize-cluster.sh 5

Specify a parameter greater than 1: 2, 3, and so on.
If no parameter is given, the default is 3.
This script only rebuilds the hadoop image with a different slaves file, which specifies the names of all slave nodes.
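
The contents of such a slaves file can be sketched as follows. This is not the code of resize-cluster.sh; it assumes the parameter counts the master as well (so the default of 3 gives 2 slaves, matching the 3-node cluster above) and that slaves are named hadoop-slave1, hadoop-slave2, and so on, as in the start-container.sh output. The function name `gen_slaves` is hypothetical.

```shell
# Hypothetical sketch: print one slave hostname per line for a cluster of
# N nodes. Assumes N includes the master, so N=3 yields two slaves.
gen_slaves() {
    n="${1:-3}"    # default cluster size is 3
    if [ "$n" -le 1 ]; then
        echo "cluster size must be greater than 1" >&2
        return 1
    fi
    i=1
    while [ "$i" -lt "$n" ]; do
        echo "hadoop-slave$i"
        i=$((i + 1))
    done
}
```

For example, `gen_slaves 5` prints hadoop-slave1 through hadoop-slave4, one per line.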

3. start container

sudo ./start-container.sh 5

Use the same parameter as in step 2.

4. run hadoop cluster

Follow steps 5-6 of the "3 Nodes Hadoop Cluster" section.

7. run hbase

/usr/local/hbase/bin/start-hbase.sh

Warning: wait at least 3 minutes for HBase to finish starting before using it.
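
Instead of waiting a fixed time, a script can poll until the HBase master process shows up. The sketch below is an assumption-laden illustration: `wait_for` and `hmaster_running` are hypothetical helpers, and seeing HMaster in the `jps` listing only shows that the process exists, not that HBase has finished initializing.

```shell
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# usage: wait_for <timeout_seconds> <interval_seconds> <command...>
wait_for() {
    timeout_s="$1"
    interval_s="$2"
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1    # gave up: the command never succeeded
        fi
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
    return 0
}

# Succeeds when jps lists an HMaster process (the HBase master daemon).
hmaster_running() {
    jps | grep -q 'HMaster'
}
```

A possible use is `wait_for 180 5 hmaster_running`, which checks every 5 seconds for up to 3 minutes and returns non-zero if HMaster never appears.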

8. start hbase shell

/usr/local/hbase/bin/hbase shell

stop docker cluster

chmod +x stop-docker.sh
./stop-docker.sh

restart docker cluster after stopping it

chmod +x start-docker.sh
./start-docker.sh

mapred --daemon start historyserver

Troubleshooting

Complete state (jps on hadoop-master with all daemons running)

root@hadoop-master:/usr/local/hadoop/logs# jps
2148 Jps
22 QuorumPeerMain
1832 ResourceManager
248 NameNode
476 SecondaryNameNode
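
The comparison against this listing can be automated. The following sketch reads `jps` output on stdin and prints every expected daemon that is missing; the function name `missing_daemons` is hypothetical, and the list of expected daemons is taken only from the listing above.

```shell
# Hypothetical check: read `jps` output on stdin and print each expected
# daemon that is not listed. Daemon names come from the listing above.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in QuorumPeerMain ResourceManager NameNode SecondaryNameNode; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Run it as `jps | missing_daemons` inside the hadoop-master container. An empty result means the listing matches the complete state; a missing ResourceManager or NameNode points to the manual start commands in the following steps.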

Start ResourceManager manually

start-yarn.sh

Start DFS manually

start-dfs.sh

Development with VScode + Maven + Java11 + Docker

VScode plugins: Java Extension Pack (Microsoft), Docker (Microsoft), Remote Explorer (Microsoft)

1. Generate the Java project directory with Maven

mvn archetype:generate "-DgroupId=com.companyname.bank" "-DartifactId=consumerBanking" "-DarchetypeArtifactId=maven-archetype-quickstart" "-DinteractiveMode=false"

2. Connect VScode to the hadoop-master container with the Docker and Remote Explorer plugins

Click the Remote Explorer icon in the extension column on the left, right-click the target container, and choose the "attach the container" option. Then wait until the VScode remote server has been installed in the container. If the Java Extension Pack is not installed on the remote server automatically, install it manually; it generates settings.json and launch.json in the docker container.

3. Open folder in VScode

4. Build jar file in target directory

mvn package

5. Run jar file in hadoop

hadoop jar {filename}.jar {mainClassPath}

Example

hadoop jar consumerBanking-1.0-SNAPSHOT.jar com/companyname/bank/App
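
The jar name and main class in this example follow from the Maven coordinates used in step 1. A small sketch of that mapping is shown below; `hadoop_jar_cmd` is a hypothetical helper, and it assumes the default quickstart layout (package equal to the groupId, main class App, version 1.0-SNAPSHOT, jar written to the target directory by `mvn package`).

```shell
# Hypothetical helper: print the hadoop command for a Maven quickstart project.
# Assumes package == groupId, main class App, and the jar at
# target/<artifactId>-<version>.jar after `mvn package`.
hadoop_jar_cmd() {
    group_id="$1"
    artifact_id="$2"
    version="${3:-1.0-SNAPSHOT}"
    # Turn the dotted package name into the slash-separated form used above.
    main_class="$(printf '%s' "$group_id" | tr '.' '/')/App"
    printf 'hadoop jar target/%s-%s.jar %s\n' "$artifact_id" "$version" "$main_class"
}

hadoop_jar_cmd com.companyname.bank consumerBanking
```

The helper only prints the command, so it can be inspected before running it from the project root inside the hadoop-master container.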

Run the docker container that combines hadoop and spark

container: ctazyn/hadoop-spark-hbase:latest (Ubuntu20.04 + hadoop3.3 + spark3)

1. Download container

docker pull ctazyn/hadoop-spark-hbase:latest

Then use the corresponding shell scripts whose names contain "spark": start-spark-container.sh, start-spark-docker.sh, and stop-spark-docker.sh.

Reference Blogs

Blog: Run Hadoop Cluster in Docker Update
Blog (in Chinese): Building a Hadoop Cluster with Docker, Upgraded Edition
Blog (in Chinese): Quickly Building an HBase Cluster with Docker
