Hadoop docker project builds and starts a big data cluster in Docker containers, which makes it a convenient tool for trying out and testing big data components. The following components are currently supported, and each of them is optional when starting up the cluster:
- For the hadoop component, there are 3 nodes whose hostnames are `hadoop100`, `hadoop101` and `hadoop102`.
- For the hive component, there is one node whose hostname is `hive`, and the hive component depends on the `mysql` node.
- For the spark component, there is one node whose hostname is `spark`, which acts as the spark driver.
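After the cluster is started, it can be handy to confirm that the nodes listed above are actually running. The function below is a minimal sketch, not part of the project: it assumes each container is named after its hostname, and `DOCKER_CMD` is a hypothetical variable for choosing `podman` instead of `docker`.

```shell
#!/usr/bin/env bash
# Sketch: check that every given container name is currently running.
# Assumption: container names equal the hostnames (hadoop100, hive, ...).
# DOCKER_CMD is a hypothetical override, e.g. DOCKER_CMD=podman.
check_running() {
  local running name missing=0
  # One running container name per line.
  running=$("${DOCKER_CMD:-docker}" ps --format '{{.Names}}')
  for name in "$@"; do
    # -x: whole-line match, -F: treat the name literally, -q: no output.
    if ! grep -qxF "$name" <<<"$running"; then
      echo "not running: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_running hadoop100 hadoop101 hadoop102` returns 0 only when all three hadoop nodes are up, and lists any missing ones on stderr.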
Building and running the cluster requires the following tools on the host:
- docker, docker-compose
- make
- sed
- curl
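Before building, you can check that these tools are on your `PATH`. This is a minimal sketch using the shell builtin `command -v`; the function name `check_deps` is illustrative only. Note that newer Docker installations may ship Compose as the `docker compose` plugin rather than a separate `docker-compose` binary.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given host tools are missing from PATH.
# Returns 0 if all are found, 1 otherwise; missing tools go to stderr.
check_deps() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_deps docker docker-compose make sed curl && echo "all dependencies found"`.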
Clone the repository and build the images:
git clone git@github.com:wecharyu/hadoop-docker.git
cd hadoop-docker
make build target=hadoop
The `target` option specifies the component your cluster needs. Its default value is `hadoop`, and the allowed values are `hadoop` and `hive`.
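- Start a new cluster with the Makefile script (the optional `docker_cmd` selects the container engine, `docker` or `podman`):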
make run target=hadoop [docker_cmd=docker|podman]
- Restart existing containers:
docker-compose start
You can easily change the component versions to suit your requirements in the `Makefile`. Furthermore, if you want to test a non-community release, i.e. components you have modified and compiled yourself, place your compiled tarball in the `packages` directory. Make sure its version number matches the one configured in the `Makefile`.
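Since a version mismatch between the tarball and the `Makefile` is easy to miss, a quick check can help. This is a sketch under assumptions: it takes the first dotted number in the tarball name as its version (as in Apache names such as `hadoop-3.3.6.tar.gz`), and the Makefile variable name you pass (for example `HADOOP_VERSION`) is hypothetical; look up the real name in the project's `Makefile`.

```shell
#!/usr/bin/env bash
# Print the first dotted version number in a tarball name, e.g.
# hadoop-3.3.6.tar.gz -> 3.3.6, apache-hive-3.1.3-bin.tar.gz -> 3.1.3.
tarball_version() {
  basename "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Compare a tarball's version with a Makefile variable.
# $1: tarball path, $2: Makefile variable name (hypothetical, e.g. HADOOP_VERSION),
# $3: Makefile path (defaults to ./Makefile).
check_package_version() {
  local tarball="$1" var="$2" makefile="${3:-Makefile}"
  local expected actual
  # Accept "VAR = x", "VAR := x" and "VAR ?= x"; keep the first value found.
  expected=$(sed -nE "s/^${var}[[:space:]]*[?:]?=[[:space:]]*([^[:space:]#]*).*/\1/p" "$makefile" | head -n 1)
  actual=$(tarball_version "$tarball")
  if [ -z "$expected" ] || [ "$expected" != "$actual" ]; then
    echo "mismatch: $var='$expected' in $makefile, tarball has '$actual'" >&2
    return 1
  fi
  echo "ok: $tarball matches $var=$expected"
}
```

For example, `check_package_version packages/hadoop-3.3.6.tar.gz HADOOP_VERSION` succeeds only if the `Makefile` assigns `3.3.6` to that (assumed) variable.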
- Stop and remove the containers:
docker-compose down
- Stop the containers without removing them, so they can be restarted later:
docker-compose stop
Open a shell as root in a container, for example `hadoop100`:
docker exec -u root -it hadoop100 bash
You can use the `su` command to switch to another user in the container; for example, use `su hadoop` in a hadoop container.
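Instead of opening a shell and switching users, you can also run a single command directly as a given user. The helper below is a hypothetical wrapper around the same `docker exec -u <user> -it <container>` call shown above; `DOCKER_CMD` is an assumed variable for using `podman`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command in a cluster container as a given user.
# $1: user, $2: container name, remaining args: the command to run.
# Set DOCKER_CMD=podman if the cluster was started with podman.
in_container() {
  local user="$1" container="$2"
  shift 2
  "${DOCKER_CMD:-docker}" exec -u "$user" -it "$container" "$@"
}
```

For example, `in_container root hadoop100 bash` is equivalent to the `docker exec` command above, and `in_container hadoop hadoop100 hdfs dfs -ls /` would list the HDFS root, assuming `hdfs` is on the `hadoop` user's `PATH` inside the container.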
The web UIs are available at:
- HDFS UI: http://localhost:9870
- YARN UI: http://localhost:8088
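The UIs only respond once the services have finished starting, so right after starting the cluster you may want to poll them. This is a minimal sketch using `curl` (already a dependency); the function name, retry count and delay are illustrative choices. It follows redirects with `-L` and treats a final HTTP 200 as "up".

```shell
#!/usr/bin/env bash
# Sketch: poll a URL until it returns HTTP 200 or the retries run out.
# $1: URL, $2: number of attempts (default 30), $3: seconds between attempts (default 2).
wait_for_ui() {
  local url="$1" retries="${2:-30}" delay="${3:-2}"
  local i code
  for ((i = 1; i <= retries; i++)); do
    # -s: silent, -L: follow redirects, -o: discard body, -w: print final status.
    # curl prints 000 when the connection fails.
    code=$(curl -sL --max-time 5 -o /dev/null -w '%{http_code}' "$url")
    if [ "$code" = "200" ]; then
      echo "$url is up"
      return 0
    fi
    sleep "$delay"
  done
  echo "$url not reachable after $retries attempts (last status: $code)" >&2
  return 1
}
```

Usage: `wait_for_ui http://localhost:9870 && wait_for_ui http://localhost:8088`.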