
Merge branch 'readme-fix' of https://github.com/ParsaLab/cloudsuite into readme-fix
aledaglis committed Jan 16, 2016
2 parents 1ba457b + 8d1a7e1 commit b0767a3
Showing 3 changed files with 51 additions and 40 deletions.
37 changes: 18 additions & 19 deletions benchmarks/data-analytics/README.md
@@ -12,46 +12,46 @@ Supported tags and their respective `Dockerfile` links:

- [`Base`][basedocker]: This image contains the hadoop base which is needed for both master and slave images.
- [`Master`][masterdocker]: This image contains the main benchmark (hadoop master node, and mahout).
- [`Slave`][slavedocker]: This image contains the hadoop slave image.
- [`Data`][datasetdocker]: This image contains the dataset used by the benchmark.

These images are automatically built using the mentioned Dockerfiles available on `cloudsuite/benchmarks/data-analytics/` [GitHub repo][repo].
These images are automatically built using the mentioned Dockerfiles available on `ParsaLab/cloudsuite` [GitHub repo][repo].

## Starting the volume image ##
This benchmark uses a Wikipedia dataset of ~30GB. We prepared a dataset image containing the training data, so it only needs to be downloaded once and can then be reused for benchmark runs. You can pull this image from Docker Hub.

$ docker pull cloudsuite/dataanalytics/dataset
$ docker pull cloudsuite/data-analytics:dataset

The following command will start the volume image, making the data available for other docker images on the host:

$ docker create --name data cloudsuite/dataanalytics/dataset
$ docker create --name data cloudsuite/data-analytics:dataset

## Starting the Master ##
To start the master you first have to `pull` the master image.

$ docker pull cloudsuite/dataanalytics/master
$ docker pull cloudsuite/data-analytics:master

Then, run the benchmark with the following command:

$ docker run -d -t --dns 127.0.0.1 -P --name master -h master.cloudsuite.com --volumes-from data cloudsuite/dataanalytics/master
$ docker run -d -t --dns 127.0.0.1 -P --name master -h master.cloudsuite.com --volumes-from data cloudsuite/data-analytics:master
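
To verify that the master container started correctly, you can inspect its logs (an optional sanity check, not part of the original steps):

$ docker logs master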


## Starting the Slaves ##
If you want to have a single-node cluster, please skip this step.

To have more than one node, you need to start the slave containers. In order to do that, you first need to `pull` the slave image.

$ docker pull cloudsuite/dataanalytics/slave
$ docker pull cloudsuite/data-analytics:slave

To connect the slave containers to the master, you need the master IP.

$ FIRST_IP=$(docker inspect --format="{{.NetworkSettings.IPAddress}}" master)
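
You can print the captured address to make sure it was set (optional):

$ echo $FIRST_IP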

Then, run as many slave containers as you want:

$ docker run -d -t --dns 127.0.0.1 -P --name slave$i -h slave$i.cloudsuite.com -e JOIN_IP=$FIRST_IP cloudsuite/dataanalytics/slave
$ docker run -d -t --dns 127.0.0.1 -P --name slave$i -h slave$i.cloudsuite.com -e JOIN_IP=$FIRST_IP cloudsuite/data-analytics:slave

Here, `$i` is the slave number; start with 1 (i.e., slave1, slave1.cloudsuite.com, slave2, slave2.cloudsuite.com, ...).
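
For example, a small shell loop can start several slaves at once. This is only a sketch, assuming three slaves and that `FIRST_IP` was set as shown above:

$ for i in 1 2 3; do
>   docker run -d -t --dns 127.0.0.1 -P --name slave$i -h slave$i.cloudsuite.com -e JOIN_IP=$FIRST_IP cloudsuite/data-analytics:slave
> done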


## Running the benchmark ##
@@ -64,16 +64,15 @@ Then, run the benchmark with the following command:

$ ./run.sh

It asks you to enter the number of slaves; if you have a single-node cluster, enter 0.
After you enter the number of slaves, the script prepares Hadoop, downloads the dataset (the download can take a long time), and runs the benchmark. After the benchmark finishes, the model will be available in HDFS, under the wikipediamodel directory.
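
Once the run completes, you can list the generated model from the master container. This is a sketch; it assumes the `hadoop` binary is on the container's PATH and uses the `wikipediamodel` output directory mentioned above:

$ docker exec -it master hadoop fs -ls wikipediamodel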

[basedocker]: https://github.com/CloudSuite-EPFL/DataAnalytics/blob/master/Dockerfile "Base Dockerfile"
[masterdocker]: https://github.com/CloudSuite-EPFL/DataAnalytics/blob/master/Dockerfile "Master Dockerfile"
[slavedocker]: https://github.com/CloudSuite-EPFL/DataAnalytics/blob/master/Dockerfile "Slave Dockerfile"
[basedocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-analytics/master/Dockerfile "Base Dockerfile"
[masterdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-analytics/master/Dockerfile "Master Dockerfile"
[slavedocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-analytics/slave/Dockerfile "Slave Dockerfile"
[datasetdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-analytics/dataset/Dockerfile "Dataset Dockerfile"

[datasetdocker]: https://github.com/CloudSuite-EPFL/DataAnalytics/blob/master/dataset/Dockerfile "Dataset Dockerfile"

[repo]: https://github.com/ParsaLab/cloudsuite/tree/master/benchmarks/data-analytics "GitHub Repo"
[dhrepo]: https://hub.docker.com/r/cloudsuite/dataanalytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/dataanalytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/dataanalytics.svg "Go to DockerHub Page"
[repo]: https://github.com/ParsaLab/cloudsuite "GitHub Repo"
[dhrepo]: https://hub.docker.com/r/cloudsuite/data-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-analytics.svg "Go to DockerHub Page"
33 changes: 18 additions & 15 deletions benchmarks/graph-analytics/README.md
@@ -19,19 +19,19 @@ Supported tags and their respective `Dockerfile` links:
- [`spark-worker`][sparkworkerdocker] This builds an image for the Spark worker node. You may spawn several workers.
- [`spark-client`][sparkclientdocker] This builds an image with the Spark client node. The client is used to start the benchmark.

These images are automatically built using the mentioned Dockerfiles available on [`CloudSuite-EPFL/GraphAnalytics`][repo] and [`CloudSuite-EPFL/spark-base`][sparkrepo].
These images are automatically built using the mentioned Dockerfiles available on [`ParsaLab/cloudsuite`][repo].

### Starting the volume images ###

The first step is to create the volume images that contain the binaries and the dataset of the Graph Analytics benchmark. First, `pull` the volume images using the following commands:

$ docker pull cloudsuite/GraphAnalytics:data
$ docker pull cloudsuite/GraphAnalytics:benchmark
$ docker pull cloudsuite/graph-analytics:data
$ docker pull cloudsuite/graph-analytics:benchmark

The following commands will start the volume images, making both the data and the binaries available for other docker images on the host:

$ docker create --name data cloudsuite/GraphAnalytics:data
$ docker create --name bench cloudsuite/GraphAnalytics:benchmark
$ docker create --name data cloudsuite/graph-analytics:data
$ docker create --name bench cloudsuite/graph-analytics:benchmark
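
The data and binaries are then exposed to other containers through Docker's `--volumes-from` flag. As a minimal sketch (the image tag below is a placeholder; see the sections below for the actual per-node commands):

$ docker run -it --volumes-from data --volumes-from bench [spark-image-tag]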

### Starting the master node ###

@@ -71,13 +71,16 @@ To run the benchmark from the interactive container, use the following command:

$ bash /benchmark/graph_analytics/run_benchmark.sh

[benchmarkdocker]: https://github.com/CloudSuite-EPFL/GraphAnalytics/blob/master/benchmarks/Dockerfile "Benchmark volume Dockerfile"
[datadocker]: https://github.com/CloudSuite-EPFL/GraphAnalytics/blob/master/data/Dockerfile "Data volume Dockerfile"
[sparkmasterdocker]: https://github.com/CloudSuite-EPFL/spark-base/blob/master/spark-master/Dockerfile "Spark Master Node Dockerfile"
[sparkworkerdocker]: https://github.com/CloudSuite-EPFL/spark-base/blob/master/spark-worker/Dockerfile "Spark Worker Dockerfile"
[sparkclientdocker]: https://github.com/CloudSuite-EPFL/spark-base/blob/master/spark-client/Dockerfile "Spark Client Dockerfile"
[repo]: https://github.com/CloudSuite-EPFL/GraphAnalytics "Graph Analytics GitHub Repo"
[sparkrepo]: https://github.com/CloudSuite-EPFL/spark-base "Spark Base GitHub Repo"
[dhrepo]: https://hub.docker.com/r/cloudsuite/graphanalytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/graphanalytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/graphanalytics.svg "Go to DockerHub Page"
[benchmarkdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/graph-analytics/benchmark/Dockerfile "Benchmark volume Dockerfile"
[datadocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/graph-analytics/data/Dockerfile "Data volume Dockerfile"
[sparkmasterdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/spark-base/spark-master/Dockerfile "Spark Master Node Dockerfile"
[sparkworkerdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/spark-base/spark-worker/Dockerfile "Spark Worker Dockerfile"
[sparkclientdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/spark-base/spark-client/Dockerfile "Spark Client Dockerfile"
[repo]: https://github.com/ParsaLab/cloudsuite "GitHub Repo"
[dhrepo]: https://hub.docker.com/r/cloudsuite/graph-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/graph-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/graph-analytics.svg "Go to DockerHub Page"

[serverdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-caching/server/Dockerfile "Server Dockerfile"

[clientdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-caching/client/Dockerfile "Client Dockerfile"
21 changes: 15 additions & 6 deletions benchmarks/spark-base/README.md
@@ -1,5 +1,9 @@
Spark Base Image for Cloudsuite
==========

[![Pulls on DockerHub][dhpulls]][dhrepo]
[![Stars on DockerHub][dhstars]][dhrepo]

This repository contains a base Spark Docker image for the CloudSuite workloads.

## Building the images ##
@@ -11,7 +11,7 @@ Supported tags and their respective `Dockerfile` links:
- [`spark-worker`][sparkworkerdocker] This builds an image with the Spark worker node. You may spawn clusters of several workers.
- [`spark-client`][sparkclientdocker] This builds an image with the Spark client node. The client is used to start the benchmark.

These images are automatically built using the mentioned Dockerfiles available on [`CloudSuite-EPFL/spark-base`][sparkrepo].
These images are automatically built using the mentioned Dockerfiles available on [`ParsaLab/cloudsuite`][repo].

### Starting the volume images ###

@@ -21,7 +21,7 @@ The `data` container contains the dataset that is necessary for the benchmark to

The `bench` container hosts the Java Spark binaries and scripts necessary to run the benchmark. The client `Entrypoint` script looks for a folder with the same name as the command line argument passed to the `docker run` command and runs the `run_benchmark.sh` script in that folder.
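
For illustration, if the benchmark folder is named `graph_analytics`, the client invocation might look like the following. This is only a sketch: the image tag is a placeholder and any additional flags depend on your setup.

$ docker run -it --volumes-from data --volumes-from bench [spark-client-image-tag] graph_analytics

With that argument, the entrypoint runs `/benchmark/graph_analytics/run_benchmark.sh`.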

Assuming all the volume images are pulled, the following command will start the volume images, making both the data and the binaries avaliable for other docker images in the host:
Assuming all the volume images are pulled, the following command will start the volume images, making both the data and the binaries available for other docker images in the host:

$ docker create --name data [data-volume-image-tag]
$ docker create --name bench [binary-volume-image-tag]
@@ -64,7 +64,12 @@ To run the benchmark from the interactive container, use the following command:

$ bash /benchmark/[benchmark-name]/run_benchmark.sh

[sparkmasterdocker]: https://github.com/CloudSuite-EPFL/spark-base/blob/master/spark-master/Dockerfile "Spark Master Node Dockerfile"
[sparkworkerdocker]: https://github.com/CloudSuite-EPFL/spark-base/blob/master/spark-worker/Dockerfile "Spark Worker Dockerfile"
[sparkclientdocker]: https://github.com/CloudSuite-EPFL/spark-base/blob/master/spark-client/Dockerfile "Spark Client Dockerfile"
[sparkrepo]: https://github.com/CloudSuite-EPFL/spark-base "Spark Base GitHub Repo"
[sparkmasterdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/spark-base/spark-master/Dockerfile "Spark Master Node Dockerfile"
[sparkworkerdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/spark-base/spark-worker/Dockerfile "Spark Worker Dockerfile"
[sparkclientdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/spark-base/spark-client/Dockerfile "Spark Client Dockerfile"

[repo]: https://github.com/ParsaLab/cloudsuite/ "GitHub Repo"

[dhrepo]: https://hub.docker.com/r/cloudsuite/spark-base/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/spark-base.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/spark-base.svg "Go to DockerHub Page"
