From 59006b7b408367cb4ca3b1861df1b4626c26ed24 Mon Sep 17 00:00:00 2001
From: Xinqi Liu <40779025+cynthia-liu@users.noreply.github.com>
Date: Thu, 24 Jan 2019 14:28:09 +0800
Subject: [PATCH] Docker image in doc site (#1117)

* update
* Update index.md
* Update index.md
* Update README.md
---
 README.md          | 151 ++++++++++++++++++++++++++++++++++++++++++++
 docs/docs/index.md | 154 +++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 299 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index f67374796d..bd0a6d712b 100644
--- a/README.md
+++ b/README.md
@@ -56,6 +56,12 @@ In addition, Analytics Zoo also provides a rich set of analytics and AI support
 - [Reference use cases](#reference-use-cases): a collection of end-to-end *reference use cases* (e.g., anomaly detection, sentiment analysis, fraud detection, image augmentation, object detection, variational autoencoder, etc.)
+- [Docker images and builders](#docker-images-and-builders)
+  - [Analytics-Zoo in Docker](#analytics-zoo-in-docker)
+  - [How to build it](#how-to-build-it)
+  - [How to use the image](#how-to-use-the-image)
+  - [Notice](#notice)
+
 ## _Distributed TensorFlow and Keras on Spark/BigDL_
 To make it easy to build and productionize the deep learning applications for Big Data, Analytics Zoo provides a unified analytics + AI platform that seamlessly unites Spark, TensorFlow, Keras and BigDL programs into an integrated pipeline (as illustrated below), which can then transparently run on a large-scale Hadoop/Spark clusters for distributed training and inference. (Please see more details [here](https://analytics-zoo.github.io/master/#ProgrammingGuide/tensorflow/)).
@@ -301,3 +307,148 @@ Using *Analytics Zoo Image Classification API* (including a set of pretrained de
 ## _Reference use cases_
 Analytics Zoo provides a collection of end-to-end reference use cases, including *time series anomaly detection*, *sentiment analysis*, *fraud detection*, *image similarity*, etc.
(See more details [here](https://analytics-zoo.github.io/master/#ProgrammingGuide/usercases-overview/))
+
+## _Docker images and builders_
+
+### _Analytics-Zoo in Docker_
+
+**By default, the Analytics-Zoo image has the following packages installed:**
+- git
+- maven
+- Oracle jdk 1.8.0_152 (in /opt/jdk1.8.0_152)
+- python 2.7.6
+- pip
+- numpy
+- scipy
+- pandas
+- scikit-learn
+- matplotlib
+- seaborn
+- jupyter
+- wordcloud
+- moviepy
+- requests
+- tensorflow_
+- spark-${SPARK_VERSION} (in /opt/work/spark-${SPARK_VERSION})
+- Analytics-Zoo distribution (in /opt/work/analytics-zoo-${ANALYTICS_ZOO_VERSION})
+- Analytics-Zoo source code (in /opt/work/analytics-zoo)
+
+**The work dir for Analytics-Zoo is /opt/work.**
+- download-analytics-zoo.sh is used for downloading Analytics-Zoo distributions.
+- start-notebook.sh is used for starting the Jupyter notebook. You can specify the environment settings and Spark settings to start a specified Jupyter notebook.
+- analytics-zoo-${ANALYTICS_ZOO_VERSION} is the home directory of the Analytics-Zoo distribution.
+- analytics-zoo-SPARK_x.x-x.x.x-dist.zip is the zip file of the Analytics-Zoo distribution.
+- spark-${SPARK_VERSION} is the Spark home.
+- analytics-zoo is cloned from https://github.com/intel-analytics/analytics-zoo and contains apps and examples that use Analytics-Zoo.
+
+### _How to build it_
+
+**By default, you can build an Analytics-Zoo:default image with the latest nightly-build Analytics-Zoo distributions:**
+
+```bash
+sudo docker build --rm -t intelanalytics/analytics-zoo:default .
+```
+
+**If you need HTTP and HTTPS proxies to build the image:**
+```bash
+sudo docker build \
+    --build-arg http_proxy=http://your-proxy-host:your-proxy-port \
+    --build-arg https_proxy=https://your-proxy-host:your-proxy-port \
+    --rm -t intelanalytics/analytics-zoo:default .
+```
+
+**You can also specify ANALYTICS_ZOO_VERSION, BIGDL_VERSION and SPARK_VERSION to build a specific Analytics-Zoo image:**
+```bash
+sudo docker build \
+    --build-arg http_proxy=http://your-proxy-host:your-proxy-port \
+    --build-arg https_proxy=https://your-proxy-host:your-proxy-port \
+    --build-arg ANALYTICS_ZOO_VERSION=0.3.0 \
+    --build-arg BIGDL_VERSION=0.6.0 \
+    --build-arg SPARK_VERSION=2.3.1 \
+    --rm -t intelanalytics/analytics-zoo:0.3.0-bigdl_0.6.0-spark_2.3.1 .
+```
+
+### _How to use the image_
+**To start a notebook directly on a specified port (e.g. 12345). You can view the notebook at http://[host-ip]:12345**
+```bash
+sudo docker run -it --rm -p 12345:12345 \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    intelanalytics/analytics-zoo:default
+sudo docker run -it --rm --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    intelanalytics/analytics-zoo:default
+sudo docker run -it --rm -p 12345:12345 \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    intelanalytics/analytics-zoo:0.3.0-bigdl_0.6.0-spark_2.3.1
+sudo docker run -it --rm --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    intelanalytics/analytics-zoo:0.3.0-bigdl_0.6.0-spark_2.3.1
+```
+
+**If you need HTTP and HTTPS proxies in your environment:**
+```bash
+sudo docker run -it --rm -p 12345:12345 \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    -e http_proxy=http://your-proxy-host:your-proxy-port \
+    -e https_proxy=https://your-proxy-host:your-proxy-port \
+    intelanalytics/analytics-zoo:default
+sudo docker run -it --rm --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    -e http_proxy=http://your-proxy-host:your-proxy-port \
+    -e https_proxy=https://your-proxy-host:your-proxy-port \
+    intelanalytics/analytics-zoo:default
+sudo docker run -it --rm -p 12345:12345 \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    -e http_proxy=http://your-proxy-host:your-proxy-port \
+    -e 
https_proxy=https://your-proxy-host:your-proxy-port \
+    intelanalytics/analytics-zoo:0.3.0-bigdl_0.6.0-spark_2.3.1
+sudo docker run -it --rm --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    -e http_proxy=http://your-proxy-host:your-proxy-port \
+    -e https_proxy=https://your-proxy-host:your-proxy-port \
+    intelanalytics/analytics-zoo:0.3.0-bigdl_0.6.0-spark_2.3.1
+```
+
+**You can also start the container first:**
+```bash
+sudo docker run -it --rm --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    intelanalytics/analytics-zoo:default bash
+```
+
+**After setting the proxy and ports inside the container, you can start the notebook with:**
+```bash
+/opt/work/start-notebook.sh
+```
+
+### _Notice_
+**If you need the nightly build of Analytics-Zoo, pull the image from Docker Hub with:**
+```bash
+sudo docker pull intelanalytics/analytics-zoo:latest
+```
+
+**Please follow the README in each app folder to test the Jupyter notebooks.**
+
+**With version 0.3+ of the Analytics-Zoo Docker image, you can specify the Spark runtime configuration:**
+```bash
+sudo docker run -itd --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="1234qwer" \
+    -e http_proxy=http://your-proxy-host:your-proxy-port \
+    -e https_proxy=https://your-proxy-host:your-proxy-port \
+    -e RUNTIME_DRIVER_CORES_ENV=4 \
+    -e RUNTIME_DRIVER_MEMORY=20g \
+    -e RUNTIME_EXECUTOR_CORES=4 \
+    -e RUNTIME_EXECUTOR_MEMORY=20g \
+    -e RUNTIME_TOTAL_EXECUTOR_CORES=4 \
+    intelanalytics/analytics-zoo:latest
+```
diff --git a/docs/docs/index.md b/docs/docs/index.md
index a7f335a22b..ebc6b33c9e 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -56,6 +56,12 @@ In addition, Analytics Zoo also provides a rich set of analytics and AI support
 - [Reference use cases](#reference-use-cases): a collection of end-to-end *reference use cases* (e.g., anomaly detection, sentiment analysis, fraud detection, image augmentation, object detection, variational autoencoder, etc.)
+- [Docker images and builders](#docker-images-and-builders)
+  - [Analytics-Zoo in Docker](#analytics-zoo-in-docker)
+  - [How to build it](#how-to-build-it)
+  - [How to use the image](#how-to-use-the-image)
+  - [Notice](#notice)
+
 ## _Distributed TensorFlow and Keras on Spark/BigDL_
 To make it easy to build and productionize the deep learning applications for Big Data, Analytics Zoo provides a unified analytics + AI platform that seamlessly unites Spark, TensorFlow, Keras and BigDL programs into an integrated pipeline (as illustrated below), which can then transparently run on a large-scale Hadoop/Spark clusters for distributed training and inference. (Please see more details [here](https://analytics-zoo.github.io/master/#ProgrammingGuide/tensorflow/)).
@@ -303,20 +309,156 @@ Using *Analytics Zoo Image Classification API* (including a set of pretrained de
 ## _Reference use cases_
 Analytics Zoo provides a collection of end-to-end reference use cases, including *time series anomaly detection*, *sentiment analysis*, *fraud detection*, *image similarity*, etc. (See more details [here](https://analytics-zoo.github.io/master/#ProgrammingGuide/usercases-overview/))
+## _Docker images and builders_
+
+### _Analytics-Zoo in Docker_
+
+**By default, the Analytics-Zoo image has the following packages installed:**
+- git
+- maven
+- Oracle jdk 1.8.0_152 (in /opt/jdk1.8.0_152)
+- python 2.7.6
+- pip
+- numpy
+- scipy
+- pandas
+- scikit-learn
+- matplotlib
+- seaborn
+- jupyter
+- wordcloud
+- moviepy
+- requests
+- tensorflow_
+- spark-${SPARK_VERSION} (in /opt/work/spark-${SPARK_VERSION})
+- Analytics-Zoo distribution (in /opt/work/analytics-zoo-${ANALYTICS_ZOO_VERSION})
+- Analytics-Zoo source code (in /opt/work/analytics-zoo)
+
+**The work dir for Analytics-Zoo is /opt/work.**
+- download-analytics-zoo.sh is used for downloading Analytics-Zoo distributions.
+- start-notebook.sh is used for starting the Jupyter notebook. 
You can specify the environment settings and Spark settings to start a specified Jupyter notebook.
+- analytics-zoo-${ANALYTICS_ZOO_VERSION} is the home directory of the Analytics-Zoo distribution.
+- analytics-zoo-SPARK_x.x-x.x.x-dist.zip is the zip file of the Analytics-Zoo distribution.
+- spark-${SPARK_VERSION} is the Spark home.
+- analytics-zoo is cloned from https://github.com/intel-analytics/analytics-zoo and contains apps and examples that use Analytics-Zoo.
+
+### _How to build it_
+
+**By default, you can build an Analytics-Zoo:default image with the latest nightly-build Analytics-Zoo distributions:**
+
+```bash
+sudo docker build --rm -t intelanalytics/analytics-zoo:default .
+```
+**If you need HTTP and HTTPS proxies to build the image:**
+```bash
+sudo docker build \
+    --build-arg http_proxy=http://your-proxy-host:your-proxy-port \
+    --build-arg https_proxy=https://your-proxy-host:your-proxy-port \
+    --rm -t intelanalytics/analytics-zoo:default .
+```
+**You can also specify ANALYTICS_ZOO_VERSION, BIGDL_VERSION and SPARK_VERSION to build a specific Analytics-Zoo image:**
+```bash
+sudo docker build \
+    --build-arg http_proxy=http://your-proxy-host:your-proxy-port \
+    --build-arg https_proxy=https://your-proxy-host:your-proxy-port \
+    --build-arg ANALYTICS_ZOO_VERSION=0.3.0 \
+    --build-arg BIGDL_VERSION=0.6.0 \
+    --build-arg SPARK_VERSION=2.3.1 \
+    --rm -t intelanalytics/analytics-zoo:0.3.0-bigdl_0.6.0-spark_2.3.1 .
+```
+### _How to use the image_
+**To start a notebook directly on a specified port (e.g. 12345). 
You can view the notebook at http://[host-ip]:12345**
+```bash
+sudo docker run -it --rm -p 12345:12345 \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    intelanalytics/analytics-zoo:default
+
+sudo docker run -it --rm --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    intelanalytics/analytics-zoo:default
+
+sudo docker run -it --rm -p 12345:12345 \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    intelanalytics/analytics-zoo:0.3.0-bigdl_0.6.0-spark_2.3.1
+
+sudo docker run -it --rm --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    intelanalytics/analytics-zoo:0.3.0-bigdl_0.6.0-spark_2.3.1
+```
+**If you need HTTP and HTTPS proxies in your environment:**
+```bash
+sudo docker run -it --rm -p 12345:12345 \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    -e http_proxy=http://your-proxy-host:your-proxy-port \
+    -e https_proxy=https://your-proxy-host:your-proxy-port \
+    intelanalytics/analytics-zoo:default
+
+sudo docker run -it --rm --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    -e http_proxy=http://your-proxy-host:your-proxy-port \
+    -e https_proxy=https://your-proxy-host:your-proxy-port \
+    intelanalytics/analytics-zoo:default
+
+sudo docker run -it --rm -p 12345:12345 \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    -e http_proxy=http://your-proxy-host:your-proxy-port \
+    -e https_proxy=https://your-proxy-host:your-proxy-port \
+    intelanalytics/analytics-zoo:0.3.0-bigdl_0.6.0-spark_2.3.1
+
+sudo docker run -it --rm --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    -e http_proxy=http://your-proxy-host:your-proxy-port \
+    -e https_proxy=https://your-proxy-host:your-proxy-port \
+    intelanalytics/analytics-zoo:0.3.0-bigdl_0.6.0-spark_2.3.1
+```
+**You can also start the container first:**
+```bash
+sudo docker run -it --rm --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="your-token" \
+    
intelanalytics/analytics-zoo:default bash
+```
+**After setting the proxy and ports inside the container, you can start the notebook with:**
+```bash
+/opt/work/start-notebook.sh
+```
+### _Notice_
+**If you need the nightly build of Analytics-Zoo, pull the image from Docker Hub with:**
+```bash
+sudo docker pull intelanalytics/analytics-zoo:latest
+```
-
-
-
-
-
-
+**Please follow the README in each app folder to test the Jupyter notebooks.**
+
+**With version 0.3+ of the Analytics-Zoo Docker image, you can specify the Spark runtime configuration:**
+```bash
+sudo docker run -itd --net=host \
+    -e NotebookPort=12345 \
+    -e NotebookToken="1234qwer" \
+    -e http_proxy=http://your-proxy-host:your-proxy-port \
+    -e https_proxy=https://your-proxy-host:your-proxy-port \
+    -e RUNTIME_DRIVER_CORES_ENV=4 \
+    -e RUNTIME_DRIVER_MEMORY=20g \
+    -e RUNTIME_EXECUTOR_CORES=4 \
+    -e RUNTIME_EXECUTOR_MEMORY=20g \
+    -e RUNTIME_TOTAL_EXECUTOR_CORES=4 \
+    intelanalytics/analytics-zoo:latest
+```
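
The `docker run` variants documented in this patch differ only in the port, the token, the optional proxy variables and the image tag. As a rough illustration, the sketch below assembles such a command from those inputs. The function name `zoo_notebook_cmd` is hypothetical and is not part of the image or the repository; only the flags, environment variable names and image name are taken from the patch. It prints the command instead of running it, so it can be inspected first.

```bash
# Hypothetical helper, not part of Analytics-Zoo: prints a `docker run`
# command for the notebook image rather than executing it.
# Arguments (all optional): port, token, image tag.
# Assumes the token contains no whitespace or shell metacharacters.
zoo_notebook_cmd() (
    port="${1:-12345}"
    token="${2:-your-token}"
    tag="${3:-default}"

    cmd="sudo docker run -it --rm -p ${port}:${port}"
    cmd="${cmd} -e NotebookPort=${port} -e NotebookToken=${token}"

    # Forward proxy settings only when they are set in the calling shell.
    if [ -n "${http_proxy:-}" ]; then
        cmd="${cmd} -e http_proxy=${http_proxy}"
    fi
    if [ -n "${https_proxy:-}" ]; then
        cmd="${cmd} -e https_proxy=${https_proxy}"
    fi

    cmd="${cmd} intelanalytics/analytics-zoo:${tag}"
    printf '%s\n' "${cmd}"
)

# Example: print the command for port 12345 and the default tag.
zoo_notebook_cmd 12345 your-token default
```

The function body runs in a subshell, so its variables do not leak into the calling shell, and printing the command keeps the sketch safe to try on a machine without Docker.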