With the Jupyter-based Docker setup, you can use Jupyter Notebook in a PredictionIO environment, which is useful for exploratory data analysis (EDA).
First, start the Jupyter container with the PredictionIO environment:
docker-compose -f docker-compose.jupyter.yml \
-f pgsql/docker-compose.base.yml \
-f pgsql/docker-compose.meta.yml \
-f pgsql/docker-compose.event.yml \
-f pgsql/docker-compose.model.yml \
up
The above command prints a token to the console, as shown below.
pio_1 | http://(3aaf67361022 or 127.0.0.1):8888/?token=e87a634b4ab7e2c8bcd86aea9def3eb48183c043eac86f3e
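If you would rather not copy the token by hand, it can be cut out of the log line with plain shell parameter expansion. This is a minimal sketch, assuming the log line has the shape shown above; the variable names `log_line` and `token` are illustrative and not part of PredictionIO or Jupyter.

```shell
# Sample log line copied from the output above. In practice you would paste
# or capture your own line (how you capture it is up to you).
log_line='pio_1 | http://(3aaf67361022 or 127.0.0.1):8888/?token=e87a634b4ab7e2c8bcd86aea9def3eb48183c043eac86f3e'

# Strip everything up to and including the last "token=".
token="${log_line##*token=}"

echo "$token"
```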
Open http://127.0.0.1:8888/, enter the token, and then open a new terminal in Jupyter from the New pull-down button.
Clone the recommender template using Git:
cd templates/
git clone https://github.com/apache/predictionio-template-recommender.git
cd predictionio-template-recommender/
Replace the placeholder app name in engine.json with MyApp1.
sed -i "s/INVALID_APP_NAME/MyApp1/" engine.json
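To confirm that the substitution took effect, you can search the file for the placeholder afterwards. The sketch below runs against a throwaway file so it can be tried anywhere; the JSON content is a hypothetical stand-in, not the template's real engine.json, and the names `tmpdir`, `remaining`, and `replaced` are illustrative.

```shell
# Create a throwaway stand-in for engine.json (hypothetical content).
tmpdir=$(mktemp -d)
printf '{\n  "datasource": {\n    "params": {\n      "appName": "INVALID_APP_NAME"\n    }\n  }\n}\n' > "$tmpdir/engine.json"

# Same substitution as above, applied to the stand-in file (GNU sed syntax).
sed -i "s/INVALID_APP_NAME/MyApp1/" "$tmpdir/engine.json"

# grep -c prints the number of matching lines; "|| true" keeps a zero-match
# result from being treated as a failure.
remaining=$(grep -c "INVALID_APP_NAME" "$tmpdir/engine.json" || true)
replaced=$(grep -c "MyApp1" "$tmpdir/engine.json" || true)

echo "placeholder lines left: $remaining, lines with MyApp1: $replaced"
```

In the real template directory, the same `grep` commands can be pointed at `engine.json` directly.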
Using the pio command, register a new application named MyApp1.
pio app new MyApp1
This command prints an access key, as shown below.
[INFO] [Pio$] Access Key: bbe8xRHN1j3Sa8WeAT8TSxt5op3lUqhvXmKY1gLRjg70K-DUhHIJJ0-UzgKumxGm
Set it to the environment variable ACCESS_KEY.
ACCESS_KEY=bbe8xRHN1j3Sa8WeAT8TSxt5op3lUqhvXmKY1gLRjg70K-DUhHIJJ0-UzgKumxGm
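Instead of copying the key by hand, it can be extracted from the log line with `sed`. This is a sketch under the assumption that the line has the `Access Key: ...` shape shown above; `pio_output` is an illustrative variable holding the sample line, and capturing the real output of `pio app new` is left to you.

```shell
# Sample line copied from the output above. Single quotes keep "$]" literal.
pio_output='[INFO] [Pio$] Access Key: bbe8xRHN1j3Sa8WeAT8TSxt5op3lUqhvXmKY1gLRjg70K-DUhHIJJ0-UzgKumxGm'

# Drop everything up to and including "Access Key: " and print the remainder.
ACCESS_KEY=$(printf '%s\n' "$pio_output" | sed -n 's/.*Access Key: *//p')

echo "$ACCESS_KEY"
```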
Download the training data and import it into the PredictionIO Event Server.
curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt
python data/import_eventserver.py --access_key $ACCESS_KEY
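If the import fails, a quick format check on the downloaded file can help. The sketch below assumes the data uses `::`-separated `user::item::rating` rows and counts lines that do not have exactly three fields. It runs on a small stand-in file with illustrative rows so it can be tried without the download; `sample_file` and `bad_lines` are hypothetical names.

```shell
# Stand-in file with illustrative rows in the assumed user::item::rating format.
sample_file=$(mktemp)
printf '0::2::3\n0::3::1\n0::5::2\n' > "$sample_file"

# Count lines whose number of "::"-separated fields is not 3.
# "n + 0" makes awk print 0 rather than an empty string when nothing matched.
bad_lines=$(awk -F'::' 'NF != 3 { n++ } END { print n + 0 }' "$sample_file")

echo "malformed lines: $bad_lines"
```

To check the real file, replace `"$sample_file"` with `data/sample_movielens_data.txt`.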
Build your template with the following command:
pio build --verbose
To create a model, run:
pio train
Clone the Iris template using Git:
cd templates/
git clone https://github.com/jpioug/predictionio-template-iris.git
cd predictionio-template-iris/
Using the pio command, register a new application named IrisApp with the access key IRIS_TOKEN.
pio app new --access-key IRIS_TOKEN IrisApp
Download the training data and import it into the PredictionIO Event Server.
python data/import_eventserver.py
Build your template with the following command:
pio build --verbose
To do data analysis, open templates/predictionio-template-iris/eda.ipynb in Jupyter.
Before executing pio train, clear the following environment variables in the terminal.
unset PYSPARK_PYTHON
unset PYSPARK_DRIVER_PYTHON
unset PYSPARK_DRIVER_PYTHON_OPTS
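To double-check that none of the three variables is still set in the current shell, a small loop can test each one. This is a minimal sketch using only POSIX shell features; `still_set` and `name` are illustrative variable names.

```shell
unset PYSPARK_PYTHON
unset PYSPARK_DRIVER_PYTHON
unset PYSPARK_DRIVER_PYTHON_OPTS

still_set=""
for name in PYSPARK_PYTHON PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS; do
  # ${VAR+x} expands to "x" only when VAR is set (even if it is empty).
  if eval "[ -n \"\${$name+x}\" ]"; then
    still_set="$still_set $name"
  fi
done

# Prints the names that are still set, or "none" when all were cleared.
echo "still set:${still_set:- none}"
```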
To create a model, run:
pio train --main-py-file train.py