
# Simple Dockerized Presto

This project aims to make it easy to get started with Presto. It is based on Docker and Docker Compose. The sections below describe the currently supported features.

## Starting Presto

The following should be enough to bring up all required services:

```shell
docker-compose up
```
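Once the services are up, you can check whether the coordinator has finished starting by querying its REST API (the `/v1/info` endpoint on port 8080, the same port the CLI instructions below use):

```shell
curl -s http://localhost:8080/v1/info
```

The response is a small JSON document; a `"starting": false` field indicates the coordinator is ready to accept queries.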

## Varying the Number of Workers and Data Nodes

To change the number of Presto worker nodes or HDFS data nodes, use the --scale flag of docker-compose:

```shell
docker-compose up --scale datanode=3 --scale presto-worker=3
```
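You can verify that the additional containers came up by listing the services of the project:

```shell
docker-compose ps
```

With the scaling flags above, this should list three `datanode` and three `presto-worker` containers alongside the other services.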

## Building the Image Locally

The above command uses a pre-built Docker image. If you want the image to be built locally, run the following instead:

```shell
docker-compose --file docker-compose-local.yml up
```

If you are behind a corporate firewall, you need to configure Maven (which is used to build parts of Presto) as follows before running the above command:

```shell
export MAVEN_OPTS="-Dhttp.proxyHost=your.proxy.com -Dhttp.proxyPort=3128 -Dhttps.proxyHost=your.proxy.com -Dhttps.proxyPort=3128"
```

## Uploading Data to HDFS

The data/ folder is mounted into the HDFS namenode container, from where you can upload it using the HDFS client in that container (docker-presto_namenode_1 may have a different name on your machine; run docker ps to find out):

```shell
docker exec -it docker-presto_namenode_1 hadoop fs -mkdir /dataset
docker exec -it docker-presto_namenode_1 hadoop fs -put /data/file.parquet /dataset/
docker exec -it docker-presto_namenode_1 hadoop fs -ls /dataset
```

## Running Queries

You can use the Presto CLI included in the Docker containers of this project (adapt container name if necessary):

```shell
docker exec -it docker-presto_presto_1 presto-cli --catalog hive --schema default
```

Alternatively, you can download the Presto CLI, rename it to presto-cli, make it executable, and run the following:

```shell
./presto-cli --server localhost:8080 --catalog hive --schema default
```
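As a first sanity check from either CLI, you can list the cluster's nodes using Presto's built-in `system.runtime.nodes` table; with the scaling example above you should see one coordinator and several workers:

```sql
SELECT node_id, coordinator, state FROM system.runtime.nodes;
```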

## Creating an External Table

Suppose you have the following file test.json:

```json
{"s": "hello world", "i": 42}
```
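To make the file visible inside the namenode container, you can write it into the mounted data/ folder:

```shell
# create data/ if it does not exist yet and write the sample record
mkdir -p data
echo '{"s": "hello world", "i": 42}' > data/test.json
```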

Upload it to /test/test.json on HDFS as described above. Then run the following in the Presto CLI:

```sql
CREATE TABLE test (s VARCHAR, i INTEGER) WITH (EXTERNAL_LOCATION = 'hdfs://namenode/test/', FORMAT = 'JSON');
```
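Once the table exists, Presto reads the JSON records directly from the HDFS location, so querying it should return the row from test.json:

```sql
SELECT s, i FROM test;
```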

For external tables from S3, spin up this service in an EC2 instance, set up an instance profile for that instance, and use the s3a:// protocol instead of hdfs://.
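For example, assuming a bucket named your-bucket (a placeholder) holding JSON files under test/, the statement would look like this:

```sql
CREATE TABLE test_s3 (s VARCHAR, i INTEGER)
WITH (EXTERNAL_LOCATION = 's3a://your-bucket/test/', FORMAT = 'JSON');
```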

## Administering the MySQL Database

In case you need to make manual changes or want to inspect the MySQL database, you can connect to it like this:

```shell
docker exec -it docker-presto_mysql_1 mysql -ppassword
```
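Inside the MySQL shell, a good starting point is to list the available databases; which one holds the Hive metastore depends on this project's configuration (a database named along the lines of `metastore` is a common convention in Hive setups):

```sql
SHOW DATABASES;
```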