
PySpark Notebook

This repository builds on top of Jupyter's PySpark Notebook Docker image, adding tools that make working with Spark even more practical for both new and experienced users.

This image is aimed at local, interactive development of PySpark applications, but it also suits those who need to work with data in the cloud, as it ships with all the configuration needed to connect to the Amazon S3 file service. To use this functionality, set the AWS-related variables in a .env file.
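For reference, AWS credentials are usually supplied through the standard variable names below; the exact keys this image reads should be taken from the repository's own .env template, so treat these as an illustrative sketch:

```shell
# .env — illustrative only; use the variable names from the repo's template
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=us-east-1
```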

Dependencies

  • Docker >= 20.10.8
  • Docker-Compose >= 1.29.2

Although earlier versions have not been tested, it is possible that they work.

Quick Start

All you have to do is set the environment variables and the Python packages your project depends on, through the .env and requirements.txt files, respectively. The repository contains templates of both files that you should edit before starting the Docker container.
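As an illustration, a requirements.txt might look like the sketch below; the package names here are arbitrary examples, not dependencies the image requires:

```
# requirements.txt — example entries; replace with your project's packages
pandas==1.3.*
matplotlib
```

Pinning versions (as with pandas above) keeps the container reproducible across rebuilds.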

After that, just run the following command in the folder that contains the files:

docker compose up -d

A JupyterLab instance should then be available at localhost:8888.

Author

Created by Pedro Toledo. Feel free to contact me!

