This project contains the code for the pandas-polars-pyspark (PPP) experiment.
This project uses Terraform to manage some aspects of its AWS resources. Please follow the Terraform install instructions.
Validate your installation by running:
terraform -help
In addition, you'll need the AWS CLI installed locally; see the AWS CLI install docs.
Validate your installation by running:
aws --version
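If you prefer to check all the tools in one go, a small helper along these lines may be useful. This is only a sketch: the function names check_tool and check_prerequisites are made up for illustration and are not part of this repo, and poetry is included on the assumption that you will run the setup step that follows.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): report whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the tools this README relies on; returns non-zero if any is missing.
check_prerequisites() {
    status=0
    for tool in terraform aws poetry; do
        check_tool "$tool" || status=1
    done
    return "$status"
}
```

Calling check_prerequisites prints one line per tool and returns a non-zero status if anything is missing, so it can also be used as a guard at the top of another script.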
To set up your local development environment, please run:
poetry install
Behind the scenes, this creates a new virtualenv and installs ppp along with its dependencies into it. Whenever you run poetry run <command>, that <command> is actually run inside the virtualenv managed by Poetry.
You can now import functions and classes from the module with import ppp.
To request credentials for our AWS user, please contact @johnnylarner. Once you have these, you can configure your credentials by running:
aws configure
To update or build the terraform stack, run:
terraform init && terraform apply
This will prompt you for user input.
To destroy the stack, run:
terraform destroy
Note that it's not possible to delete our ECR repo without deleting all the images first. This can be achieved with the delete_images.sh script. See the instructions below on how to run that script.
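For context, emptying an ECR repository with the AWS CLI generally comes down to listing the image identifiers and passing them to a batch delete. The sketch below illustrates that idea only; it is not the contents of delete_images.sh, the function name delete_all_images is hypothetical, and the repository name is whatever the caller passes in.

```shell
#!/bin/sh
# Hypothetical helper (not taken from delete_images.sh): delete every image in an ECR repo.
# $1: repository name, supplied by the caller.
delete_all_images() {
    repo="$1"
    # List all image identifiers (digest and tag) as a JSON array.
    ids=$(aws ecr list-images --repository-name "$repo" \
        --query 'imageIds[*]' --output json) || return 1
    # batch-delete-image rejects an empty list, so stop early if there is nothing to delete.
    if [ "$(printf '%s' "$ids" | tr -d '[:space:]')" = "[]" ]; then
        echo "No images found in $repo"
        return 0
    fi
    # Note: a single batch-delete-image call accepts a limited number of images,
    # so a large repository may need several passes.
    aws ecr batch-delete-image --repository-name "$repo" --image-ids "$ids"
}
```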
We have several UNIX shell scripts in the cli_scripts folder.
Please make sure you cd into that directory before running any scripts, as the Terraform paths are hard-coded.
You can push our app image via the deploy_image.sh script. You can then submit a batch job, defined in our Terraform stack, via the submit_batch.sh script.
The delete_images.sh script is there to help clean up the Terraform stack.
These scripts extract variables from our Terraform stack. Any changes to the scripts should follow the same approach.
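One common way to read a value from a Terraform stack in a shell script is terraform output -raw. The sketch below assumes the values are exposed as Terraform outputs, which may not match how the existing scripts do it; the function name tf_output, the directory path, and the output name in the usage comment are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a single output value from a Terraform stack.
# $1: directory containing the Terraform configuration (assumed path)
# $2: name of the output (must be defined as an output in the stack)
tf_output() {
    terraform -chdir="$1" output -raw "$2"
}

# Example usage with made-up names:
# repo_url=$(tf_output ../terraform ecr_repository_url) || exit 1
```

Keeping the lookup in one small function means a renamed output or a moved Terraform directory only needs to be changed in one place.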
Currently, we build Docker images for linux/amd64 only. This means macOS and Windows users can't use the images locally.
We use pytest as our test framework. To execute the tests, please run:
pytest tests
To run the tests with coverage information, please use:
pytest tests --cov=src --cov-report=html --cov-report=term
Then have a look at the htmlcov folder once the tests are done.
To build a distribution package (wheel), please use:
python setup.py bdist_wheel
This will clean up the build folder and then run the bdist_wheel command.
Before contributing, please set up the pre-commit hooks to reduce errors and ensure consistency:
pip install -U pre-commit
pre-commit install
If you run into any issues, you can remove the hooks again with pre-commit uninstall.
James Richardson ([email protected])
© James Richardson