johnnylarner/pandas-polars-pyspark

ppp

This project contains the code for the pandas-polars-pyspark (PPP) experiment.

Prerequisites

This project uses terraform to manage some aspects of its AWS resources. Please follow the Terraform install instructions.

Validate your installation by running:

terraform -help

You'll also need the AWS CLI installed locally. See the AWS CLI install docs for instructions.

Validate your installation by running:

aws --version
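The two checks above can be combined into one small helper. This is a minimal sketch, not part of the repository: the function name `check_tools` is made up for illustration, and it only verifies that each command is on your PATH, not that the installed version is suitable.

```shell
# Report any required command-line tools that are not on PATH.
# Returns 0 when every tool is found, 1 otherwise.
# (check_tools is a hypothetical helper, not a script shipped with ppp.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_tools terraform aws
```

Because it relies only on `command -v`, the helper works in any POSIX shell and does not invoke the tools themselves.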

Getting Started

To set up your local development environment, please run:

poetry install

Behind the scenes, this creates a virtual environment and installs ppp along with its dependencies into it. Whenever you run poetry run <command>, that <command> runs inside the virtualenv managed by poetry.

You can now import functions and classes from the module with import ppp.

Authenticating with AWS

To request credentials for our AWS user, please contact @johnnylarner. Once you have these, you can configure your credentials by running:

aws configure

Building the terraform stack

To update or build the terraform stack, run:

terraform init && terraform apply

This will prompt you for user input.

Destroying the stack

To destroy the stack, run:

terraform destroy

Note that it's not possible to delete our ECR repo without deleting all of its images first. This can be achieved with the delete_images.sh script. See the instructions below on how to run that script.
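To illustrate what emptying an ECR repo involves, here is a minimal sketch using the AWS CLI. It is not the contents of delete_images.sh: the function name `delete_all_images` and the repository name in the usage comment are hypothetical, and it assumes your credentials and default region are already configured.

```shell
# Delete every image in an ECR repository so the repository itself
# can be removed afterwards.
# (delete_all_images is a hypothetical helper, not the project's script.)
delete_all_images() {
    repo_name="$1"

    # Collect the IDs of all images in the repository as a JSON list.
    image_ids="$(aws ecr list-images \
        --repository-name "$repo_name" \
        --query 'imageIds[*]' \
        --output json)" || return 1

    # Nothing to do for an empty repository.
    if [ "$image_ids" = "[]" ]; then
        echo "No images to delete in $repo_name"
        return 0
    fi

    # Pass the JSON list straight back to the delete call.
    aws ecr batch-delete-image \
        --repository-name "$repo_name" \
        --image-ids "$image_ids"
}

# Usage (hypothetical repository name):
#   delete_all_images my-ecr-repo
```

The delete call accepts only a bounded number of image IDs per request, so a repository with many images would need the list split into batches.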

Running CLI scripts

We have several UNIX shell scripts in the cli_scripts folder.

Please make sure you cd into that directory before running any scripts, as the terraform paths are hard-coded.

You can push our app image via the deploy_image.sh script. You can then submit a batch job - defined in our terraform stack - via the submit_batch.sh script.

The delete_images.sh script is there to help clean up the terraform stack.

These scripts extract variables from our terraform stack. Any changes to the scripts should follow the same approach.
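As a sketch of that approach, a script can read a value from the stack with `terraform output -raw`. The helper names below and the output name `ecr_repository_url` in the usage comment are hypothetical; substitute whichever outputs the stack actually declares.

```shell
# Read a single output value from the terraform stack in the
# current directory.
tf_output() {
    terraform output -raw "$1"
}

# Read an output, or fail with a message if it cannot be read.
# (Both helpers are illustrative, not part of the existing scripts.)
require_tf_output() {
    value="$(tf_output "$1")" || {
        echo "could not read terraform output: $1" >&2
        return 1
    }
    printf '%s\n' "$value"
}

# Usage (hypothetical output name):
#   repo_url="$(require_tf_output ecr_repository_url)"
```

Reading values this way keeps resource names in one place, the terraform stack, rather than duplicating them in each script.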

Docker

Currently we build Docker images for linux/amd64 only. This means macOS and Windows users can't use the images locally.

Testing

We use pytest as our test framework. To execute the tests, please run:

pytest tests

To run the tests with coverage information, please use

pytest tests --cov=src --cov-report=html --cov-report=term

and have a look at the htmlcov folder after the tests are done.

Distribution Package

To build a distribution package (wheel), please use

python setup.py bdist_wheel

This will clean up the build folder and then run the bdist_wheel command.

Contributions

Before contributing, please set up the pre-commit hooks to reduce errors and ensure consistency:

pip install -U pre-commit
pre-commit install

If you run into any issues, you can remove the hooks again with pre-commit uninstall.

Contact

James Richardson ([email protected])

License

© James Richardson
