


waveFrontSet/grip-on-data-science

 
 


GriP on Data Science

Based on Cookiecutter Data Science, this Data Science project template aims to provide a more opinionated and firm framework, especially tailored towards beginners.

Quickstart

The following should get you set up quickly. For more details, consult the documentation.

Requirements to use the cookiecutter template:


  • Python >= 3.6
  • Cookiecutter Python package >= 1.4.0. Install it with pip or conda, depending on how you manage your Python packages:
$ pip install cookiecutter

or

$ conda config --add channels conda-forge
$ conda install cookiecutter
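
To confirm that both requirements are met before generating a project, a small check like the following may help. It is only a sketch and not part of the template: the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are made up for illustration, and the version comparison is deliberately simple (numeric dotted parts only).

```python
import sys

try:
    # Available in the standard library from Python 3.8 onwards.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Python 3.6/3.7: no importlib.metadata in the standard library.
    version = None


def parse_version(text):
    """Turn '1.7.3' into (1, 7, 3); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically, padding with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    if version is None:
        problems.append("cannot look up package versions on this Python")
        return problems
    try:
        installed = version("cookiecutter")
    except PackageNotFoundError:
        problems.append("cookiecutter is not installed")
    else:
        if not meets_minimum(installed, "1.4.0"):
            problems.append("cookiecutter >= 1.4.0 is required, found " + installed)
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["All requirements satisfied"]:
        print(line)
```

The comparison works on tuples of integers, so `1.10.0` correctly counts as newer than `1.4.0`, which a plain string comparison would get wrong.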

To start a new project, run:


cookiecutter https://github.com/waveFrontSet/cookiecutter-data-science

Create a new conda environment:


make create_environment

Overview of the next steps


After activating the conda environment, you are all set up. Here are the next steps:

  • Define how to obtain the raw data of your project and update the data/raw target in the Makefile accordingly.
  • Define generic processing and cleanup transformations in {{ cookiecutter.module_name }}/data/generic_processing.py to produce interim data.
  • Define project-specific transformations to obtain the final data set in {{ cookiecutter.module_name }}/features/build_features.py.
  • Edit {{ cookiecutter.module_name }}/models/model_config.py to decide which models you want to build and what the prediction target is. Running make train automatically splits your dataset into a train set and a test set, then fits the models on the train set.
  • Edit {{ cookiecutter.module_name }}/models/metric_config.py to decide which metrics to use for evaluating model performance. Running make evaluate evaluates the models on the test set using those metrics.
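
The last two steps describe a configuration-driven workflow: one place lists the models and the target, another lists the metrics, and the train and evaluate steps loop over both. The sketch below illustrates that idea using only the standard library. It is not the template's actual code: `TARGET`, `MODELS`, `METRICS`, the two baseline models, and the toy data are all assumptions made for illustration, and the real model_config.py and metric_config.py may be structured differently.

```python
import random
from statistics import mean, median


class MeanModel:
    """Baseline that always predicts the mean target seen during fit."""

    def fit(self, rows, targets):
        self.value_ = mean(targets)
        return self

    def predict(self, rows):
        return [self.value_ for _ in rows]


class MedianModel(MeanModel):
    """Baseline that always predicts the median target seen during fit."""

    def fit(self, rows, targets):
        self.value_ = median(targets)
        return self


def mean_absolute_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def root_mean_squared_error(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred)) ** 0.5


# Hypothetical stand-in for a model configuration: target column and models.
TARGET = "price"
MODELS = {"mean": MeanModel, "median": MedianModel}

# Hypothetical stand-in for a metric configuration.
METRICS = {"mae": mean_absolute_error, "rmse": root_mean_squared_error}


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then hold out a fraction of the rows for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(rows):
    """Split the data and fit every configured model on the train set."""
    train_rows, test_rows = train_test_split(rows)
    targets = [row[TARGET] for row in train_rows]
    fitted = {name: cls().fit(train_rows, targets) for name, cls in MODELS.items()}
    return fitted, test_rows


def evaluate(fitted, test_rows):
    """Score every fitted model with every configured metric on the test set."""
    y_true = [row[TARGET] for row in test_rows]
    return {
        name: {m: fn(y_true, model.predict(test_rows)) for m, fn in METRICS.items()}
        for name, model in fitted.items()
    }


# Toy data, invented for this example.
rows = [{"area": a, "price": 2.0 * a} for a in range(1, 21)]
fitted, test_rows = train(rows)
scores = evaluate(fitted, test_rows)
for name, result in scores.items():
    print(name, result)
```

Keeping models and metrics in plain dictionaries means that adding a model or a metric is a one-line change, while the train and evaluate logic stays untouched.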

Installing development requirements


pip install -r requirements.txt

Running the tests


pytest tests
