Based on Cookiecutter Data Science, this data science project template aims to provide a more opinionated, firmer framework that is especially tailored towards beginners.
The following should get you set up quickly. For more details, consult the documentation.
Requirements:

- Python >= 3.6
- Cookiecutter Python package >= 1.4.0. It can be installed with pip or conda, depending on how you manage your Python packages:
```bash
$ pip install cookiecutter
```

or

```bash
$ conda config --add channels conda-forge
$ conda install cookiecutter
```
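To start a new project, run:

```bash
cookiecutter https://github.com/waveFrontSet/cookiecutter-data-science
```

Then, from inside the newly generated project directory, create the project's conda environment:

```bash
make create_environment
```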
After activating the conda environment, you are all set up. Here are the next steps:
- Define how to obtain the raw data of your project and update the `data/raw` target in the `Makefile` accordingly.
- Define generic processing and clean-up transformations in `{{ cookiecutter.module_name }}/data/generic_processing.py` to produce interim data.
- Define project-specific transformations to obtain the final data set in `{{ cookiecutter.module_name }}/features/build_features.py`.
- Edit `{{ cookiecutter.module_name }}/models/model_config.py` to decide which models you want to build and what the target value of the prediction will be. Running `make train` automatically splits your dataset into a train and a test set and then fits the models on the train set.
- Edit `{{ cookiecutter.module_name }}/models/metric_config.py` to decide which metrics you want to use to evaluate model performance. Running `make evaluate` evaluates the models on the test set using the defined metrics.
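The two data steps in the list (generic processing producing interim data, then project-specific feature building) can be pictured as a pair of functions. This is a minimal, hypothetical sketch: the function names `generic_processing` and `build_features`, the CSV input and the `area`/`rooms`/`price` columns are assumptions for illustration, not the template's actual code. It uses only the standard library so that it runs without extra packages.

```python
"""Hypothetical sketch of the raw -> interim -> final data flow.

Assumptions (not taken from the template): the raw data is a CSV string,
and the project-specific columns are `area`, `rooms` and `price`.
"""
import csv
import io
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Hypothetical raw data: messy header, one duplicate row, one empty cell.
SAMPLE_RAW = """Area, Rooms ,Price
50,2,150000
50,2,150000
80, ,210000
120,4,330000
"""


def generic_processing(raw_csv: str) -> List[Row]:
    """Generic clean-up (would live in data/generic_processing.py):
    normalise column names, strip whitespace, map empty cells to None
    and drop exact duplicate rows."""
    interim: List[Row] = []
    seen = set()
    for record in csv.DictReader(io.StringIO(raw_csv)):
        row = {
            key.strip().lower().replace(" ", "_"): (value or "").strip() or None
            for key, value in record.items()
        }
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:  # keep only the first copy of a duplicate
            seen.add(fingerprint)
            interim.append(row)
    return interim


def build_features(interim: List[Row]) -> List[Dict[str, float]]:
    """Project-specific step (would live in features/build_features.py):
    drop incomplete rows, cast to float and derive `area_per_room`."""
    final: List[Dict[str, float]] = []
    for row in interim:
        if any(value is None for value in row.values()):
            continue  # incomplete rows are dropped here, not imputed
        area, rooms = float(row["area"]), float(row["rooms"])
        final.append({
            "area": area,
            "rooms": rooms,
            "area_per_room": area / rooms,
            "price": float(row["price"]),
        })
    return final


if __name__ == "__main__":
    interim = generic_processing(SAMPLE_RAW)
    print(len(interim), "interim rows")  # the duplicate row is gone
    for features in build_features(interim):
        print(features)
```

Keeping the generic step free of project-specific column logic is what lets it be reused unchanged across projects, while all knowledge about the concrete columns stays in the feature-building step.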
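Conceptually, `make train` and `make evaluate` combine a model configuration, a prediction target, a train/test split and a set of metrics. The following sketch illustrates that flow under stated assumptions: the names `MODELS`, `TARGET`, `FEATURE`, `METRICS`, `train_test_split`, `train` and `evaluate` are hypothetical, the two models are toy plain-Python estimators, and the template's actual config files and Makefile targets may be organised differently.

```python
"""Hypothetical sketch of what `make train` and `make evaluate` do.

MODELS, TARGET, FEATURE, METRICS, train_test_split, train and evaluate are
illustrative names, not the template's actual API.
"""
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]

# --- model_config.py (hypothetical): which models, which target ---
TARGET = "price"
FEATURE = "area"


class MeanBaseline:
    """Ignores the feature and always predicts the mean training target."""

    def fit(self, x: Sequence[float], y: Sequence[float]) -> "MeanBaseline":
        self.mean_ = statistics.mean(y)
        return self

    def predict(self, x: Sequence[float]) -> List[float]:
        return [self.mean_ for _ in x]


class SimpleLinear:
    """Ordinary least squares with one feature: y = intercept + slope * x."""

    def fit(self, x: Sequence[float], y: Sequence[float]) -> "SimpleLinear":
        mean_x, mean_y = statistics.mean(x), statistics.mean(y)
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        self.slope_ = sxy / sxx
        self.intercept_ = mean_y - self.slope_ * mean_x
        return self

    def predict(self, x: Sequence[float]) -> List[float]:
        return [self.intercept_ + self.slope_ * xi for xi in x]


MODELS = {"baseline": MeanBaseline, "linear": SimpleLinear}


# --- metric_config.py (hypothetical): how predictions are scored ---
def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return statistics.mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return statistics.mean((t - p) ** 2 for t, p in zip(y_true, y_pred)) ** 0.5


METRICS = {"mae": mae, "rmse": rmse}


# --- roughly what `make train` / `make evaluate` orchestrate ---
def train_test_split(rows: Sequence[Row], test_size: float = 0.25,
                     seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle reproducibly, then hold out `test_size` of the rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def train(rows: Sequence[Row]) -> Dict[str, object]:
    """Fit every configured model on the training rows."""
    x = [row[FEATURE] for row in rows]
    y = [row[TARGET] for row in rows]
    return {name: model_cls().fit(x, y) for name, model_cls in MODELS.items()}


def evaluate(models: Dict[str, object], rows: Sequence[Row]) -> Dict[str, Dict[str, float]]:
    """Score every fitted model with every configured metric."""
    x = [row[FEATURE] for row in rows]
    y = [row[TARGET] for row in rows]
    return {
        name: {metric: fn(y, model.predict(x)) for metric, fn in METRICS.items()}
        for name, model in models.items()
    }


if __name__ == "__main__":
    data = [{"area": float(a), "price": 1000.0 + 3000.0 * a} for a in range(10, 30)]
    train_rows, test_rows = train_test_split(data)
    print(evaluate(train(train_rows), test_rows))
```

Keeping models and metrics in plain dictionaries means adding a model or a metric is a one-line change to the config, while the training and evaluation loops stay untouched. A real project would more likely register estimators from a library such as scikit-learn; plain Python keeps this sketch dependency-free.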
To install the development requirements and run the tests:

```bash
pip install -r requirements.txt
pytest tests
```