This repository provides a machine learning pipeline designed to estimate mean and low reference flows for Brazilian river stretches. It includes scripts for data collection, preprocessing, model training, and evaluation.
This section outlines how the dataset was generated. The steps below provide details about each part of the workflow.
- Description: Data was collected using the Google Earth Engine Python API to extract hydrological and environmental metrics for Brazilian river stretches.
- File: src/data_treatment/gee_data_extract.py
- Description: The raw data was processed using topological information from the Brazilian Hydrography Ottocodified (BHO) to generate features. The scripts run in the following order:
  1. Structure the flow data: src/data_treatment/org_flow.py
  2. Aggregate all input data: src/data_treatment/agg_att.py
  3. Accumulate the aggregated attributes over each upstream catchment: src/data_treatment/acc_att.py
  4. Structure all the data for use by the ML models: src/data_treatment/to_ml.py
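To illustrate what accumulating attributes over an upstream catchment can look like, here is a minimal sketch. It is not the code in src/data_treatment/acc_att.py: the reach IDs, the `downstream` mapping, the area weighting, and the function name `accumulate_upstream` are all assumptions made for this example.

```python
from collections import defaultdict, deque


def accumulate_upstream(downstream, local_area, local_value):
    """Area-weighted upstream accumulation over a river network.

    downstream:  dict reach_id -> id of the reach it drains into (None at the outlet)
    local_area:  dict reach_id -> area of the reach's own catchment
    local_value: dict reach_id -> attribute mean over the reach's own catchment
    """
    # Count how many reaches drain directly into each reach.
    n_upstream = defaultdict(int)
    for reach, down in downstream.items():
        if down is not None:
            n_upstream[down] += 1

    acc_area = dict(local_area)
    acc_sum = {r: local_value[r] * local_area[r] for r in downstream}

    # Start at the headwaters and move downstream (topological order).
    queue = deque(r for r in downstream if n_upstream[r] == 0)
    while queue:
        reach = queue.popleft()
        down = downstream[reach]
        if down is None:
            continue
        acc_area[down] += acc_area[reach]
        acc_sum[down] += acc_sum[reach]
        n_upstream[down] -= 1
        if n_upstream[down] == 0:
            queue.append(down)

    acc_mean = {r: acc_sum[r] / acc_area[r] for r in downstream}
    return acc_area, acc_mean


# Hypothetical network: headwater reaches A and B both drain into C (the outlet).
downstream = {"A": "C", "B": "C", "C": None}
local_area = {"A": 10.0, "B": 30.0, "C": 10.0}           # km2
local_value = {"A": 1000.0, "B": 2000.0, "C": 1500.0}    # e.g. annual precipitation, mm

acc_area, acc_mean = accumulate_upstream(downstream, local_area, local_value)
print(acc_area["C"], acc_mean["C"])
```

A reach is only processed after all of its tributaries, so each accumulated value is final by the time it is passed downstream.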
- Description: Six ML models were trained. For every model, k-fold cross-validation was used at the gauging sites, and all gauging data was used to predict at the ungauged sites.
- File: src/process_modelig/model_run.py
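The two prediction modes described above can be sketched as follows. This is an illustration only, not the contents of model_run.py: the one-feature linear fit stands in for any of the six models, and the data, the fold count, and every function name are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y = a + b * x; a stand-in for any ML model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def predict_line(params, x):
    a, b = params
    return [a + b * xi for xi in x]


def kfold_splits(n_samples, n_splits, seed=0):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_splits] for i in range(n_splits)]
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


# Hypothetical data: drainage area (km2) as the only feature, flow (m3/s) as target.
gauged_area = [120.0, 340.0, 560.0, 800.0, 1500.0,
               2300.0, 4100.0, 5200.0, 7600.0, 9800.0]
gauged_flow = [0.02 * a for a in gauged_area]  # synthetic relation
ungauged_area = [250.0, 3000.0]

# 1) Gauged sites: each site is predicted by a model that never saw it.
oof_pred = [None] * len(gauged_area)
for train_idx, test_idx in kfold_splits(len(gauged_area), n_splits=5):
    params = fit_line([gauged_area[i] for i in train_idx],
                      [gauged_flow[i] for i in train_idx])
    test_pred = predict_line(params, [gauged_area[i] for i in test_idx])
    for i, p in zip(test_idx, test_pred):
        oof_pred[i] = p

# 2) Ungauged sites: fit once on all gauged data, then predict.
final_params = fit_line(gauged_area, gauged_flow)
ungauged_pred = predict_line(final_params, ungauged_area)
print(ungauged_pred)
```

The out-of-fold predictions give an honest error estimate at gauged sites, while the model fitted on all gauges uses every available observation for the ungauged ones.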
- Description: The trained models were evaluated, and performance metrics were saved. The scripts run in the following order:
  1. Evaluation of averaged ensemble combinations: src/process_post/ens_eval.py
  2. Application of the best ensemble combination to all data: src/process_post/ens_run.py
  3. Uncertainty estimation: src/process_post/unc_run.py
  4. Final dataset production: src/process_post/data_gen.py
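As a rough illustration of the ensemble steps, the sketch below averages the predictions of every possible subset of models and keeps the subset with the lowest error. The choice of RMSE as the score, the model names, and the numbers are assumptions for this example; the metrics actually used by ens_eval.py may differ.

```python
from itertools import combinations
from math import sqrt


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def average_ensemble(preds, members):
    """Unweighted mean of the predictions of the selected models."""
    return [sum(vals) / len(members)
            for vals in zip(*(preds[m] for m in members))]


def best_combination(obs, preds):
    """Score every non-empty subset of models and return (score, members)."""
    best = None
    for size in range(1, len(preds) + 1):
        for members in combinations(sorted(preds), size):
            score = rmse(obs, average_ensemble(preds, members))
            if best is None or score < best[0]:
                best = (score, members)
    return best


# Hypothetical cross-validated predictions at four gauged sites.
observed = [10.0, 20.0, 30.0, 40.0]
cv_predictions = {
    "model_a": [12.0, 22.0, 32.0, 42.0],  # biased high
    "model_b": [8.0, 18.0, 28.0, 38.0],   # biased low
    "model_c": [10.0, 25.0, 30.0, 45.0],
}

best_score, best_members = best_combination(observed, cv_predictions)
print(best_members, best_score)
```

With six models there are only 63 non-empty subsets, so an exhaustive search like this stays cheap.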
Clone the Repository:
    git clone https://github.com/barbedorafael/ml_pipeline.git
    cd ml_pipeline
Install Dependencies:
Install required libraries with:
    pip install -r requirements.txt
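With the dependencies installed, the workflow scripts can be run in the order given earlier. The helper below is a hypothetical convenience wrapper, not part of the repository. It assumes each script can be launched from the repository root without arguments, which the source does not confirm, so it defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Script paths exactly as listed in the workflow description, in order.
PIPELINE_STEPS = [
    "src/data_treatment/gee_data_extract.py",
    "src/data_treatment/org_flow.py",
    "src/data_treatment/agg_att.py",
    "src/data_treatment/acc_att.py",
    "src/data_treatment/to_ml.py",
    "src/process_modelig/model_run.py",
    "src/process_post/ens_eval.py",
    "src/process_post/ens_run.py",
    "src/process_post/unc_run.py",
    "src/process_post/data_gen.py",
]


def build_commands():
    """Return one command (as an argument list) per pipeline step."""
    return [[sys.executable, step] for step in PIPELINE_STEPS]


def run_pipeline(repo_root=".", dry_run=True):
    """Print each command; execute it from repo_root unless dry_run is True."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True, cwd=repo_root)


commands = build_commands()
run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` would execute the steps one after another and stop at the first script that exits with an error.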
The dataset generated by this pipeline will be made publicly available on Zenodo (link pending). The dataset includes:
- Raw input data collected via the pipeline.
- Processed features used in model training.
- Output predictions and uncertainty estimates.
- Python 3.10+
- Google Earth Engine Python API
- Additional Python libraries (see requirements.txt)
Suggestions, bug reports, and contributions are welcome! Open an issue or submit a pull request to improve the workflow.
This project is licensed under the MIT License. See the LICENSE file for details.