This is my crop yield prediction project repository. It contains all the necessary information to reproduce the results of my paper on the subject as well as the explanation of the key techniques, ideas and results.
The main idea behind the research is to develop a gradient boosting model that gives yearly yield predictions for corn and soybean crops in 13 states of interest using geospatial data. The data was collected using the Google Earth Engine service and the USDA statistical database, then transformed to tabular format so that classical machine learning techniques could be applied.
The requirements to replicate the results:
- Knowledge of Python and machine learning
- Basic knowledge of JavaScript
- Google Earth Engine developer account
The required packages are listed in the requirements.txt file and can be installed with the pip package manager (`pip install -r requirements.txt`).
In the research I used the following datasets from Google Earth Engine:
- TIGER: US Census States 2018 for US states boundaries
- USDA NASS Cropland Data Layers for crop layers
- MOD13Q1.061 Terra Vegetation Indices 16-Day Global 250m for NDVI index
- MOD11A2.061 Terra Land Surface Temperature and Emissivity 8-Day Global 1km for surface temperature
- ERA5-Land Monthly Averaged - ECMWF Climate Reanalysis for precipitation data
The data was retrieved in the Google Earth Engine Code Editor using the JavaScript code provided in the repository.
The code lets you select the states of interest:
Apply the crop mask:
Crop mask | Crop mask: closer |
---|---|
![]() | ![]() |
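To make the idea of the crop mask concrete, here is a minimal sketch in plain Python. It is not the Earth Engine script from the repository. It assumes the Cropland Data Layer class codes 1 (corn) and 5 (soybeans); the tiny grids, the other class values, and the names `apply_crop_mask` and `masked_mean` are hypothetical and used only for illustration.

```python
# Illustration only: masking a small grid of values with a crop layer.
# Assumption: Cropland Data Layer codes 1 = corn, 5 = soybeans.
CORN, SOYBEANS = 1, 5


def apply_crop_mask(values, crop_layer, crop_codes):
    """Keep a pixel value only where the crop layer has one of crop_codes."""
    return [
        [v if c in crop_codes else None for v, c in zip(v_row, c_row)]
        for v_row, c_row in zip(values, crop_layer)
    ]


def masked_mean(masked):
    """Average the pixels that survived the mask (None if none did)."""
    kept = [v for row in masked for v in row if v is not None]
    return sum(kept) / len(kept) if kept else None


# Made-up 2x2 grids: an NDVI-like band and a crop layer of class codes.
ndvi = [[0.25, 0.75], [0.5, 0.125]]
crop_layer = [[121, 1], [5, 111]]  # 121 and 111 stand in for non-crop classes

corn_only = apply_crop_mask(ndvi, crop_layer, {CORN})
soy_mean = masked_mean(apply_crop_mask(ndvi, crop_layer, {SOYBEANS}))
```

Pixels outside the selected crop classes are dropped before any statistic is computed, so the per-state values describe cropland only.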
Sample the data within the boundaries of each state:
Plot the data for each year on a Google server and export it in .csv format. The exported data is quite raw and can be preprocessed with any software you prefer. The final datasets are available in the data folder, where each dataset corresponds to a certain period (data_march, for instance, uses data only up to March for prediction). See the paper for additional details about data preparation.
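The period-specific datasets can be thought of as the same raw records restricted to a cutoff month. The sketch below shows that idea in plain Python. It is not the preprocessing used for the paper: the record fields (`state`, `year`, `month`, `ndvi`), the sample values, and the function `features_up_to` are all hypothetical.

```python
from collections import defaultdict


def features_up_to(records, cutoff_month):
    """Average NDVI per (state, year), using only months <= cutoff_month."""
    grouped = defaultdict(list)
    for r in records:
        if r["month"] <= cutoff_month:
            grouped[(r["state"], r["year"])].append(r["ndvi"])
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


# Made-up monthly records for one state and one year.
records = [
    {"state": "Iowa", "year": 2020, "month": 2, "ndvi": 0.25},
    {"state": "Iowa", "year": 2020, "month": 3, "ndvi": 0.5},
    {"state": "Iowa", "year": 2020, "month": 7, "ndvi": 0.75},
]

# A "March" table only sees January through March of each year.
march_features = features_up_to(records, cutoff_month=3)
```

Filtering by cutoff month before aggregating keeps later observations out of an early-season prediction.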