Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"calibration_year_final" not correctly taken into account in pdsi calculation #544

Open
jeremietellus opened this issue Mar 13, 2024 · 1 comment

Comments

@jeremietellus
Copy link

jeremietellus commented Mar 13, 2024

Code

import numpy as np
import pandas as pd
import xarray as xr

data_start_year=1990
data_end_year=2010

dates = pd.date_range(start=f'{data_start_year}-01-01', end=f'{data_end_year}-12-31', freq='MS')
lat = np.arange(10, 20, 1)
lon = np.arange(180, 190, 1) 


# Generating random data for precipitation and potential evapotranspiration (PET)
precips_data = np.random.rand(len(dates), len(lat), len(lon)) # Precipitation data
pet_data = np.random.rand(len(dates), len(lat), len(lon)) # Potential evapotranspiration data

# Creating the first dataset with precipitation and PET data
first_dataset = xr.Dataset(
    {
        "precips": (["time", "latitude", "longitude"], precips_data),
        "pet": (["time", "latitude", "longitude"], pet_data)
    },
    coords={
        "time": dates,
        "latitude": lat,
        "longitude": lon},
)

# Selecting the calibration year
calibration_year_final=2005

# Modifying data after the calibration year: values are doubled
chosen_date=pd.Timestamp(f"{calibration_year_final+1}-01-01")
second_dataset = first_dataset.where(first_dataset.time < chosen_date, first_dataset * 2)


def palmer_pdsi_wrapper(precips, pet, awc, data_start_year, calibration_year_initial, calibration_year_final):
    pdsi, phdi, wplm, z, _ = palmer.pdsi(precips, pet, awc, data_start_year, calibration_year_initial, calibration_year_final)
    return pdsi, phdi, wplm, z


import os
os.chdir("/home/jeremiesicard/Documents/main-project")
from thirdpart.submodules.climate_indices.src.climate_indices import palmer



# Calculating the Palmer Drought Severity Index (PDSI) for the first dataset
first_pdsi_data =  xr.apply_ufunc(
                palmer_pdsi_wrapper,
                first_dataset.precips,
                first_dataset.pet,
                50,
                data_start_year,
                data_start_year,
                calibration_year_final,
                input_core_dims=[["time"], ["time"], [], [], [], []],
                output_core_dims=[["time"]] * 4,
                vectorize=True,
                output_dtypes=[first_dataset.precips.dtype] * 4,
            )

first_pdsi_dataset=first_pdsi_data[0].compute().to_dataset(name='pdsi').sortby(['latitude','longitude'])


# Calculating the Palmer Drought Severity Index (PDSI) for the second dataset
second_pdsi_data =  xr.apply_ufunc(
                palmer_pdsi_wrapper,
                second_dataset.precips,
                second_dataset.pet,
                50,
                data_start_year,
                data_start_year,
                calibration_year_final,
                input_core_dims=[["time"], ["time"], [], [], [], []],
                output_core_dims=[["time"]] * 4,
                vectorize=True,
                output_dtypes=[second_dataset.precips.dtype] * 4,
            )

second_pdsi_dataset=second_pdsi_data[0].compute().to_dataset(name='pdsi').sortby(['latitude','longitude'])


chosen_date_2 = pd.Timestamp(f"{calibration_year_final}-12-31")

print(f"Is the first pdsi equal to the second pdsi for the period {data_start_year}-01-01 to {calibration_year_final}-12-31?\n",
       np.array_equal(
                first_pdsi_dataset.pdsi.sel(time=first_pdsi_dataset.time <= chosen_date_2).values,
                second_pdsi_dataset.pdsi.sel(time=second_pdsi_dataset.time <= chosen_date_2).values
                ))

print(f"The values that differ over the period {data_start_year}-01-01 to {calibration_year_final}-12-31 are as follows:\n",
      np.unique(first_pdsi_dataset.pdsi.sel(time=first_pdsi_dataset.time <= chosen_date_2)-second_pdsi_dataset.pdsi.sel(time=second_pdsi_dataset.time <= chosen_date_2), return_counts=True))

Describe the bug

The precipitation and PET values that follow the "calibration year final" have an influence on the PDSI values before the "calibration year final" in the calculation of the PDSI.

Expected behavior

In this example, considering:

  • a first xarray.Dataset ranging from 1990 to 2010 with a monthly frequency of precipitation and PET
  • a second xarray.Dataset where the precipitation and PET values are identical to the first xarray.Dataset from 1990 to 2005 (inclusive), then different from 2006 to 2010
  • a "calibration year final" in 2005

I expected to have identical PDSI values between the two xarray.Datasets from 1990 to 2005 (inclusive), then different from 2006 onwards. However, as you can see from the printouts, the values are also different for the period from 1990 to 2005. Is the "calibration year final" correctly accounted for?

Screenshots

Is the first pdsi equal to the second pdsi for the period 1990-01-01 to 2005-12-31?
False
The values that differ over the period 1990-01-01 to 2005-12-31 are as follows:
(array([-5.46034514, -2.34233531, -2.23779737, -1.39403927, -1.07894443,
-0.96372188, -0.75961582, -0.68137539, 0. , 2.02436987,
2.2568226 , 2.51596723, 2.8048687 , 3.12694393, 3.48600215]), array([ 1, 1, 1, 1, 1, 1, 1, 1, 19186,
1, 1, 1, 1, 1, 1]))

Desktop

@monocongo
Copy link
Owner

Thanks for this detailed bug report, @jeremietellus

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants