TCRM V3 seems to have a low cp bias #80
UPDATE:
Hi @KayShelton, this is a great bit of analysis, and I am personally doing similar work for the full Aus basin at present, including comparisons to potential intensity. We'll flag any modifications that may arise. If there's anything you want me to follow up on, please let me know.
When we changed to
So I did some further investigating, and the choice of KDE doesn't appear to make a great deal of difference except to the initiation day. A comparison plot for the genesis pressure is attached. I also investigated the presence of the 939 hPa genesis pressure for a TC. It turns out there is an issue in the IBTrACS code (https://groups.google.com/forum/#!topic/ibtracs-qa/on8IlePsqaA), which means data from Nadi is missing in the WMO fields. The IBTrACS team will resolve it and update the dataset by the end of April. So, all in all, I don't think there is an issue with the new KDE methods, but it is worth noting that they will give different results to those in TCRM V2, and they are a lot less forgiving of outlier data.

I have re-run the evaluation track generation with 500 simulations for 1981-2016 (using IBTrACS v3.10, after manually editing the WMO TC records to include observations from RSMC Nadi). The resulting minimum pressure maps, compared with the previous V3 run (previous V3 left, new V3 right), show a little more definition in the pattern, but are still a long way from V2. Also, looking at the new Q-Q and PDF plots of minimum pressure, the distribution is more spread out than in V2, and much more so than in the observations. The PDF tail at low minimum pressure values is significantly fatter (V2 left, new V3 right).
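The kernel comparison described above can be sketched as follows — a minimal example using scipy's `gaussian_kde` with different bandwidth rules. The sample data, bandwidth choices, and grid here are purely illustrative assumptions, not TCRM's actual statistics code or the IBTrACS record:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Hypothetical genesis central pressures (hPa) for a single cell;
# illustrative data only, not observed values.
genesis_cp = rng.normal(995.0, 8.0, size=200)

grid = np.linspace(940.0, 1015.0, 300)
for bw in ("scott", "silverman", 0.3):
    kde = gaussian_kde(genesis_cp, bw_method=bw)
    pdf = kde(grid)
    # Sanity check: each density should integrate to ~1 over the grid.
    area = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid))
    print(f"bw={bw}: mode={grid[pdf.argmax()]:.1f} hPa, area={area:.3f}")
```

Running this per cell (with that cell's genesis pressures) would show whether the bandwidth sensitivity varies spatially.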
Thanks for the detective work, @KayShelton. With the kernel comparison, can you reproduce it for a different cell number? I'm curious to see if there is any spatial variability in that result.
I have tried a possible alternative for the minCP threshold: `4. * pstat.sig[cellNum])):`
After a little trial and error, I thought that limiting the minCP to 1 SD below the lowest observed value was probably reasonable, as that is still going to be pretty low. The resulting minCP plot (note I have only done 5 simulations of 59 years for quicker processing) looks a lot better to me. I have also added in a plot of the samples (as is done for e.g. track density), and I think they look pretty decent too.

One issue, however, is the relatively low track densities in comparison to observations. I think this may also be feeding into the minPressureDist, where a fairly high proportion of the distribution has minCP higher than observed (around 1000 hPa). This may be due to a relatively large number of short-lived weak systems (hence the low track densities), or it may be related to the possible initEnvPressure bug I have noted in #87. It would be interesting to investigate the distribution of track duration in the synthetic dataset compared to the observations.
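The 1 SD floor described above amounts to a simple clamp on the sampled central pressure. A minimal sketch — the function name and arguments are illustrative, not TCRM's actual API:

```python
def clamp_min_cp(sampled_cp, obs_min_cp, obs_cp_std):
    """Clamp a sampled central pressure (hPa) so it cannot fall more
    than one standard deviation below the lowest observed value.
    Illustrative sketch of the threshold idea, not TCRM code."""
    lower_bound = obs_min_cp - obs_cp_std
    return max(sampled_cp, lower_bound)

# Example: lowest observed cp 910 hPa, std 15 hPa -> floor at 895 hPa.
print(clamp_min_cp(880.0, 910.0, 15.0))  # clamped up to 895.0
print(clamp_min_cp(930.0, 910.0, 15.0))  # unchanged: 930.0
```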
So, interestingly, resolving the bug in #87 does appear to have also fixed the slight bias in the minCP histogram towards higher-than-observed pressures. Below are the results for 500 simulations of 59 years. Note there is now a bit of a lump at low pressures, likely due to me using 1 standard deviation below the lowest observed pressure as the lower bound on the pressure.
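That lump is what one would expect from imposing a hard lower bound on a continuous distribution: all the probability mass below the bound piles up exactly at it. A quick demonstration with synthetic numbers (the 960/25 hPa distribution and 920 hPa bound are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic minimum central pressures (hPa), then a hard clip at a
# lower bound: the clipped values form a spike exactly at the bound.
samples = rng.normal(960.0, 25.0, size=100_000)
bound = 920.0
clipped = np.clip(samples, bound, None)

frac_at_bound = np.mean(clipped == bound)
print(f"fraction exactly at the bound: {frac_at_bound:.3f}")
```

A softer alternative (e.g. resampling rejected values) would avoid the spike at the cost of slightly distorting the tail.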
I have been following up on the low track densities, which are evident in the evaluation simulations for the South Pacific but potentially more problematic in regions with very few TCs, such as the Arabian Sea. To investigate this, I have copied and modified some of the scripts in the Evaluate module to look into TC longevity (track length), as well as a few other things while troubleshooting. I have found that TCRM does seem to have a very strong bias towards very short-lived TCs (12 hours), as shown in the TC longevity comparison below (using the 500 South Pacific evaluation simulations from above). In attempting to resolve this issue, I have also determined that TCRM may be getting the minimum central pressure distribution (shown in the post above) close to right, but for the wrong reason: there are many TCs that only live for 12 hours and hence have high minimum central pressures. My resolutions to this issue are something of a compromise, but I wanted to check that they are physically/statistically sound:
@wcarthur do these modifications sound sensible and valid? I will re-run the South Pacific simulations to present the results of these changes. I have been testing in the NIO for another project, and that area has very limited intensity information, so it may not be representative of other basins.
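For reference, the track-longevity comparison mentioned above boils down to a duration calculation per track. A minimal sketch using NumPy datetime arithmetic — the helper name and the two example tracks are illustrative, not the modified Evaluate code:

```python
import numpy as np

def longevity_hours(track_times):
    """Duration of one track in hours, from its ordered timestamps.
    Illustrative helper, not TCRM's Evaluate module."""
    return (track_times[-1] - track_times[0]) / np.timedelta64(1, "h")

# Two hypothetical tracks sampled 6-hourly: one 12 h, one 48 h long.
t0 = np.datetime64("2000-01-01T00:00")
short = t0 + np.arange(3) * np.timedelta64(6, "h")   # 0, 6, 12 h
long_ = t0 + np.arange(9) * np.timedelta64(6, "h")   # 0 .. 48 h

durations = [longevity_hours(tr) for tr in (short, long_)]
print(durations)  # [12.0, 48.0]
```

Histogramming these durations for the synthetic and observed sets would give the longevity comparison directly.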
Hi @KayShelton. Can you provide a diff of the code that captures the changes you've made? It will allow me to test and integrate the changes quickly into the code base.
@wcarthur, as requested, a diff of TrackGenerator.py. Note I have added a fair few print statements too, to allow me to figure out what was going on. Also attached is the code for the longevityDistribution (renamed to a .txt). Note this is lifted straight from pressureDistribution, meaning I have not necessarily changed variable names appropriately.
I have been trying to get v3.0.1 running on a local (UK) Linux virtual machine. Most things have been successful, except that the results are inconsistent with V2 results.
I have repeated an evaluation simulation (500 simulations of 59 years), using the same IBTrACS historical data and the same other set-up parameters (as best as I can tell) as a colleague had used previously, but my results have a significant low pressure bias.
The minPressure plots created by ExecuteEvaluate are attached for V3 and V2 of TCRM. As far as I can tell, these simulations should be near identical (there are some very subtle differences in the historic data which I cannot explain, but for the most part the inputs match). Although I would not expect identical results in the simulations, I did not expect them to be so vastly different.
Investigating the individual track generation for V3, the _stepPressureChange function in TrackGenerator.py seems to give some very large pressure fluctuations. For example, for one track with a timestep of 6 hours, the pressure changes were: -0.1, -9.0, 17.1, -9.7, -11.3, -9.1, 4.4, 3.6, 4.1. For another TC the new self.dp in _stepPressureChange is given as -2.13 where the sigma is only 0.633, while another has a new self.dp of -3.13 with a sigma of 0.56. These pressure changes do seem somewhat excessive.
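For context, if the pressure-change update behaves like a lag-1 autoregressive step — a sketch of the general form only; the exact update in `_stepPressureChange` may differ, and the `mu`/`alpha` values below are assumptions — then steps beyond 3 sigma from the mean should be rare (roughly 0.3% for Gaussian innovations):

```python
import numpy as np

rng = np.random.default_rng(1)

def step_dp(dp, mu, sigma, alpha):
    """One step of a lag-1 autoregressive pressure-change model with
    stationary standard deviation sigma. Illustrative only."""
    eps = rng.normal()
    return mu + alpha * (dp - mu) + sigma * np.sqrt(1.0 - alpha**2) * eps

# sigma = 0.633 hPa/hr as in the example above; mu and alpha are made up.
mu, sigma, alpha = 0.0, 0.633, 0.7
dp, steps = 0.0, []
for _ in range(10_000):
    dp = step_dp(dp, mu, sigma, alpha)
    steps.append(dp)

frac_extreme = np.mean(np.abs(np.array(steps)) > 3 * sigma)
print(f"fraction of |dp| beyond 3 sigma: {frac_extreme:.4f}")
```

A dp of -2.13 with sigma 0.633 is about 3.4 sigma, so seeing several such steps in a handful of tracks would indeed point at something off in the update or its parameters.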
Have you seen similar results?
I first tested with IBTrACS V4, but then re-ran to repeat an old experiment when the results were not as expected. Any help or insights would be very much appreciated.
V2 results:
V3 results: