Certain job metrics are not being stored correctly in the database. This makes it more difficult to investigate system performance questions like "how many people would be affected if we put a limit on number of SNPs submitted".
The data could eventually be recovered from job logs, but that's less convenient. It's not an urgent fix, but definitely a "gotcha".
Tracking in case this surprises anyone else!
Description / root cause
A counter value like "genotypes" is calculated by multiplying two large ints, like "genotypes * samples". The true product is bigger than the maximum value a signed 32-bit Java int can hold (2147483647).
Java's int arithmetic silently wraps on overflow, so the result comes out as a much smaller (or even negative) number.
The correct values are shown in the UI / job logs (which are stored separately as a pre-constructed text string), but the wrong value is stored in the DB table.
This affects both the initial calculation and the incCounters method (which accepts an int).
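A minimal sketch of the failure mode and the standard fix (this is illustrative, not the actual application code; the variable names are assumptions based on the example below):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        int snps = 2_500_000;
        int samples = 15_000;

        // int * int is evaluated in 32-bit arithmetic; the true product
        // (3.75e10) exceeds Integer.MAX_VALUE (2147483647) and wraps.
        int wrapped = snps * samples;

        // Casting one operand to long forces 64-bit arithmetic, which
        // easily holds the true product.
        long correct = (long) snps * samples;

        System.out.println(wrapped); // prints -1154705664 (wrapped garbage)
        System.out.println(correct); // prints 37500000000
    }
}
```

Note that the cast has to happen on an operand, not the result: `(long) (snps * samples)` still overflows first and merely widens the garbage.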
Example
A recent job submitted 2.5M SNPs with 15k samples (3.75e10 genotypes). The Java max value for an int is ~2.1B, so the stored value wraps to ~3e6. The correct # of SNPs and samples are shown in the job report (where they are represented separately), but those numbers do not match the value stored in the database (where they are multiplied together).
In practice, this is usually not obvious until one needs to query for big jobs. A subtler symptom: in TIS, ~10% of "genotypes" counters are negative.
select count(*) from counters where name = 'genotypes' and value < 0;
Note: the MySQL table definition already supports larger numbers (counters.value is a bigint column), so no schema change is needed. The issue appears to be in the Java code.
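A possible fix sketch: widen the counter path to long end-to-end so values survive until they reach the bigint column. Only the incCounters name comes from this issue; the surrounding class and method signatures here are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical counter holder; the real class likely differs.
public class Counters {
    private final Map<String, Long> values = new HashMap<>();

    // Accept long instead of int so products like SNPs * samples
    // are not truncated at this method boundary.
    public void incCounters(String name, long delta) {
        values.merge(name, delta, Long::sum);
    }

    public long get(String name) {
        return values.getOrDefault(name, 0L);
    }
}
```

Callers would also need the cast at the multiplication site, e.g. `counters.incCounters("genotypes", (long) snps * samples);` — widening only the method parameter doesn't help if the product has already wrapped in int arithmetic.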