-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SWATCH-3177 Fix invalid remittance calculation when an amendment is applied #4033
base: main
Are you sure you want to change the base?
Conversation
/retest |
2 similar comments
/retest |
/retest |
Verification steps : On the main branch
Also, I checked the billable usage message has hardware type=AWS/Cloud Switched to the PR branch and deployed the PR image
@mstead I think this is expected here because already 9 were sent(due to the bug in the main branch), and now 15, so 15-9=6 in the pending state? but need your confirmation here .
Now checked for the previous month's scenario
@mstead I think this is expected here, because the latest value 7 adjusted with the first event value which is 2 ?
Checked on the swatch-billable-usage-service and can't see the HMT in the message. |
Yes, that is correct.
Yes, this is also correct. |
/retest |
There are two failed tests in this PR due to the absence of the hardware_measurement_type column, which has been removed in this PR. I see that there is already a PR on the IQE side to handle the changes: commit link. However, due to an issue while releasing the tag in the iqe-rhsm plugin (v2024.12.16.0 tag), the CI job is unable to pull the latest image and fails with the same error. I am currently working on identifying the root cause of the failure. |
lgtm, but I would like to have another person approve the logic. |
3a34bf2
to
34aedfc
Compare
/retest |
1 similar comment
/retest |
These two tests have failed multiple times now in PR checks for this PR:
Is this an issue or is this expected behavior and the tests need to be updated? |
/retest |
@liwalker-rh I just wrote these and I think its some timing issues I need to fix. I'll work on an IQE fix. |
/retest |
2 similar comments
/retest |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in testing and the related IQE tests are working as expected. Will merge those tests once this has been merged.
/retest |
/retest |
1 similar comment
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unapproving pending failing test investigation
/retest |
/retest |
34aedfc
to
4b114c9
Compare
When determining the current remittance total, the billable usage component no longer considers the hardware_measurement_type. As such, when TallySummary messages are created during the hourly tally, we must also exclude the measurement type when calculating the current total for a metric, as this value is used when determining remittance.
This patch removes the hardware measurement type from billable usage and remittance since it does not apply to remittance in general.
4b114c9
to
85a18a3
Compare
/retest |
1 similar comment
/retest |
Jira issue: SWATCH-3177
Description
This bug happens when the incoming events are in conflict (2 events with the same instance, metric and timestamp) and the hardware measurement type (HMT) is different. In the case of the reproducer events, the rhelemeter event does not specify the HMT which will get defaulted to PHYSICAL, not AWS. Since remittance uses the HMT to determine the current total remittance, it returns an incorrect value for the total when the usage for the second event is received.
This only happens when a conflict occurs and the value changes, since the measurements for the hour are replaced when tallied.
Resolution
When determining the current remittance total, the billable usage
component no longer considers the hardware_measurement_type. As
such, when TallySummary messages are created during the hourly tally,
we must also exclude the measurement type when calculating the
current total for a metric, as this value is used when determining
remittance.
This PR also removes the hardware measurement type from billable usage
and remittance since it does not apply to remittance in general.
Affected Components
Testing
IQE Test MR: IQE TEST
Steps
Deployment Tips
When testing locally, I deployed the following services. Please read the test steps carefully, since the first deployment should be done on the MAIN branch.
swatch-contracts
swatch-billable-usage
swatch-tally
Testing
First, reproduce the bug on the MAIN branch.
Create a cost-management event for a host instance that includes the hardware_type.
Perform an hourly tally for the org.
Verify 4 AWS vCPUs were remitted.
Create a rhelemeter style event for the SAME HOST (no cloud_provider or hardware type).
Run an hourly tally for the org.
Because the event should have been amended (4 - 4) + 8, the expected billable usage should have been and additional 4 vCPUs since we already billed for 4 (from the first event). HOWEVER, you'll notice that there were 8 additional vCPUs billed due to the PHYSICAL hardware measurement type. We have overbilled by 4 vCPUs.
Check out the branch with the bug fix and deploy. After deploying, check the billable_usage_remittance table to make sure that the hardware_measurement_type column was dropped.
Next verify that we will recover correctly from the 4 additional vCPUs that we previously billed for.
Perform an hourly tally for the org.
Since an additional 10 vCPUs were reported in the event we just sent, and we've previously overbilled for 4 vCPUs due to the bug, after the tally we should see that we billed for 6 vCPUs.
Next, attempt to reproduce the bug scenario WITH A DIFFERENT MONTH to ensure the bug is addressed when starting fresh.
Create a cost-management event for a host instance that includes the hardware_type.
Perform an hourly tally for the org.
Create a rhelemeter style event for the SAME HOST (no cloud_provider or hardware type).
Run an hourly tally for the org.
Check the remittance for 2024-07 to ensure that we did not overbill by 4 vCPUs. Since an amendment had occurred when the 2nd event was received, we should have only remitted an additional 4 vCPUs, for a total of 8 vCPUs for 2024-07.