
SWATCH-3177 Fix invalid remittance calculation when an amendment is applied #4033

Open
mstead wants to merge 2 commits into main

mstead
Contributor

@mstead mstead commented Dec 10, 2024

Jira issue: SWATCH-3177

Description

This bug occurs when incoming events conflict (two events with the same instance, metric, and timestamp) and their hardware measurement type (HMT) differs. In the reproducer events, the rhelemeter event does not specify an HMT, so it defaults to PHYSICAL rather than AWS. Because remittance uses the HMT when looking up the current total remittance, it returns an incorrect total when the usage for the second event is received.

This only happens when a conflict occurs and the value changes, since the measurements for the hour are replaced when tallied.
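
To make the failure mode concrete, here is a minimal sketch of the lookup. It is not the actual swatch-billable-usage code: `RemittanceRecord`, its fields, and `remitted_total` are hypothetical names, and billing factors and rounding are ignored. It only illustrates why filtering the already-remitted total by HMT bills the full amended value again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemittanceRecord:
    """Hypothetical, simplified stand-in for a billable_usage_remittance row."""
    org_id: str
    metric_id: str
    hardware_measurement_type: str
    remitted_value: float


def remitted_total(records, org_id, metric_id, hmt=None):
    """Sum what was already remitted; filter by HMT only when one is given."""
    return sum(
        r.remitted_value
        for r in records
        if r.org_id == org_id
        and r.metric_id == metric_id
        and (hmt is None or r.hardware_measurement_type == hmt)
    )


# First event (cost-management, recorded under AWS): 4 vCPUs remitted.
records = [RemittanceRecord("3340851", "vCPUs", "AWS", 4.0)]

# The conflicting rhelemeter event amends the hour to 8 vCPUs; its HMT defaults to PHYSICAL.
current_usage = 8.0

# Old behavior: the lookup filtered by PHYSICAL finds nothing, so all 8 are billed again.
buggy_delta = current_usage - remitted_total(records, "3340851", "vCPUs", hmt="PHYSICAL")

# New behavior: HMT is ignored, so only the difference is billed.
fixed_delta = current_usage - remitted_total(records, "3340851", "vCPUs")

print(buggy_delta, fixed_delta)  # 8.0 4.0
```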

Resolution

When determining the current remittance total, the billable usage
component no longer considers the hardware_measurement_type. As
such, when TallySummary messages are created during the hourly tally,
we must also exclude the measurement type when calculating the
current total for a metric, as this value is used when determining
remittance.

This PR also removes the hardware measurement type from billable usage
and remittance since it does not apply to remittance in general.

Affected Components

  1. swatch-billable-usage
  • The currentTotal of each TallySummary.TallyMeasurement is now calculated without the hardware measurement type.
  • The billable_usage_remittance table no longer has the hardware_measurement_type column.
  2. All Swatch Billing Producers
  • Incoming BillableUsage messages no longer include hardwareMeasurementType.

Testing

IQE Test MR: IQE TEST

Steps

Deployment Tips

When testing locally, I deployed the following services. Please read the test steps carefully, since the first deployment should be done on the MAIN branch.

swatch-contracts

QUARKUS_MANAGEMENT_ENABLED=false SERVER_PORT=8001 ./gradlew :swatch-contracts:quarkusDev

swatch-billable-usage

SERVER_PORT=8002 QUARKUS_MANAGEMENT_PORT=9004 DEV_MODE=true ./gradlew :swatch-billable-usage:quarkusDev

swatch-tally

SPRING_PROFILES_ACTIVE=api,worker,kafka-queue DEV_MODE=true ./gradlew clean :bootRun

Testing

First, reproduce the bug on the MAIN branch.

Create a cost-management event for a host instance that includes the hardware_type.

http :9080/topics/platform.rhsm-subscriptions.service-instance-ingress Content-Type:application/vnd.kafka.json.v2+json Accept:application/vnd.kafka.v2+json --raw '{
        "records": [
          {
            "key": "3340851",
            "value": {
              "service_type": "RHEL System",
              "timestamp": "2024-06-03T14:00:00Z",
              "expiration": "2024-07-03T15:00:00Z",
              "display_name": "automation__cluster_testHostgge",
              "measurements": [
                {
                  "value": 4,
                  "uom": "vCPUs",
                  "metric_id": "vCPUs"
                }
              ],
              "product_ids": [
                "204"
              ],
              "cloud_provider": "AWS",
              "hardware_type": "Cloud",
              "sla": "Premium",
              "usage": "Production",
              "billing_provider": "aws",
              "billing_account_id": "testkfk",
              "product_tag": [
                "rhel-for-x86-els-payg-addon"
              ],
              "conversion": false,
              "event_source": "cost-management",
              "event_type": "vCPUs",
              "org_id": "3340851",
              "instance_id": "testHostgge"
            }
          }
        ]
      }
	  '
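
The payloads in these steps differ only in a few fields, so they can be generated instead of copied. The helper below is a hypothetical convenience and not part of the repository; `build_event` and its parameters are assumptions, and the field values are copied from the reproducer events in this description.

```python
import json


def build_event(value, timestamp, expiration, event_source, cloud=True):
    """Build a service-instance-ingress payload like the ones in these steps.

    Hypothetical helper: field values are copied from the reproducer events.
    """
    event = {
        "service_type": "RHEL System",
        "timestamp": timestamp,
        "expiration": expiration,
        "display_name": "automation__cluster_testHostgge",
        "measurements": [{"value": value, "uom": "vCPUs", "metric_id": "vCPUs"}],
        "product_ids": ["204"],
        "sla": "Premium",
        "usage": "Production",
        "billing_provider": "aws",
        "billing_account_id": "testkfk",
        "product_tag": ["rhel-for-x86-els-payg-addon"],
        "conversion": False,
        "event_source": event_source,
        "event_type": "vCPUs",
        "org_id": "3340851",
        "instance_id": "testHostgge",
    }
    if cloud:
        # cost-management style events carry the cloud and hardware fields;
        # rhelemeter style events omit them.
        event["cloud_provider"] = "AWS"
        event["hardware_type"] = "Cloud"
    return {"records": [{"key": event["org_id"], "value": event}]}


payload = build_event(4, "2024-06-03T14:00:00Z", "2024-07-03T15:00:00Z", "cost-management")
print(json.dumps(payload, indent=2))
```

Passing `cloud=False` with `event_source="rhelemeter"` produces the rhelemeter style event used later in the steps. The printed JSON can be supplied as the `--raw` body of the `http` commands.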

Perform an hourly tally for the org.

http POST ":8000/api/rhsm-subscriptions/v1/internal/tally/hourly?org=3340851" x-rh-swatch-psk:placeholder

Verify 4 AWS vCPUs were remitted.

select * from billable_usage_remittance where org_id='3340851';

Create a rhelemeter style event for the SAME HOST (no cloud_provider or hardware type).

http :9080/topics/platform.rhsm-subscriptions.service-instance-ingress Content-Type:application/vnd.kafka.json.v2+json Accept:application/vnd.kafka.v2+json --raw '{
        "records": [
          {
            "key": "3340851",
            "value": {
              "service_type": "RHEL System",
              "timestamp": "2024-06-03T14:00:00Z",
              "expiration": "2024-07-03T15:00:00Z",
              "display_name": "automation__cluster_testHostgge",
              "measurements": [
                {
                  "value": 8,
                  "uom": "vCPUs",
                  "metric_id": "vCPUs"
                }
              ],
              "product_ids": [
                "204"
              ],
              "sla": "Premium",
              "usage": "Production",
              "billing_provider": "aws",
              "billing_account_id": "testkfk",
              "product_tag": [
                "rhel-for-x86-els-payg-addon"
              ],
              "conversion": false,
              "event_source": "rhelemeter",
              "event_type": "vCPUs",
              "org_id": "3340851",
              "instance_id": "testHostgge"
            }
          }
        ]
      }
	  '

Run an hourly tally for the org.

http POST ":8000/api/rhsm-subscriptions/v1/internal/tally/hourly?org=3340851" x-rh-swatch-psk:placeholder

Because the event should have been amended, (4 - 4) + 8, the expected billable usage is an additional 4 vCPUs, since we already billed for 4 from the first event. However, you'll notice that 8 additional vCPUs were billed because of the PHYSICAL hardware measurement type. We have overbilled by 4 vCPUs.

select * from billable_usage_remittance where org_id='3340851';

Check out the branch with the bug fix and deploy. After deploying, check the billable_usage_remittance table to make sure that the hardware_measurement_type column was dropped.

Next verify that we will recover correctly from the 4 additional vCPUs that we previously billed for.

http :9080/topics/platform.rhsm-subscriptions.service-instance-ingress Content-Type:application/vnd.kafka.json.v2+json Accept:application/vnd.kafka.v2+json --raw '{
        "records": [
          {
            "key": "3340851",
            "value": {
              "service_type": "RHEL System",
              "timestamp": "2024-06-03T15:00:00Z",
              "expiration": "2024-07-03T15:00:00Z",
              "display_name": "automation__cluster_testHostgge",
              "measurements": [
                {
                  "value": 10,
                  "uom": "vCPUs",
                  "metric_id": "vCPUs"
                }
              ],
              "product_ids": [
                "204"
              ],
              "cloud_provider": "AWS",
              "hardware_type": "Cloud",
              "sla": "Premium",
              "usage": "Production",
              "billing_provider": "aws",
              "billing_account_id": "testkfk",
              "product_tag": [
                "rhel-for-x86-els-payg-addon"
              ],
              "conversion": false,
              "event_source": "cost-management",
              "event_type": "vCPUs",
              "org_id": "3340851",
              "instance_id": "testHostgge"
            }
          }
        ]
      }
	  '

Perform an hourly tally for the org.

http POST ":8000/api/rhsm-subscriptions/v1/internal/tally/hourly?org=3340851" x-rh-swatch-psk:placeholder

Since an additional 10 vCPUs were reported in the event we just sent, and we've previously overbilled for 4 vCPUs due to the bug, after the tally we should see that we billed for 6 vCPUs.
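
The expected value of 6 follows from simple arithmetic over the month. The sketch below is a simplified model that ignores billing factors and rounding; `amount_to_remit` is a hypothetical name, not a function from the codebase.

```python
def amount_to_remit(month_usage_total, already_remitted):
    """Usage not yet billed for the accumulation period (simplified)."""
    return month_usage_total - already_remitted


# 2024-06 after the amendment: the 14:00 hour holds 8 vCPUs, the 15:00 hour adds 10.
month_usage_total = 8 + 10

# Remitted on main: 4 (first event) + 8 (overbilled amendment).
already_remitted = 4 + 8

print(amount_to_remit(month_usage_total, already_remitted))  # 6
```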

Next, attempt to reproduce the bug scenario WITH A DIFFERENT MONTH to ensure the bug is addressed when starting fresh.

Create a cost-management event for a host instance that includes the hardware_type.

http :9080/topics/platform.rhsm-subscriptions.service-instance-ingress Content-Type:application/vnd.kafka.json.v2+json Accept:application/vnd.kafka.v2+json --raw '{
        "records": [
          {
            "key": "3340851",
            "value": {
              "service_type": "RHEL System",
              "timestamp": "2024-07-03T14:00:00Z",
              "expiration": "2024-08-03T15:00:00Z",
              "display_name": "automation__cluster_testHostgge",
              "measurements": [
                {
                  "value": 4,
                  "uom": "vCPUs",
                  "metric_id": "vCPUs"
                }
              ],
              "product_ids": [
                "204"
              ],
              "cloud_provider": "AWS",
              "hardware_type": "Cloud",
              "sla": "Premium",
              "usage": "Production",
              "billing_provider": "aws",
              "billing_account_id": "testkfk",
              "product_tag": [
                "rhel-for-x86-els-payg-addon"
              ],
              "conversion": false,
              "event_source": "cost-management",
              "event_type": "vCPUs",
              "org_id": "3340851",
              "instance_id": "testHostgge"
            }
          }
        ]
      }
	  '

Perform an hourly tally for the org.

http POST ":8000/api/rhsm-subscriptions/v1/internal/tally/hourly?org=3340851" x-rh-swatch-psk:placeholder

Create a rhelemeter style event for the SAME HOST (no cloud_provider or hardware type).

http :9080/topics/platform.rhsm-subscriptions.service-instance-ingress Content-Type:application/vnd.kafka.json.v2+json Accept:application/vnd.kafka.v2+json --raw '{
        "records": [
          {
            "key": "3340851",
            "value": {
              "service_type": "RHEL System",
              "timestamp": "2024-07-03T14:00:00Z",
              "expiration": "2024-08-03T15:00:00Z",
              "display_name": "automation__cluster_testHostgge",
              "measurements": [
                {
                  "value": 8,
                  "uom": "vCPUs",
                  "metric_id": "vCPUs"
                }
              ],
              "product_ids": [
                "204"
              ],
              "sla": "Premium",
              "usage": "Production",
              "billing_provider": "aws",
              "billing_account_id": "testkfk",
              "product_tag": [
                "rhel-for-x86-els-payg-addon"
              ],
              "conversion": false,
              "event_source": "rhelemeter",
              "event_type": "vCPUs",
              "org_id": "3340851",
              "instance_id": "testHostgge"
            }
          }
        ]
      }
	  '

Run an hourly tally for the org.

http POST ":8000/api/rhsm-subscriptions/v1/internal/tally/hourly?org=3340851" x-rh-swatch-psk:placeholder

Check the remittance for 2024-07 to ensure that we did not overbill by 4 vCPUs. Since an amendment had occurred when the 2nd event was received, we should have only remitted an additional 4 vCPUs, for a total of 8 vCPUs for 2024-07.
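
The fresh-month scenario can be replayed with a small simulation. This is a sketch under stated assumptions: a later event for the same timestamp simply replaces the earlier measurement, each tally remits the month's usage total minus what was already remitted, and billing factors and rounding are ignored. `simulate` is a hypothetical name.

```python
def simulate(events):
    """Replay (timestamp, value) events, tallying after each one.

    Simplified model: a later event for the same timestamp replaces the
    earlier measurement (an amendment), and each tally remits the month's
    usage total minus what was already remitted.
    """
    hourly = {}       # timestamp -> latest measurement for that hour
    remittances = []  # amount remitted by each tally
    for timestamp, value in events:
        hourly[timestamp] = value
        delta = sum(hourly.values()) - sum(remittances)
        remittances.append(delta)
    return remittances


print(simulate([("2024-07-03T14:00:00Z", 4), ("2024-07-03T14:00:00Z", 8)]))  # [4, 4]
```

The two remittances sum to 8 vCPUs, which matches the expected total for 2024-07.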

@liwalker-rh

/retest

@Aurobinda55

/retest

@kflahert
Contributor

/retest

@Aurobinda55 Aurobinda55 self-requested a review December 13, 2024 14:07
@Aurobinda55

Verification steps:

On the main branch

  • Deployed EE.
  • Created a rhel-payg-addon event with event source=cost-management, value=2, instance-uuid='rhel_instance', billing_account_id='test-123'.
  • Synced the hourly tally.
  • Created another rhel-payg-addon event with event source=rhelemeter, value=7, instance-uuid='rhel_instance', billing_account_id='test-123', and the same timestamp as above.
  • Synced the hourly tally.
  • Verified that a conflict occurred.
  • Checked the remittance records; there were two:
[{'orgId': '3340851',
  'productId': 'rhel-for-x86-els-payg-addon',
  'metricId': 'vCPUs',
  'billingProvider': 'aws',
  'billingAccountId': 'test-123',
  'remittedValue': 2.0,
  'accumulationPeriod': '2024-12',
  'remittanceDate': '2024-12-13T13:23:55.771941Z',
  'remittanceStatus': 'succeeded',
  'remittanceErrorCode': 'null'},
 {'orgId': '3340851',
  'productId': 'rhel-for-x86-els-payg-addon',
  'metricId': 'vCPUs',
  'billingProvider': 'aws',
  'billingAccountId': 'test-123',
  'remittedValue': 7.0,
  'accumulationPeriod': '2024-12',
  'remittanceDate': '2024-12-13T13:25:43.53135Z',
  'remittanceStatus': 'pending',
  'remittanceErrorCode': 'null'}]

I also checked that the billable usage message has hardware type=AWS/Cloud.

Switched to the PR branch and deployed the PR image

  • Created one more event with event source=cost-management, the same timestamp as the previous event, value=15, and the same instance-id as before.
  • Synced the hourly tally.
  • A conflict occurred.
  • Fetched the remittance records.
  • Verified that the hardware measurement type was removed from both the existing record and the new record.
  • There are two records in the remittance table: the first with remitted value 6 and status pending, the second with remitted value 9 and status succeeded.
[{'orgId': '3340851',
  'productId': 'rhel-for-x86-els-payg-addon',
  'metricId': 'vCPUs',
  'billingProvider': 'aws',
  'billingAccountId': 'test-123',
  'remittedValue': 6.0,
  'accumulationPeriod': '2024-12',
  'remittanceDate': '2024-12-13T13:49:05.750887Z',
  'remittanceStatus': 'pending',
  'remittanceErrorCode': 'null'},
 {'orgId': '3340851',
  'productId': 'rhel-for-x86-els-payg-addon',
  'metricId': 'vCPUs',
  'billingProvider': 'aws',
  'billingAccountId': 'test-123',
  'remittedValue': 9.0,
  'accumulationPeriod': '2024-12',
  'remittanceDate': '2024-12-13T13:25:43.53135Z',
  'remittanceStatus': 'succeeded',
  'remittanceErrorCode': 'null'}]

@mstead I think this is expected: 9 had already been sent (due to the bug on the main branch) and the value is now 15, so 15 - 9 = 6 is in the pending state. Can you confirm?

  • Created one more event with event source=rhelemeter, the same timestamp and instance-id as the previous event, and value=2.
  • Synced the hourly tally.
  • Fetched the remittance records.
  • Now I see one record with status=succeeded and a value of 15; the latest value of 2 was ignored.
[{'orgId': '3340851',
  'productId': 'rhel-for-x86-els-payg-addon',
  'metricId': 'vCPUs',
  'billingProvider': 'aws',
  'billingAccountId': 'test-123',
  'remittedValue': 15.0,
  'accumulationPeriod': '2024-12',
  'remittanceDate': '2024-12-13T13:49:05.750887Z',
  'remittanceStatus': 'succeeded',
  'remittanceErrorCode': 'null'}]

Next, checked the previous month's scenario

  • Created two events (event sources cost-management and rhelemeter) with the same timestamp and instance id, with values 2 and 7 respectively.
  • Synced the hourly tally.
  • Now I see two records in the remittance table: the first with value=5 and status pending, the second with value=2 and status succeeded.
[
    {
        "accumulationPeriod": "2024-11",
        "billingAccountId": "test-123",
        "billingProvider": "aws",
        "metricId": "vCPUs",
        "orgId": "3340851",
        "productId": "rhel-for-x86-els-payg-addon",
        "remittanceDate": "2024-12-13T14:32:09.255147Z",
        "remittanceErrorCode": "null",
        "remittanceStatus": "pending",
        "remittedValue": 5.0
    },
    {
        "accumulationPeriod": "2024-11",
        "billingAccountId": "test-123",
        "billingProvider": "aws",
        "metricId": "vCPUs",
        "orgId": "3340851",
        "productId": "rhel-for-x86-els-payg-addon",
        "remittanceDate": "2024-12-13T14:28:10.887495Z",
        "remittanceErrorCode": "null",
        "remittanceStatus": "succeeded",
        "remittedValue": 2.0
    }
]

@mstead I think this is expected, because the latest value of 7 was adjusted against the first event's value of 2. Is that right?
I also checked that the remittance records have no HMT:

account_number          | 
org_id                  | 3340851
product_id              | rhel-for-x86-els-payg-addon
metric_id               | vCPUs
accumulation_period     | 2024-11
sla                     | Premium
usage                   | Production
billing_provider        | aws
billing_account_id      | test-123
remitted_pending_value  | 2
remittance_pending_date | 2024-12-13 14:28:10.887495+00
retry_after             | 
tally_id                | 8adf1a76-ae29-4a46-817a-61d7bd034ff5
uuid                    | 3faadcbb-8789-4e64-8c80-1c7b64b8a4be
billed_on               | 2024-12-13 14:32:09.412855+00
error_code              | 
status                  | succeeded
 

Checked on the swatch-billable-usage-service and can't see the HMT in the message.

@mstead
Contributor Author

mstead commented Dec 13, 2024

> @mstead I think this is expected: 9 had already been sent (due to the bug on the main branch) and the value is now 15, so 15 - 9 = 6 is in the pending state. Can you confirm?

Yes, that is correct.

> @mstead I think this is expected, because the latest value of 7 was adjusted against the first event's value of 2. Is that right?

Yes, this is also correct.
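
Both confirmations can be cross-checked by summing the remitted values per accumulation period. The snippet below is a sketch: `remitted_by_period` is a hypothetical helper, and the records are trimmed copies of the API output pasted in the verification comment above.

```python
from collections import defaultdict


def remitted_by_period(records):
    """Total remittedValue per accumulationPeriod, regardless of status."""
    totals = defaultdict(float)
    for record in records:
        totals[record["accumulationPeriod"]] += record["remittedValue"]
    return dict(totals)


records = [
    {"accumulationPeriod": "2024-12", "remittedValue": 6.0, "remittanceStatus": "pending"},
    {"accumulationPeriod": "2024-12", "remittedValue": 9.0, "remittanceStatus": "succeeded"},
    {"accumulationPeriod": "2024-11", "remittedValue": 5.0, "remittanceStatus": "pending"},
    {"accumulationPeriod": "2024-11", "remittedValue": 2.0, "remittanceStatus": "succeeded"},
]

print(remitted_by_period(records))  # {'2024-12': 15.0, '2024-11': 7.0}
```

The totals match the latest reported usage in each period: 15 vCPUs for 2024-12 and 7 vCPUs for 2024-11.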

@lindseyburnett lindseyburnett self-assigned this Dec 16, 2024
@Aurobinda55

/retest

@Aurobinda55

Aurobinda55 commented Dec 16, 2024

There are two failed tests in this PR due to the absence of the hardware_measurement_type column, which has been removed in this PR. I see that there is already a PR on the IQE side to handle the changes: commit link. However, due to an issue while releasing the tag in the iqe-rhsm plugin (v2024.12.16.0 tag), the CI job is unable to pull the latest image and fails with the same error. I am currently working on identifying the root cause of the failure.

@kahowell kahowell added the Dev and QE labels Dec 16, 2024
@wottop
Contributor

wottop commented Dec 16, 2024

lgtm, but I would like to have another person approve the logic.

@mstead mstead force-pushed the mstead/SWATCH-3177-amendment-remittance-issue branch from 3a34bf2 to 34aedfc Compare December 16, 2024 20:10
@liwalker-rh

/retest

@liwalker-rh

/retest

@liwalker-rh

/retest

@kflahert
Contributor

@liwalker-rh I just wrote these, and I think there are some timing issues I need to fix. I'll work on an IQE fix.

@liwalker-rh

/retest

@liwalker-rh

/retest

@liwalker-rh

/retest

@liwalker-rh liwalker-rh left a comment

Looks good in testing and the related IQE tests are working as expected. Will merge those tests once this has been merged.

@liwalker-rh liwalker-rh added the QE/approved label and removed the QE label Jan 6, 2025
@mstead
Contributor Author

mstead commented Jan 7, 2025

/retest

@lindseyburnett lindseyburnett added the Dev/approved label and removed the Dev label Jan 7, 2025
@liwalker-rh

/retest

@liwalker-rh

/retest

@lindseyburnett lindseyburnett added the QE and Dev labels and removed the Dev/approved and QE/approved labels Jan 8, 2025
Contributor

@lindseyburnett lindseyburnett left a comment

Unapproving pending failing test investigation

@liwalker-rh

/retest

@liwalker-rh liwalker-rh self-requested a review January 8, 2025 17:34
@liwalker-rh

/retest

@Sgitario Sgitario force-pushed the mstead/SWATCH-3177-amendment-remittance-issue branch from 34aedfc to 4b114c9 Compare January 9, 2025 14:32
@mstead mstead force-pushed the mstead/SWATCH-3177-amendment-remittance-issue branch from 4b114c9 to 85a18a3 Compare January 10, 2025 15:40
@mstead
Contributor Author

mstead commented Jan 10, 2025

/retest

@liwalker-rh

/retest

@liwalker-rh liwalker-rh added the QE/approved label and removed the QE label Jan 14, 2025
Labels
Dev, QE/approved

7 participants