Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MB-6628] Use EagerPreload to improve performance in some key spots #5880

Merged
merged 6 commits into from
Feb 11, 2021

Conversation

reggieriser
Copy link
Contributor

@reggieriser reggieriser commented Feb 10, 2021

Description

To optimize performance, this PR migrates some code from using Pop's Eager method for loading associated objects/records to the Pop 5.x EagerPreload method (see Wiki page). I focused on three key spots noted below. For each, I note the change in the number of queries starting with the baseline E2E data as well as some (non-scientific) durations on my local machine. The delta between the number of queries (and possibly the durations) would likely get larger with more data since more data does not necessarily lead to more queries in the new code. In some experimental load testing, we saw fetch-mto-updates take dramatically longer with around 200 available-to-prime moves (rather than the 5 or so in the E2E data).

  • fetch-mto-updates via the prime-api-client
    • old: 496 queries in about 1100 ms
    • new: 19 queries in about 140 ms
  • TOO queue
    • old: 402 queries in about 900 ms
    • new: 31 queries in about 125 ms
  • TIO queue
    • old: 116 queries in about 300 ms
    • new: 25 queries in about 95 ms

Reviewer Notes

I ran into another Pop issue (noted in the comments) when loading multiple third-level associations from the same table. To work around this, I did have to do some separate Load calls in a couple of places which caused the query count to be a little larger than it would be if everything used EagerPreload as I wanted. I'll file a Pop issue (and hopefully a failing unit test) soon (update: Pop issue filed and Wiki page updated).

While inspecting the XHR traffic, I also noticed we seem to be querying all the records twice in both the TOO and TIO queues when first going to those pages. I'll ping the appropriate team about that (update: already on TTV's radar).

Setup

  • make db_dev_e2e_populate
  • make server_run and make client_run
  • prime-api-client --insecure fetch-mto-updates
  • For the TOO queue, log into the office UI and check it out with the TOO role.
  • For the TIO queue, log into the office UI and check it out with the TIO role.

Code Review Verification Steps

  • If the change is risky, it has been tested in experimental before merging.
  • Code follows the guidelines for Logging
  • The requirements listed in Querying the Database Safely have been satisfied.
  • Request review from a member of a different team.
  • Have the Jira acceptance criteria been met for this change?

References

@robot-mymove
Copy link

robot-mymove commented Feb 10, 2021

Warnings
⚠️ Please add the JIRA issue key to the PR title (e.g. MB-123)

Generated by 🚫 dangerJS against b02a01c

@reggieriser reggieriser changed the title Use EagerPreload to improve performance in some key spots [MB-6628] Use EagerPreload to improve performance in some key spots Feb 11, 2021
@@ -56,7 +56,7 @@ type Order struct {
Grade *string `json:"grade" db:"grade"`
Entitlement *Entitlement `belongs_to:"entitlements"`
EntitlementID *uuid.UUID `json:"entitlement_id" db:"entitlement_id"`
OriginDutyStation *DutyStation `belongs_to:"duty_stations"`
OriginDutyStation *DutyStation `belongs_to:"duty_stations" fk_id:"origin_duty_station_id"`
Copy link
Contributor Author

@reggieriser reggieriser Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the fk_id struct tags here to workaround a Pop issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting it! 🥳

Comment on lines -82 to -89
if err != nil {
switch err {
case sql.ErrNoRows:
return []models.Move{}, 0, services.NotFoundError{}
default:
return []models.Move{}, 0, err
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this appears to be a leftover. The err is checked before this, and checked again after the next Pop call below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the if statement check on line 9 doesn't look for ErrNoRows to return a NotFoundError and was wondering if it should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, which if statement are you looking at? Line 9 is in the import section, so I'm guessing that's just a typo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have miss read the line numbers. I believe I meant the one on line 32 as it doesn't return the NotFoundError like the one you deleted.

	if err != nil {
		return []models.Move{}, 0, err
	}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. That query is just getting the office user's associated transportation office's GBLOC to use as a filter in the next query for the TOO queue data. It should always return a record unless something really funky is going on because all the parts of that association are required in the DB, so not specifically looking for ErrNoRows is probably OK. That seems different from the big TOO queue query which might return no rows as part of normal operation because there doesn't happen to be anything available in that GBLOC's TOO queue.

@reggieriser reggieriser marked this pull request as ready for review February 11, 2021 00:21
// OriginDutyStation is a pointer.
// There appears to be a bug in Pop for EagerPreload when you have two or more eager paths with 3+ levels
// where the first 2 levels match. For example:
// "Orders.OriginDutyStation.Address" and "Orders.OriginDutyStation.TransportationOffice"
Copy link
Contributor

@shimonacarvalho shimonacarvalho Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could provide an endpoint to GetDutyStation and avoid the 3+ level nesting in general. DutyStation addresses don't change too much so it's not a huge deal not to have all the data in the fetchMTOUpdate. We could just return name and id. Not necessary change for this pr but wondered what you thought.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting bug, I think we have a few other places that such things happen so it's good to have documented. That said having a get duty station end point might make sense given they won't change much and it seems likely the prime will have a list of them as well. Totally agree that this isn't something we need for this PR but worth considering.

@shimonacarvalho How much do we expect the prime to call fetch-mto-updates given they will be sent the information via the webhook notification. Or will they need to call it after receiving the notification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could provide an endpoint to GetDutyStation and avoid the 3+ level nesting in general. DutyStation addresses don't change too much so it's not a huge deal not to have all the data in the fetchMTOUpdate. We could just return name and id. Not necessary change for this pr but wondered what you thought.

So this particular query is fetching the data used in the TOO queue, not fetch-mto-updates. I actually didn't have to do any Load workarounds in fetch-mto-updates, but I do have a couple of thoughts:

  • I do think it's reasonable to re-evaluate the data returned by fetch-mto-updates and perhaps prune it some. Another endpoint (I think there's a support one already, but we'd need something the prime could access) could get the full tree of data for a specific MTO, but fetch-mto-updates would just return a higher-level overview. But it depends how we expect the prime to use it.
  • I haven't done the forensics on this particular fetcher, but I think we sometimes create query functions/services that over time have more eager loading added to them as other use cases come up. The problem is that not every use case needs all this data so some cycles get wasted fetching data that's never used in some cases.

For this PR, I mainly focused on optimizing what was already there and intentionally not changing the data returned to avoid accidentally breaking something else in the process.

Copy link
Contributor

@shimonacarvalho shimonacarvalho Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this particular query is fetching the data used in the TOO queue, not fetch-mto-updates

Oh my bad.

could get the full tree of data for a specific MTO, but fetch-mto-updates would just return a higher-level overview.

Yes, I just fixed up getMTO and it's really good for returning everything in a single mto. Easy to move from support --> prime.

I think the challenge with fetchMTOUpdates is that its real value to the overall API is returning mtos where ANY data on the mto has changed since a certain date. However that is not working right now, it doesn't "deep-check" nested updatedAts. So it really needs a full rethink/rewrite which we haven't been able to get prioritization on.

@shimonacarvalho How much do we expect the prime to call fetch-mto-updates given they will be sent the information via the webhook notification. Or will they need to call it after receiving the notification?

@gidjin I just don't know honestly, since we have no Prime to ask. But i believe we need to make it work properly, with the since feature. Because even if push notifications works well and they get all the data they need normally, at some point when push fails (a cert expires or whatever) they will have to resync using fetch.

Copy link
Contributor

@gidjin gidjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Ran locally and wow is it fast! 🚢

@@ -56,7 +56,7 @@ type Order struct {
Grade *string `json:"grade" db:"grade"`
Entitlement *Entitlement `belongs_to:"entitlements"`
EntitlementID *uuid.UUID `json:"entitlement_id" db:"entitlement_id"`
OriginDutyStation *DutyStation `belongs_to:"duty_stations"`
OriginDutyStation *DutyStation `belongs_to:"duty_stations" fk_id:"origin_duty_station_id"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting it! 🥳

Comment on lines -82 to -89
if err != nil {
switch err {
case sql.ErrNoRows:
return []models.Move{}, 0, services.NotFoundError{}
default:
return []models.Move{}, 0, err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the if statement check on line 9 doesn't look for ErrNoRows to return a NotFoundError and was wondering if it should.

// OriginDutyStation is a pointer.
// There appears to be a bug in Pop for EagerPreload when you have two or more eager paths with 3+ levels
// where the first 2 levels match. For example:
// "Orders.OriginDutyStation.Address" and "Orders.OriginDutyStation.TransportationOffice"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting bug, I think we have a few other places that such things happen so it's good to have documented. That said having a get duty station end point might make sense given they won't change much and it seems likely the prime will have a list of them as well. Totally agree that this isn't something we need for this PR but worth considering.

@shimonacarvalho How much do we expect the prime to call fetch-mto-updates given they will be sent the information via the webhook notification. Or will they need to call it after receiving the notification?

Copy link
Contributor

@duncan-truss duncan-truss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great gains! Pop is always turning up more obscure bugs, the wiki docs were a good explanation.

TTV knows about the extra queue query, we have a story to fix it dealing with the default sort state.

@reggieriser
Copy link
Contributor Author

I've created a test case and filed an issue with Pop about the problems using EagerPreload with long association paths: gobuffalo/pop#626

I've also added notes about the issue to our Wiki's EagerPreload page.

Copy link
Contributor

@shimonacarvalho shimonacarvalho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thank YOU!

@reggieriser reggieriser merged commit 557f822 into master Feb 11, 2021
@reggieriser reggieriser deleted the rr-MB-6628-eagerpreload branch February 11, 2021 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants