-
Notifications
You must be signed in to change notification settings - Fork 294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #37137 - Use insert_all when creating available module streams #10878
Fixes #37137 - Use insert_all when creating available module streams #10878
Conversation
e69483b
to
4b73412
Compare
fixed rubocop |
Your PR is having the same test failures as mine is |
Yeah, apparently #10861 has the fixes. We may have to ignore those for a bit. |
[test katello] |
1 similar comment
[test katello] |
Ready? |
Yes, this should be ready for review again 👍 |
stream: module_stream["stream"]) | ||
else | ||
stream = AvailableModuleStream.where(name: module_stream["name"], | ||
context: module_stream["context"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the indention broken?
@@ -348,11 +348,21 @@ def import_enabled_repositories(repos) | |||
end | |||
|
|||
def import_module_streams(module_streams) | |||
# create_or_find_by avoids race conditions during concurrent registrations but clogs postgres logs with harmless errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of two different implementation, would it be possible to catch the "postgres logs with harmless errors" and handle this accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's the way the race condition is avoided - by letting the Postgres constraint handle it rather than ruby/rails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why won't we hit PG::UniqueViolation
errors during concurrent package profile uploads? Is there a difference in concurrency between 10 systems registering at the same time and 10 systems dnf updating
at the same time?
# create_or_find_by avoids race conditions during concurrent registrations but clogs postgres logs with harmless errors. | ||
# So we'll use create_or_find_by! during registration and first_or_create! otherwise. | ||
registered_time = subscription_facet&.registered_at | ||
use_create_or_find_by = registered_time.nil? || registered_time > 1.minute.ago | ||
streams = {} | ||
module_streams.each do |module_stream| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this code is old, but I wonder if now would be a good time to switch to insert_all
or upsert_all
. It feels odd to do so many creations one postgres insert at a time in a Ruby loop.
Bulk insertion lets you create all entities in a single Postgres insert call. It takes into account duplicates too from what I remember. You can pass in what the records should be unique_by
(in this case name, stream, and context).
There might be a catch, but I think it's worth a thought.
https://apidock.com/rails/v6.0.0/ActiveRecord/Persistence/ClassMethods/insert_all
https://apidock.com/rails/v6.0.0/ActiveRecord/Persistence/ClassMethods/upsert_all
We use them in a couple places:
https://github.com/search?q=repo%3AKatello%2Fkatello%20insert_all&type=code
https://github.com/search?q=repo%3AKatello%2Fkatello+upsert_all&type=code
We haven't had any complaints of that, for whatever reason. It could be because most package profile uploads are from rhsmcertd checkins which are staggered automatically with random delay intervals to prevent too many concurrent requests, whereas registrations can happen at any time. |
4b73412
to
4dfde98
Compare
4dfde98
to
8123b07
Compare
8123b07
to
65430af
Compare
fixed rubocop |
Gonna test first then code review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is working great! So far I'm not seeing any issues with the code.
I'll give it a final review once the dev logging is cleaned up.
5aad7dc
to
f0bad71
Compare
e3caee8
to
1f8225e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working fine for me! Ruby tests are passing now too.
…atello#10878) * Fixes #37137 - Use insert_all when creating available module streams (cherry picked from commit 7458f70)
…atello#10878) * Fixes #37137 - Use insert_all when creating available module streams (cherry picked from commit 7458f70)
* Release 4.12.0-rc1 (#10909) * Fixes #37137 - Use insert_all when creating available module streams (#10878) * Fixes #37137 - Use insert_all when creating available module streams (cherry picked from commit 7458f70) * Fixes #37178 - use safe navigation for operatingsystem (#10897) (cherry picked from commit 664488d) * Refs #37202 - display name according to setting in host collection (cherry picked from commit de39a36) * Fixes #37192 - Use with_enabled_email scope (#10901) * Fixes #37192 - Use with_enabled_email scope Instead of typing out the same query explicitly. * Refs #37192 - Bump required foreman version (cherry picked from commit 1ace8e5) * Refs #37203 - display name according to setting in errata host list (cherry picked from commit 2775554) * Fixes #37214 - Change 'default' for limit to environment checkbox in activation-key and content-host (cherry picked from commit 9d0a338) * Fixes #37108 - Preload content_view_components (#10864) (cherry picked from commit d98c17b) * Fixes #36976 - Too many audit records slow down CV loading (#10911) (cherry picked from commit 30c6977) * Refs #37201 - display short host name in activation keys menu (cherry picked from commit 8164043) * Fixes #37187 - Update ACS refresh Pulp fixtures + fix repository_test test (cherry picked from commit 99c7857) * Fixes #37198 - Allow installedDeb package attributes in safemode (cherry picked from commit b5d785f) * Fixes #37197 - Kickstart repository correctly listed on hostgroup (cherry picked from commit 35c49b1) * Fixes #37169 - Managing a Hosts Repository Sets does not behave as expected (#10905) (cherry picked from commit 6420cf8) --------- Co-authored-by: Jeremy Lenz <[email protected]> Co-authored-by: Matěj Mudra <[email protected]> Co-authored-by: Adam Růžička <[email protected]> Co-authored-by: Manisha Singhal <[email protected]> Co-authored-by: Bernhard Suttner <[email protected]> Co-authored-by: Samir Jha <[email protected]> Co-authored-by: ianballou <[email protected]> Co-authored-by: Bernhard Suttner <[email protected]> Co-authored-by: Partha Aji <[email protected]> Co-authored-by: Thorben <[email protected]>
…atello#10878) * Fixes #37137 - Use insert_all when creating available module streams (cherry picked from commit 7458f70)
…atello#10878) * Fixes #37137 - Use insert_all when creating available module streams (cherry picked from commit 7458f70)
* Fixes #37187 - Update ACS refresh Pulp fixtures + fix repository_test test (cherry picked from commit 99c7857) * Fixes #37137 - Use insert_all when creating available module streams (#10878) * Fixes #37137 - Use insert_all when creating available module streams (cherry picked from commit 7458f70) * Fixes #37178 - use safe navigation for operatingsystem (#10897) (cherry picked from commit 664488d) * Refs #37202 - display name according to setting in host collection (cherry picked from commit de39a36) * Refs #37201 - display short host name in activation keys menu (cherry picked from commit 8164043) * Refs #37203 - display name according to setting in errata host list (cherry picked from commit 2775554) * Fixes #37214 - Change 'default' for limit to environment checkbox in activation-key and content-host (cherry picked from commit 9d0a338) * Fixes #37198 - Allow installedDeb package attributes in safemode (cherry picked from commit b5d785f) * Fixes #37108 - Preload content_view_components (#10864) (cherry picked from commit d98c17b) * Fixes #37197 - Kickstart repository correctly listed on hostgroup (cherry picked from commit 35c49b1) * Fixes #36976 - Too many audit records slow down CV loading (#10911) (cherry picked from commit 30c6977) * Fixes #37169 - Managing a Hosts Repository Sets does not behave as expected (#10905) (cherry picked from commit 6420cf8) --------- Co-authored-by: ianballou <[email protected]> Co-authored-by: Jeremy Lenz <[email protected]> Co-authored-by: Matěj Mudra <[email protected]> Co-authored-by: Manisha Singhal <[email protected]> Co-authored-by: Bernhard Suttner <[email protected]> Co-authored-by: Bernhard Suttner <[email protected]> Co-authored-by: Partha Aji <[email protected]> Co-authored-by: Samir Jha <[email protected]> Co-authored-by: Thorben <[email protected]>
…atello#10878) * Fixes #37137 - Use insert_all when creating available module streams (cherry picked from commit 7458f70)
* Fixes #37187 - Update ACS refresh Pulp fixtures + fix repository_test test (cherry picked from commit 99c7857) * Fixes #37137 - Use insert_all when creating available module streams (#10878) * Fixes #37137 - Use insert_all when creating available module streams (cherry picked from commit 7458f70) * Fixes #37178 - use safe navigation for operatingsystem (#10897) (cherry picked from commit 664488d) * Refs #37202 - display name according to setting in host collection (cherry picked from commit de39a36) * Refs #37201 - display short host name in activation keys menu (cherry picked from commit 8164043) * Refs #37203 - display name according to setting in errata host list (cherry picked from commit 2775554) * Fixes #37214 - Change 'default' for limit to environment checkbox in activation-key and content-host (cherry picked from commit 9d0a338) * Fixes #37198 - Allow installedDeb package attributes in safemode (cherry picked from commit b5d785f) * Fixes #37108 - Preload content_view_components (#10864) (cherry picked from commit d98c17b) * Fixes #37197 - Kickstart repository correctly listed on hostgroup (cherry picked from commit 35c49b1) * Fixes #36976 - Too many audit records slow down CV loading (#10911) (cherry picked from commit 30c6977) * Fixes #37169 - Managing a Hosts Repository Sets does not behave as expected (#10905) (cherry picked from commit 6420cf8) * Refs #37214 - Use Limit to Environment by default (cherry picked from commit 8bedda9) --------- Co-authored-by: ianballou <[email protected]> Co-authored-by: Jeremy Lenz <[email protected]> Co-authored-by: Matěj Mudra <[email protected]> Co-authored-by: Manisha Singhal <[email protected]> Co-authored-by: Bernhard Suttner <[email protected]> Co-authored-by: Bernhard Suttner <[email protected]> Co-authored-by: Partha Aji <[email protected]> Co-authored-by: Samir Jha <[email protected]> Co-authored-by: Thorben <[email protected]>
What are the changes introduced in this pull request?
During a profile upload, a host's module streams are updated in Katello's database.
In #10412 we changed to using
create_or_find_by!
to create those records. This solved an issue where concurrent host registrations would sometimes fail due to a PostgreSQL error:However, these errors are still inserted into postgres logs. They are harmless, but log files can fill up quickly. This is because there will be one log for every module stream creation attempt, for every host with module streams, not just at registration but also during every profile upload. This results in log files that can quickly build to 200000 lines with just those error logs.
With this change, we're now using
insert_all
to insert all of the records in a single SQL query, which should avoid the race condition.Considerations taken when implementing this change?
This required some major refactoring, because
import_module_streams
takes module stream data in a list of Hashes, but those don't include Katello's ID. (There's no way they could, as this info is coming from sub-man/dnf.)id
s returned fromcreate_or_find_by
etc. to pass on to the next method,sync_available_module_stream_associations
.sync_available_module_stream_associations
requires a Hash indexed by database ID.sync_available_module_stream_associations
was all of the host's available module streams, indexed by ID, including the ones we just newly added. Thus,new_available_module_streams
was a bit of a misnomer. I did some refactoring to recreate this format of the indexed hash.What are the testing steps for this pull request?
tail -f /var/lib/pgsql/data/log/postgresql-Thu.log
(replace Thu with whatever day it is today)You can also check how many logs there are: