-
Notifications
You must be signed in to change notification settings - Fork 537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for hard/soft bloom filter base + stash #22828
base: soft-block-cron-job
Are you sure you want to change the base?
Conversation
1c5160b
to
aeb1b36
Compare
f30bf73
to
f387a3d
Compare
f39e370
to
08d3851
Compare
779be6e
to
b2007e3
Compare
we should almost never fix two issues with a single PR, so please fix mozilla/addons#15166 in a different PR. |
227f07e
to
3917ee4
Compare
I've never heard of this rule and regularly do this. Why should that be a rule? |
3917ee4
to
dc6f9cb
Compare
dcdf7c1
to
9ff8ccc
Compare
@willdurand dropped and closed as this PR introduces the reference to the new switch and so there is nothing to "update" |
9ff8ccc
to
7cb4eb5
Compare
7cb4eb5
to
6c90566
Compare
Run the cron: ./manage.py cron upload_mlbf_to_remote_settings |
Scenarios. Nothing to do
Two hard blocked
One hard blocked
Two soft blocked
One soft blocked
One of each
Two of one, one of the other
TODO: the last check doesn't pass currently Two of each
Soft to hard block
block.blockversion_set.all().update(block_type=BlockType.BLOCKED)
Hard to soft block
block.blockversion_set.all().update(block_type=BlockType.SOFT_BLOCKED)
Empty stash is not uploaded
_blocked_addon(block_type=BlockType.BLOCKED, file_kw={'is_signed': False})
|
How to reset base filters so you can test the next scenario ./manage.py cron upload_mlbf_to_remote_settings force_base=True |
6c90566
to
b56d5e3
Compare
b56d5e3
to
3762824
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that needs some changes but this looks promising.
766f489
to
6eab666
Compare
024fe82
to
4fcf87a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more feedback
src/olympia/blocklist/tasks.py
Outdated
else oldest_base_filter_id | ||
) | ||
if record_time < oldest_base_filter_id: | ||
server.delete_record(record['id']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does look like the previous filter (that gets replaced by a new one - for the same block type) is removed.
I also noticed that the soft filter was removed when I added enough hard blocks to rebuild a hard filter. This doesn't look correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-thinking about this - and I am not saying this isn't already what's happening - we should:
- delete any filter that we are going to re-upload to keep always 1 filter (per block type)
- delete everything when both filters are re-generated
- delete stash records older than the oldest of the two filters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Added logic for 1. We check any pre-existing attachment that matches the
filter_list
and delete them. - I did not include logic explicitly for 2 because it should happen anyway
- 3 was already implemented
src/olympia/constants/blocklist.py
Outdated
@@ -1,8 +1,15 @@ | |||
# How many guids should there be in the stashes before we make a new base. | |||
from olympia.blocklist.models import BlockType | |||
|
|||
|
|||
BASE_REPLACE_THRESHOLD = 5_000 | |||
|
|||
# Config keys used to track recent mlbf ids | |||
MLBF_TIME_CONFIG_KEY = 'blocklist_mlbf_generation_time' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If filters can be generated at different times, and we have different base ids for hard/soft filters (via MLBF_BASE_ID_CONFIG_KEY
), why do we have a single generation time in config?
src/olympia/blocklist/mlbf.py
Outdated
# unblocked should include any versions | ||
# removed from blocked or soft blocked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I unblocked enough hard blocks to force the creation of a new filter. This resulted in a stash too (because there was a soft-block too) and this stash listed all the unblocked versions. I don't think we want that, even if technically, it might be fine (that's a lot of assumptions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-thinking about this - and, again, not saying this isn't what's happening - we should create a stash record for blocks that are of a type whose filter isn't being regenerated. That sentence is horrible, ugh.
In practice, that would mean:
- 1 hard block "change" and enough soft block changes to regenerate the soft block filter = we create a soft block filter and a stash that only contains this 1 hard block change (could we addition or deletion)
- 1 soft block "change" and enough hard block changes to regenerate the hard block filter = we create a hard block filter and a stash that only contains this 1 soft block change (could we addition or deletion)
- changes of soft and/or hard blocks that only lead to the creation of a stash = we only create a stash
6fe2532
to
aefa829
Compare
70a2fb0
to
bcf4af3
Compare
Remove unecessary and redundant code Fix ordering of cache/stash + increase validity of tests Upload multiple filters More logs + correct handling of attachment_type Verify cron passes correct args to task TMP: Ignore soft blocks Add waffle switch Fix invalid class reference Update to correct waffle switch Update to fix the test reafactoring add missing tests Apply suggestions from code review Co-authored-by: William Durand <[email protected]> Updates from review Ensure blocks of type X are excluded from stash if filter of type X is also being uploaded TMP: squash Better exclusion of stashes from updated filters + more comment resolution Delete correct files from remote settings: - delete any existing attachments matching filters we are reuploading - delete any stashes that are older than the oldest filter
bcf4af3
to
1a4fad7
Compare
Fixes: mozilla/addons#15014
Relates to: mozilla/addons#15155
Warning
This PR contains #22885 which should land first
Description
Adds logic to generate and write bloom filters for both soft and hard blocked addons. Additionally this PR introduces logic to determine whether we should update one or both bloom filters and or a stash as multiple possible outcomes are possible now. Finally, we handle cleaning up files on a more granular level from both the local storage and remote settings.
Context
Now when we run the
upload_mlbf_to_remote_settings
cron job we will check for both hard and soft blocked items. It is possible to:This adds a bit of complexity we need to address.
Additionally, instead of deleting all records from remote settings, we need to check for the current set of block filters and only delete records older than the older of the two.
Finally, since it is also possible to run the cron when no updates have occurred, we can safely delete mlbf cache files when that happens as there is no benefit from diffing an empty array.
Testing
This is gonna suck to test. First some preparation work.
Setup
enable-soft-blocking
andblocklist_mlbf_submit
waffle switchsrc/olympia/constants/blocklist.py
See the test scenarios
Now you can call the
_blocked_addon
method to create an addon with block/version of the specified type.Ex:
If you run the cron job now, you'd expect a blocked filter and a stash with the soft blocked version added.
Checklist
#ISSUENUM
at the top of your PR to an existing open issue in the mozilla/addons repository.