Add support for hard/soft bloom filter base + stash #22828

Open
wants to merge 1 commit into base: soft-block-cron-job

Conversation

KevinMind
Contributor

@KevinMind KevinMind commented Nov 6, 2024

Fixes: mozilla/addons#15014
Relates to: mozilla/addons#15155

Warning

This PR contains #22885 which should land first

Description

Adds logic to generate and write bloom filters for both soft- and hard-blocked add-ons. This PR also introduces logic to determine whether we should update one filter, both filters, and/or a stash, since several outcomes are now possible. Finally, we clean up files at a more granular level in both local storage and remote settings.

Context

Now when we run the upload_mlbf_to_remote_settings cron job we will check for both hard and soft blocked items. It is possible to:

  • do nothing
  • upload a hard block filter only
  • upload a hard block filter and a stash (for soft blocks)
  • upload a soft block filter
  • upload a soft block filter and a stash (for hard blocks)
  • upload both filters

This adds a bit of complexity we need to address.
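
The outcomes listed above can be sketched as a small decision function. This is only an illustration, not the code in this PR: `plan_upload` and the local `BlockType` stand-in are hypothetical names, and the "more changes than the threshold means a new base filter" rule is an assumption inferred from the test scenarios in this conversation.

```python
from enum import Enum


class BlockType(Enum):
    # Stand-in for olympia's BlockType so the sketch runs on its own.
    BLOCKED = "blocked"
    SOFT_BLOCKED = "soft_blocked"


def plan_upload(changed_counts, threshold):
    """Return (filters_to_upload, block_types_for_stash).

    changed_counts maps each BlockType to the number of versions that
    changed since that type's base filter was last uploaded (assumed input).
    """
    # A block type gets a new base filter once its changes exceed the threshold.
    filters = [
        block_type
        for block_type in BlockType
        if changed_counts.get(block_type, 0) > threshold
    ]
    # Remaining changes go into a stash, but only for types whose filter
    # is not being regenerated in this run.
    stash_types = [
        block_type
        for block_type in BlockType
        if block_type not in filters and changed_counts.get(block_type, 0) > 0
    ]
    return filters, stash_types


# Two hard-block changes and one soft-block change, with a threshold of 1:
filters, stash_types = plan_upload(
    {BlockType.BLOCKED: 2, BlockType.SOFT_BLOCKED: 1}, threshold=1
)
print(filters, stash_types)
```

With no changes at all, both lists come back empty, which corresponds to the "do nothing" outcome.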

Additionally, instead of deleting all records from remote settings, we need to check for the current set of block filters and only delete records older than the older of the two.

Finally, since it is also possible to run the cron when no updates have occurred, we can safely delete mlbf cache files when that happens as there is no benefit from diffing an empty array.

Testing

This is gonna suck to test. First some preparation work.

Setup

  • Set up a local remote settings server here
  • Set the base replace threshold to a low number (so you can trigger re-uploading of filters without creating a bunch of blocks)
  • Enable the enable-soft-blocking and blocklist_mlbf_submit waffle switches

src/olympia/constants/blocklist.py

BASE_REPLACE_THRESHOLD = 1

See the test scenarios

from olympia.amo.tests import addon_factory, block_factory, version_factory
from olympia.blocklist.models import BlockType
from olympia.users.models import UserProfile

# Any existing user works as the author of the block.
user = UserProfile.objects.first()

def _blocked_addon(block_type=BlockType.BLOCKED, **kwargs):
    # Create an add-on and a block of the given type covering it.
    addon = addon_factory(**kwargs)
    block = block_factory(
        guid=addon.guid, updated_by=user, block_type=block_type
    )
    return addon, block

Now you can call the _blocked_addon function to create an add-on with a block/version of the specified type.

Ex:

_blocked_addon(block_type=BlockType.BLOCKED)
_blocked_addon(block_type=BlockType.BLOCKED)
_blocked_addon(block_type=BlockType.SOFT_BLOCKED)

If you run the cron job now, you'd expect a blocked filter and a stash with the soft blocked version added.

Checklist

  • Add #ISSUENUM at the top of your PR to an existing open issue in the mozilla/addons repository.
  • Successfully verified the change locally.
  • The change is covered by automated tests, or otherwise indicated why doing so is unnecessary/impossible.
  • Add before and after screenshots (Only for changes that impact the UI).
  • Add or update relevant docs reflecting the changes made.

@KevinMind KevinMind force-pushed the soft-block-bloom-filter-filter branch 4 times, most recently from 1c5160b to aeb1b36 Compare November 7, 2024 11:15
@KevinMind KevinMind mentioned this pull request Nov 7, 2024
5 tasks
@KevinMind KevinMind force-pushed the soft-block-bloom-filter-filter branch 7 times, most recently from f30bf73 to f387a3d Compare November 12, 2024 10:22
@willdurand
Member

Fixes: mozilla/addons#15014
Fixes: mozilla/addons#15166
Relates to: mozilla/addons#15155

we should almost never fix two issues with a single PR, so please fix mozilla/addons#15166 in a different PR.

@KevinMind KevinMind force-pushed the soft-block-bloom-filter-filter branch 6 times, most recently from 227f07e to 3917ee4 Compare November 13, 2024 20:10
@KevinMind
Contributor Author

Fixes: mozilla/addons#15014
Fixes: mozilla/addons#15166
Relates to: mozilla/addons#15155

we should almost never fix two issues with a single PR, so please fix mozilla/addons#15166 in a different PR.

I've never heard of this rule and regularly do this. Why should that be a rule?

@KevinMind KevinMind force-pushed the soft-block-bloom-filter-filter branch 2 times, most recently from dcdf7c1 to 9ff8ccc Compare November 20, 2024 10:46
@KevinMind
Contributor Author

@willdurand dropped and closed, since this PR introduces the reference to the new switch and so there is nothing to "update".

@KevinMind KevinMind force-pushed the soft-block-bloom-filter-filter branch from 9ff8ccc to 7cb4eb5 Compare November 20, 2024 11:13
@KevinMind KevinMind marked this pull request as draft November 20, 2024 11:13
@KevinMind KevinMind force-pushed the soft-block-bloom-filter-filter branch from 7cb4eb5 to 6c90566 Compare November 20, 2024 14:59
@KevinMind
Contributor Author

Run the cron:

./manage.py cron upload_mlbf_to_remote_settings

@KevinMind
Contributor Author

KevinMind commented Nov 20, 2024

Scenarios.

Nothing to do

  1. Given: No hard or soft blocks since last reset
  2. When: run the cron
  3. Then:
  • Expect successful run and no files created
  • Expect remote settings to be empty

Two hard blocked

  1. Given: Two hard blocks since last reset
  2. When: run the cron
  3. Then:
  • Expect cache.json with two hard blocks
  • Expect remote settings to contain a new attachment with a blocked-bloomfilter-base attachment_type

One hard blocked

  1. Given:
  • One hard block since last reset
  • There is already an uploaded bloom filter
  2. When: run the cron
  3. Then:
  • Expect cache.json with one hard block and stash.json with one hard block
  • Expect remote settings to contain a new stash record with one blocked entry and an empty unblocked entry.
  • If soft blocking is enabled, also expect an empty softblocked entry.

Two soft blocked

  1. Given:
  • Two soft-blocked entries since last reset
  • Soft blocking is enabled
  2. When: run the cron
  3. Then:
  • Expect cache.json with 2 soft blocks and a filter-softblocked file
  • Expect remote settings to contain a new attachment with a softblocked-bloomfilter-base attachment_type

One soft blocked

  1. Given:
  • One soft blocked since last reset
  • Soft blocking is enabled
  2. When: run the cron
  3. Then:
  • Expect cache.json and a stash.json file with one soft blocked entry
  • Expect remote settings to contain a new record with one softblocked entry and empty unblocked/blocked entries

One of each

  1. Given:
  • One hard blocked since last reset
  • One soft blocked since last reset
  • Soft blocking is enabled
  2. When: run the cron
  3. Then:
  • Expect cache.json and a stash.json file with the 2 blocked/softblocked entries respectively
  • Expect remote settings to contain a new record with one softblocked entry, one blocked entry, and an empty unblocked entry

Two of one, one of the other

  1. Given:
  • One hard blocked since last reset
  • Two soft blocked since last reset
  • Soft blocking is enabled
  2. When: run the cron
  3. Then:
  • Expect cache.json, a stash.json file with the 1 hard blocked entry, and a softblocked-filter file
  • Expect remote settings to contain a new record with one blocked entry and empty unblocked/softblocked entries
  • Expect remote settings to contain a new attachment with a softblocked-bloomfilter-base attachment_type
  • IMPORTANT: Expect that the filter is older than the stash

TODO: the last check doesn't pass currently

Two of each

  1. Given:
  • Two hard blocked since last reset
  • Two soft blocked since last reset
  • Soft blocking is enabled
  2. When: run the cron
  3. Then:
  • Expect cache.json, a softblocked-filter file, and a blocked-filter file
  • Expect remote settings to contain a new attachment with a softblocked-bloomfilter-base attachment_type
  • Expect remote settings to contain a new attachment with a blocked-bloomfilter-base attachment_type
  • Expect both attachments to have the same generation_time

Soft to hard block

  1. Given: 1 soft block since last reset
  2. When:
  • update to a hard block
block.blockversion_set.all().update(block_type=BlockType.BLOCKED)
  3. Then:
  • Expect cache.json with one hard block and stash.json with one hard block
  • Expect remote settings to contain a new stash record with one blocked entry and an empty unblocked entry.

Hard to soft block

  1. Given: 1 hard block FROM BEFORE the last reset

It is important that the block you wish to update has already been uploaded in the previous stash. That is the only way for us to verify it will be "un(hard)blocked" and "softblocked" in one step. If it is not a committed hard block, it will appear in the stash like a normal softblock because there is no hardblock to unblock. get it? 🙃

  2. When:
  • update to a soft block
block.blockversion_set.all().update(block_type=BlockType.SOFT_BLOCKED)
  3. Then:
  • Expect cache.json with one hard block and stash.json with one soft block and one unblocked
  • Expect remote settings to contain a new stash record with one softblocked entry and one unblocked entry and an empty blocked entry
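
To make the hard-to-soft case concrete, here is a minimal sketch of how a stash could be derived from the previously uploaded state and the current state. It is an assumption-laden illustration rather than the PR's implementation: `build_stash`, the dict layout, and the `guid:version` key strings are hypothetical.

```python
def build_stash(previous, current):
    """Diff two snapshots, each a dict with 'blocked' and 'softblocked' sets."""
    return {
        # Newly hard-blocked versions.
        "blocked": sorted(current["blocked"] - previous["blocked"]),
        # Newly soft-blocked versions.
        "softblocked": sorted(current["softblocked"] - previous["softblocked"]),
        # Versions removed from either the hard or the soft set.
        "unblocked": sorted(
            (previous["blocked"] - current["blocked"])
            | (previous["softblocked"] - current["softblocked"])
        ),
    }


# The version was already committed as a hard block, then switched to soft.
previous = {"blocked": {"addon@example:1.0"}, "softblocked": set()}
current = {"blocked": set(), "softblocked": {"addon@example:1.0"}}
stash = build_stash(previous, current)
print(stash)
```

If the hard block had never been committed, `previous["blocked"]` would be empty and the same version would show up only as a plain softblocked entry, which is why the Given step matters.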

Empty stash is not uploaded

  1. Given: 1 unsigned block since the last reset
_blocked_addon(block_type=BlockType.BLOCKED, file_kw={'is_signed': False})
  2. When: run the cron
  3. Then:
  • Expect no new files to be created
  • Expect no new records or attachments in remote settings

@KevinMind
Contributor Author

How to reset base filters so you can test the next scenario

./manage.py cron upload_mlbf_to_remote_settings force_base=True

@KevinMind KevinMind force-pushed the soft-block-bloom-filter-filter branch from 6c90566 to b56d5e3 Compare November 20, 2024 19:49
@KevinMind KevinMind marked this pull request as ready for review November 20, 2024 19:52
@KevinMind KevinMind force-pushed the soft-block-bloom-filter-filter branch from b56d5e3 to 3762824 Compare November 20, 2024 19:59
@KevinMind KevinMind changed the title Soft-block-bloom-filter-filter Add support for hard/soft bloom filter base + stash Nov 20, 2024
Member

@willdurand willdurand left a comment

Thanks, that needs some changes but this looks promising.

@KevinMind KevinMind force-pushed the soft-block-bloom-filter-filter branch 2 times, most recently from 766f489 to 6eab666 Compare November 21, 2024 19:20
@KevinMind KevinMind force-pushed the soft-block-bloom-filter-filter branch from 024fe82 to 4fcf87a Compare November 21, 2024 19:33
Member

@willdurand willdurand left a comment

Some more feedback

    else oldest_base_filter_id
)
if record_time < oldest_base_filter_id:
    server.delete_record(record['id'])
Member

It does look like the previous filter (that gets replaced by a new one - for the same block type) is removed.

I also noticed that the soft filter was removed when I added enough hard blocks to rebuild a hard filter. This doesn't look correct.

Member

Re-thinking about this - and I am not saying this isn't already what's happening - we should:

  1. delete any filter that we are going to re-upload to keep always 1 filter (per block type)
  2. delete everything when both filters are re-generated
  3. delete stash records older than the oldest of the two filters

Contributor Author

  • Added logic for 1. We check any pre-existing attachment that matches the filter_list and delete them.
  • I did not include logic explicitly for 2 because it should happen anyway
  • 3 was already implemented
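
For reference, the three cleanup rules discussed in this thread could look roughly like the sketch below. It is a hypothetical illustration and not the code under review: `records_to_delete` and the `stash_time` field are assumed names, and the record shapes are simplified.

```python
def records_to_delete(records, reuploaded_types, oldest_base_filter_time):
    """Return the ids of remote settings records that can be removed.

    records: list of dicts (assumed shape; filters carry 'attachment_type',
        stashes carry 'stash_time').
    reuploaded_types: attachment_type values of filters being re-uploaded.
    oldest_base_filter_time: timestamp of the older of the two base filters
        that will exist once this run has finished.
    """
    ids = []
    for record in records:
        attachment_type = record.get("attachment_type")
        if attachment_type is not None:
            # Rule 1: keep at most one filter per block type.
            if attachment_type in reuploaded_types:
                ids.append(record["id"])
        elif record["stash_time"] < oldest_base_filter_time:
            # Rule 3: stashes older than the oldest filter are obsolete.
            ids.append(record["id"])
    return ids


records = [
    {"id": "f-hard", "attachment_type": "blocked-bloomfilter-base"},
    {"id": "f-soft", "attachment_type": "softblocked-bloomfilter-base"},
    {"id": "s-1", "stash_time": 150},
    {"id": "s-2", "stash_time": 250},
]
# Only the hard filter is rebuilt; the soft filter (generated at 200) stays.
hard_only = records_to_delete(records, {"blocked-bloomfilter-base"}, 200)
print(hard_only)
```

Rule 2 needs no separate branch in this sketch: when both filters are rebuilt, both attachment types are in `reuploaded_types` and the oldest filter time is the new generation time, so every existing record is selected.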

src/olympia/constants/blocklist.py
@@ -1,8 +1,15 @@
# How many guids should there be in the stashes before we make a new base.
from olympia.blocklist.models import BlockType


BASE_REPLACE_THRESHOLD = 5_000

# Config keys used to track recent mlbf ids
MLBF_TIME_CONFIG_KEY = 'blocklist_mlbf_generation_time'
Member

If filters can be generated at different times, and we have different base ids for hard/soft filters (via MLBF_BASE_ID_CONFIG_KEY), why do we have a single generation time in config?

Comment on lines 316 to 317
# unblocked should include any versions
# removed from blocked or soft blocked
Member

I unblocked enough hard blocks to force the creation of a new filter. This resulted in a stash too (because there was a soft-block too) and this stash listed all the unblocked versions. I don't think we want that, even if technically, it might be fine (that's a lot of assumptions).

Member

Re-thinking about this - and, again, not saying this isn't what's happening - we should create a stash record for blocks that are of a type whose filter isn't being regenerated. That sentence is horrible, ugh.

In practice, that would mean:

  1. 1 hard block "change" and enough soft block changes to regenerate the soft block filter = we create a soft block filter and a stash that only contains this 1 hard block change (could be an addition or a deletion)
  2. 1 soft block "change" and enough hard block changes to regenerate the hard block filter = we create a hard block filter and a stash that only contains this 1 soft block change (could be an addition or a deletion)
  3. changes of soft and/or hard blocks that only lead to the creation of a stash = we only create a stash
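
One way to read the three cases above is as a filter over per-type changes. The following is a rough sketch under stated assumptions, not the implementation in this PR: `stash_for_unregenerated`, the `added`/`removed` layout, and the string type names are all hypothetical.

```python
def stash_for_unregenerated(changes, regenerated):
    """Build a stash only from block types whose filter is NOT regenerated.

    changes: {"blocked": {"added": set, "removed": set},
              "softblocked": {"added": set, "removed": set}} (assumed shape)
    regenerated: set of type names whose base filter is re-uploaded this run.
    Returns None when there is nothing to upload.
    """
    stash = {"blocked": [], "softblocked": [], "unblocked": []}
    unblocked = set()
    for block_type, delta in changes.items():
        if block_type in regenerated:
            # The new filter already reflects these changes.
            continue
        stash[block_type] = sorted(delta["added"])
        unblocked |= delta["removed"]
    stash["unblocked"] = sorted(unblocked)
    # An empty stash is not worth uploading.
    return stash if any(stash.values()) else None


changes = {
    "blocked": {"added": {"a@example:1.0"}, "removed": set()},
    "softblocked": {"added": {"b@example:1.0", "c@example:1.0"}, "removed": {"d@example:1.0"}},
}
# Case 1: the soft filter is regenerated, so only the hard change is stashed.
case_one = stash_for_unregenerated(changes, {"softblocked"})
print(case_one)
```

Tracking removals per block type is the design choice that avoids listing every unblocked version in the stash when that type's filter is being rebuilt anyway.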

Remove unnecessary and redundant code

Fix ordering of cache/stash + increase validity of tests

Upload multiple filters

More logs + correct handling of attachment_type

Verify cron passes correct args to task

TMP: Ignore soft blocks

Add waffle switch

Fix invalid class reference

Update to correct waffle switch

Update to fix the test

refactoring

add missing tests

Apply suggestions from code review

Co-authored-by: William Durand <[email protected]>

Updates from review

Ensure blocks of type X are excluded from stash if filter of type X is also being uploaded

TMP: squash

Better exclusion of stashes from updated filters + more comment resolution

Delete correct files from remote settings:
- delete any existing attachments matching filters we are reuploading
- delete any stashes that are older than the oldest filter
@KevinMind KevinMind force-pushed the soft-block-bloom-filter-filter branch from bcf4af3 to 1a4fad7 Compare November 22, 2024 17:47
@KevinMind KevinMind changed the base branch from master to soft-block-cron-job November 22, 2024 17:48
Development

Successfully merging this pull request may close these issues.

[Task]: Generate separate bloomfilter for soft-blocks
2 participants