Skip to content
This repository has been archived by the owner on Jul 21, 2021. It is now read-only.

Revisit blocklist stamp #4

Open
ignoramous opened this issue Jul 20, 2021 · 0 comments
Open

Revisit blocklist stamp #4

ignoramous opened this issue Jul 20, 2021 · 0 comments
Assignees

Comments

@ignoramous
Copy link
Contributor

Today, the blocklists metadata is stored in a dictionary-encoded bitmap as leaf node of domains compacted into a radix trie.

This encoded bitmap is then represented in a URL (like the one rethinkdns/configure generates in response to user's selection) as a url-safe base64.

The bitmap encoding is such that, up to a maximum of 256 -- 16 (blocklist groups) x 16 (blocklists per group) -- unique dictionary values (blocklist index) can be represented. Of course, this can be increased by increasing size of bytes used to represent the blocklist group (currently 2 bytes) and/or blocklists themselves (also at 2 bytes).

Ideally, the encoded representation must optimize for the data being encoded. Analysis tells us that there are likely over 80% domains that are present only either in one or two blocklists. So, there's a potential to rethink the current strategy of encoding the blocklist metadata the way we are while also increasing number of blocklists that RethinkDNS, the resolver, can support.

Total domain entries: 5814457
Unique domains: 2550019
// count is the total number of domains present in #{entry} number of blocklists
// that is, the first line below means, 1412059 domains (55%) exist in just 1 blocklist.
Entry: 1 : count: 1412059 : %: 55.37445015115574
Entry: 2 : count: 554896 : %: 21.76046531418001
Entry: 3 : count: 214961 : %: 8.429780327126975
Entry: 4 : count: 85965 : %: 3.371151352205611
Entry: 5 : count: 44341 : %: 1.7388497889623569
Entry: 6 : count: 62576 : %: 2.4539425000362742
Entry: 7 : count: 26620 : %: 1.0439137904462672
Entry: 8 : count: 49785 : %: 1.9523383943413755
Entry: 9 : count: 27338 : %: 1.0720704433966963
Entry: 10 : count: 17297 : %: 0.6783086714255855
Entry: 11 : count: 13686 : %: 0.5367018833977315
Entry: 12 : count: 17039 : %: 0.6681910997525901
Entry: 13 : count: 9138 : %: 0.358350271115627
Entry: 14 : count: 5380 : %: 0.21097882015781058
Entry: 15 : count: 3046 : %: 0.11945009037187565
Entry: 16 : count: 1807 : %: 0.07086221710504902
Entry: 17 : count: 1139 : %: 0.0446663338586889
Entry: 18 : count: 763 : %: 0.029921345684090984
Entry: 19 : count: 688 : %: 0.02698019112798767
Entry: 20 : count: 413 : %: 0.016195957755608878
Entry: 21 : count: 328 : %: 0.0128626492586918
Entry: 22 : count: 244 : %: 0.009568556155856094
Entry: 23 : count: 179 : %: 0.0070195555405665605
Entry: 24 : count: 109 : %: 0.00427447795487014
Entry: 25 : count: 70 : %: 0.00274507758569642
Entry: 26 : count: 54 : %: 0.0021176312803943814
Entry: 27 : count: 42 : %: 0.0016470465514178522
Entry: 28 : count: 24 : %: 0.0009411694579530583
Entry: 29 : count: 14 : %: 0.0005490155171392841
Entry: 30 : count: 10 : %: 0.00039215394081377434
Entry: 31 : count: 4 : %: 0.00015686157632550972
Entry: 32 : count: 2 : %: 0.00007843078816275486
Entry: 33 : count: 1 : %: 0.00003921539408137743
Entry: 34 : count: 0 : %: 0
Entry: 35 : count: 1 : %: 0.00003921539408137743
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants