Support for DEBUG DIGEST module data type callback #21

Open
wants to merge 4 commits into
base: unstable

@nnmehta nnmehta commented Nov 15, 2024

Adds the digest callback for the module data type, plus DEBUG DIGEST integration tests for scenarios such as COPY, RDB load, and AOF rewrite.

Closes #9

@YueTang-Vanessa
Contributor

Please check the DCO guide and sign off your PR: https://github.com/valkey-io/valkey-bloom/pull/21/checks?check_run_id=33062120817.

src/digest.rs Outdated

/// `Digest` is a high-level rust interface to the Valkey module C API
/// abstracting away the raw C ffi calls.
pub struct Digest {
Member

@KarthikSubbarao KarthikSubbarao Nov 15, 2024

I guess this is a stopgap until the DEBUG wrapper functionality is added to the valkeymodule-rs SDK.

We can remove it once that functionality is released in a new SDK version.

let mut dig = Digest::new(md);
let val = &*(value.cast::<BloomFilterType>());
dig.add_long_long(val.expansion.into());
dig.add_long_long(val.capacity());
Member

@KarthikSubbarao KarthikSubbarao Nov 15, 2024

capacity() returns a single value for the whole BloomFilterType: the sum of the capacities of all filters in the object.

The digest needs data that describes the entire BloomFilterType, including every sub-filter.

This means we need to add the struct member data from the top-level BloomFilterType structure, and then add the struct member values from each inner BloomFilter structure in the vector.

This is needed for data correctness.

For example: if we add only the capacity, one bloom object can have an overall capacity of 100 from a single filter, while another object has the same total of 100 split across 5 filters. These objects are not the same, so updating the digest logic as described above will distinguish them.
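To illustrate the collision described above, here is a minimal sketch. It is not the module's code: `SubFilter`, `BloomObject`, `with_capacities`, `digest_total_only`, and `digest_per_filter` are hypothetical simplified stand-ins, and `std`'s `DefaultHasher` stands in for the server-side digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical stand-in for an inner bloom filter.
struct SubFilter {
    capacity: i64,
}

/// Hypothetical stand-in for the top-level bloom object.
struct BloomObject {
    expansion: i64,
    fp_rate: f64,
    filters: Vec<SubFilter>,
}

impl BloomObject {
    /// Build an object with one sub-filter per given capacity.
    fn with_capacities(capacities: &[i64]) -> Self {
        BloomObject {
            expansion: 2,
            fp_rate: 0.001,
            filters: capacities.iter().map(|&c| SubFilter { capacity: c }).collect(),
        }
    }

    /// Total capacity, summed across all sub-filters.
    fn capacity(&self) -> i64 {
        self.filters.iter().map(|f| f.capacity).sum()
    }
}

/// Digest that only sees the summed capacity (the approach under review).
fn digest_total_only(obj: &BloomObject) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_i64(obj.expansion);
    h.write_i64(obj.capacity());
    h.finish()
}

/// Digest that adds top-level fields first, then each sub-filter's fields.
fn digest_per_filter(obj: &BloomObject) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_i64(obj.expansion);
    h.write(&obj.fp_rate.to_le_bytes());
    for f in &obj.filters {
        h.write_i64(f.capacity);
    }
    h.finish()
}
```

With one object built from `&[100]` and another from `&[20, 20, 20, 20, 20]`, `digest_total_only` cannot tell the two apart, while `digest_per_filter` is fed different input for each.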

@nnmehta nnmehta reopened this Nov 21, 2024
Signed-off-by: Nihal Mehta <[email protected]>
src/digest.rs Outdated
pub dig: *mut raw::RedisModuleDigest,
}

impl Digest {
Member

Let's move this file into the wrapper directory

Signed-off-by: Nihal Mehta <[email protected]>
@@ -20,7 +20,10 @@ def test_basic_aofrewrite_and_restore(self):
bf_info_result_1 = client.execute_command('BF.INFO testSave')
assert(len(bf_info_result_1)) != 0
curr_item_count_1 = client.info_obj().num_keys()

# cmd debug digest
client.debug_digest()
Member

Can we check that this is not None? Also, when we restart the server later on, can we compare and check that they are the same?

assert bf_info_result_2 == bf_info_result_1
assert debug_restore == debug_original
client.execute_command('DEL testSave')

def test_aofrewrite_bloomfilter_metrics(self):
Member

Let's also add a debug digest test here

@@ -39,10 +39,15 @@ def test_copy_and_exists_cmd(self):
assert client.execute_command('EXISTS filter') == 1
mexists_result = client.execute_command('BF.MEXISTS filter item1 item2 item3 item4')
assert len(madd_result) == 4 and len(mexists_result) == 4
# cmd debug digest
client.debug_digest()
Member

Can we check that this is not None? Also, when we restart the server later on, can we compare and check that they are the same?

@@ -14,7 +14,9 @@ def test_basic_save_and_restore(self):
bf_info_result_1 = client.execute_command('BF.INFO testSave')
assert(len(bf_info_result_1)) != 0
curr_item_count_1 = client.info_obj().num_keys()

client.debug_digest()
Member

Can we check that this is not None? Also, when we restart the server later on, can we compare and check that they are the same?

Member

And not 0000000000000000000000000000000000000000

@@ -474,7 +474,7 @@ def port_tracker_fixture(self, resource_port_tracker):
self.port_tracker = resource_port_tracker

def _get_valkey_args(self):
-self.args.update({"maxmemory":self.maxmemory, "maxmemory-policy":"allkeys-random", "activerehashing":"yes", "databases": self.num_dbs, "repl-diskless-sync": "yes", "save": ""})
+self.args.update({"maxmemory":self.maxmemory, "maxmemory-policy":"allkeys-random", "activerehashing":"yes", "databases": self.num_dbs, "repl-diskless-sync": "yes", "save": "", "enable-debug-command":"yes"})
Member

Let us also add testing in two other places:

  1. test_correctness.py - the digest should be checked for correctness on both scaling and non-scaling filters
  2. test_replication.py - replicated commands should produce the same digest on the primary and the replica

dig.add_long_long(val.expansion.into());
dig.add_string_buffer(&val.fp_rate.to_le_bytes());
for filter in &val.filters {
dig.add_string_buffer(&filter.bloom.bitmap());
Member

@KarthikSubbarao KarthikSubbarao Nov 22, 2024

Let's also add the sip_keys of every filter into the digest. When we compare two bloom objects, the sip keys of the bloom filters' hash functions should be compared as well.

If the keys are different, the objects are not the same.

Member

Step 1 - Implement sip_keys on the BloomFilter struct

    /// Return the keys used by the sip hasher of the raw bloom.
    pub fn sip_keys(&self) -> [(u64, u64); 2] {
        self.bloom.sip_keys()
    }

Step 2 - Here, from the callback, write the four numbers (each a u64) into the digest using add_long_long()
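Step 2 could look roughly like the sketch below. It assumes that add_long_long() takes an `i64`, so each `u64` key is cast. `RecordingDigest` and `add_sip_keys` are hypothetical names used only for illustration; the real wrapper forwards to the module C API rather than recording values.

```rust
/// Hypothetical stand-in for the digest wrapper: it only records what it is given.
#[derive(Default)]
struct RecordingDigest {
    values: Vec<i64>,
}

impl RecordingDigest {
    /// Assumed to mirror the wrapper's add_long_long(), taking a signed 64-bit value.
    fn add_long_long(&mut self, ll: i64) {
        self.values.push(ll);
    }
}

/// Add both sip key pairs of one filter to the digest, in a fixed order.
fn add_sip_keys(dig: &mut RecordingDigest, sip_keys: [(u64, u64); 2]) {
    for &(k0, k1) in sip_keys.iter() {
        // `as i64` reinterprets the 64 bits unchanged, so distinct u64 keys
        // still map to distinct i64 values.
        dig.add_long_long(k0 as i64);
        dig.add_long_long(k1 as i64);
    }
}
```

In the callback this would be called once per filter inside the existing `for filter in &val.filters` loop, passing the result of `filter.sip_keys()`.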

Signed-off-by: Nihal Mehta <[email protected]>
Development

Successfully merging this pull request may close these issues.

Support for the DIGEST module data type callback
3 participants