Accountsdb fuzzer is broken on main #536

Open
Sobeston opened this issue Feb 4, 2025 · 3 comments · Fixed by #573

@Sobeston
Contributor

Sobeston commented Feb 4, 2025

Description

We get an OutOfMemory after a while:

level=debug scope=accounts_db message="cleaned 9883 slots - old_state: 0, zero_lamports: 0" time=2025-02-04T21:46:44.580Z
deleting snapshot dir...
deleted snapshot dir
error: OutOfMemory
/usr/lib/zig/std/mem/Allocator.zig:225:89: 0x30ef4f6 in allocBytesWithAlignment__anon_26438 (fuzz)
    const byte_ptr = self.rawAlloc(byte_count, log2a(alignment), return_address) orelse return Error.OutOfMemory;
                                                                                        ^
/usr/lib/zig/std/mem/Allocator.zig:211:5: 0x3108d5a in allocWithSizeAndAlignment__anon_27069 (fuzz)
    return self.allocBytesWithAlignment(alignment, byte_count, return_address);
    ^
/usr/lib/zig/std/mem/Allocator.zig:129:5: 0x303faea in alloc__anon_15480 (fuzz)
    return self.allocAdvancedWithRetAddr(T, null, n, @returnAddress());
    ^
sig/src/bincode/bincode.zig:200:37: 0x303f748 in readWithConfig__anon_15479 (fuzz)
                    const entries = try allocator.alloc(info.child, try bincode.read(allocator, usize, reader, params));
                                    ^
sig/src/bincode/bincode.zig:66:5: 0x303fe3a in read__anon_15477 (fuzz)
    return readWithConfig(allocator, U, reader, params, getConfig(U) orelse .{});
    ^
sig/src/accountsdb/snapshots.zig:1674:25: 0x3041024 in bincodeRead__anon_15344 (fuzz)
            else => |e| return e,
                        ^
sig/src/bincode/bincode.zig:83:9: 0x30416e4 in readWithConfig__anon_15343 (fuzz)
        return deserialize_fcn(allocator, reader, params);
        ^
sig/src/bincode/bincode.zig:66:5: 0x3041844 in read__anon_15342 (fuzz)
    return readWithConfig(allocator, U, reader, params, getConfig(U) orelse .{});
    ^
sig/src/bincode/bincode.zig:140:48: 0x3046a22 in readWithConfig__anon_14536 (fuzz)
                    @field(data, field.name) = try bincode.read(allocator, field.type, reader, params);
                                               ^
sig/src/bincode/bincode.zig:66:5: 0x3046c45 in read__anon_14534 (fuzz)
    return readWithConfig(allocator, U, reader, params, getConfig(U) orelse .{});
    ^
sig/src/accountsdb/snapshots.zig:1794:16: 0x3046cf9 in decodeFromBincode__anon_14533 (fuzz)
        return try bincode.read(allocator, Manifest, reader, .{});
               ^
sig/src/accountsdb/snapshots.zig:1786:16: 0x3047009 in readFromFile (fuzz)
        return try decodeFromBincode(allocator, fbs.reader());
               ^
sig/src/accountsdb/snapshots.zig:2608:24: 0x3047460 in fromFiles (fuzz)
            break :blk try Manifest.readFromFile(allocator, full_file);
                       ^
sig/src/accountsdb/fuzz.zig:335:39: 0x306eb0d in run (fuzz)
            const combined_manifest = try sig.accounts_db.FullAndIncrementalManifest.fromFiles(
                                      ^
sig/src/fuzz.zig:88:24: 0x30ea13d in main (fuzz)
        .accountsdb => try accountsdb_fuzz.run(seed, &cli_args),
                       ^

How to Reproduce the Bug

$ zig build fuzz -- accountsdb

Additional Context

No response

@Sobeston Sobeston added the bug Something isn't working label Feb 4, 2025
@dnut dnut moved this to 📋 Backlog in Sig Feb 4, 2025
@0xNineteen
Contributor

@InKryption can you check this out when you get a chance? It seems to be related to snapshot deserialization.

@InKryption
Contributor

I believe I've found the cause of the specific issue reported here, but fixing it uncovers other issues.

Immediate cause:
For the last two fields of the AccountsDbFields struct, the serialization logic is:

    if (data.rooted_slot_hashes.len != 0 or data.rooted_slots.len != 0) {
        try bincode.write(writer, data.rooted_slots, params);
    }
    if (data.rooted_slot_hashes.len != 0) {
        try bincode.write(writer, data.rooted_slot_hashes, params);
    }

This doesn't account for the ExtraFields struct that comes right after, which can be present (non-EOF). When both slices are empty, nothing is written for them, so on deserialization the len for rooted_slots is instead read from the first field of that struct:

    pub const ExtraFields = struct {
        lamports_per_signature: u64,

That field often takes on values upwards of 156_461_825_832_971_108, which would mean about 156 quadrillion rooted slot values. That is a bit more memory than one would usually have on hand.
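
The misread described above can be reproduced in isolation. The following is a minimal sketch, not Sig's bincode module: it assumes a simplified layout of one u64 slice followed by one u64 field, with lengths and integers encoded as little-endian u64. The helper names (putU64, getU64, encode, decodeRootedSlotsLen) are hypothetical.

```zig
const std = @import("std");

// Hypothetical, simplified wire layout:
//   [rooted_slots: u64 length, then u64 items][lamports_per_signature: u64]
// All integers are assumed to be little-endian u64.

fn putU64(buf: []u8, pos: *usize, value: u64) void {
    std.mem.writeInt(u64, buf[pos.*..][0..8], value, .little);
    pos.* += 8;
}

fn getU64(bytes: []const u8, pos: *usize) u64 {
    const value = std.mem.readInt(u64, bytes[pos.*..][0..8], .little);
    pos.* += 8;
    return value;
}

/// Writes the slice, then the trailing field. When `skip_empty` is true, an
/// empty slice is omitted entirely, mirroring the conditional write in the issue.
/// Returns the number of bytes written.
fn encode(buf: []u8, rooted_slots: []const u64, lamports_per_signature: u64, skip_empty: bool) usize {
    var pos: usize = 0;
    if (!skip_empty or rooted_slots.len != 0) {
        putU64(buf, &pos, rooted_slots.len);
        for (rooted_slots) |slot| putU64(buf, &pos, slot);
    }
    putU64(buf, &pos, lamports_per_signature);
    return pos;
}

/// Returns the length prefix seen by a reader that always expects rooted_slots first.
fn decodeRootedSlotsLen(bytes: []const u8) u64 {
    var pos: usize = 0;
    return getU64(bytes, &pos);
}

test "skipping an empty slice makes the reader take the next field as a length" {
    var buf: [64]u8 = undefined;
    const lamports: u64 = 156_461_825_832_971_108;

    // Conditional write: the reader sees lamports_per_signature as the length.
    const skipped_len = encode(&buf, &.{}, lamports, true);
    try std.testing.expectEqual(lamports, decodeRootedSlotsLen(buf[0..skipped_len]));

    // Unconditional write: the reader sees the real length, 0.
    const always_len = encode(&buf, &.{}, lamports, false);
    try std.testing.expectEqual(@as(u64, 0), decodeRootedSlotsLen(buf[0..always_len]));
}
```

Saved to a file, this should run with `zig test`. In the real deserializer the misread length goes straight into `allocator.alloc`, which is where the OutOfMemory in the trace comes from.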

This is simple enough to fix: we just need to write those slices unconditionally.
That does stop this OOM from occurring. However, it uncovers this new issue:

Image

Now we're somehow running into a situation where multiple accounts at different file IDs have the same slot number.

@Sobeston
Contributor Author

I think we've made progress; however, this one still happens sometimes:

level=info scope=accounts_db message="gathering account hashes [thread0]: 198/256 (77%) (est: 8.969s elp: 30.62s)" time=2025-02-26T04:38:35.162Z
level=info scope=accounts_db message="gathering account hashes [thread0]: 229/256 (89%) (est: 4.213s elp: 35.74s)" time=2025-02-26T04:38:40.282Z
level=debug scope=accounts_db message="collecting hashes from accounts took: 38.655s" time=2025-02-26T04:38:43.197Z
level=info scope=accounts_db message="computing the merkle root over accounts..." time=2025-02-26T04:38:43.197Z
level=debug scope=accounts_db message="computing the merkle root over accounts took 169.116ms" time=2025-02-26T04:38:43.366Z
level=info scope=accounts_db message="Generating full snapshot 'snapshot-244620-CaEeRRgZFVQzXnZTpCQraV8NbumkG68oJ8tmjjNt6u7.tar.zst' (full path: sig/data/fuzz-data/accountsdb/snapshot-244620-CaEeRRgZFVQzXnZTpCQraV8NbumkG68oJ8tmjjNt6u7.tar.zst)." time=2025-02-26T04:38:43.478Z
thread 975490 panic: reached unreachable code
/usr/lib/zig/std/debug.zig:412:14: 0x32dec7c in assert (fuzz)
    if (!ok) unreachable; // assertion failure
             ^
/usr/lib/zig/std/array_hash_map.zig:954:19: 0x33f5c91 in putAssumeCapacityNoClobberContext (fuzz)
            assert(!result.found_existing);
                  ^
/usr/lib/zig/std/array_hash_map.zig:950:58: 0x3314399 in putAssumeCapacityNoClobber (fuzz)
            return self.putAssumeCapacityNoClobberContext(key, value, undefined);
                                                         ^
sig/src/accountsdb/db.zig:2664:61: 0x3557a63 in generateFullSnapshotWithCompressor (fuzz)
            serializable_file_map.putAssumeCapacityNoClobber(account_file.slot, .{
                                                            ^
sig/src/accountsdb/db.zig:1374:64: 0x35666d7 in runManagerLoop (fuzz)
                _ = try self.generateFullSnapshotWithCompressor(zstd_compressor, zstd_buffer, .{
                                                               ^
/usr/lib/zig/std/Thread.zig:429:13: 0x34a1ffe in callFn__anon_32551 (fuzz)
            @call(.auto, f, args) catch |err| {
            ^
/usr/lib/zig/std/Thread.zig:674:30: 0x33d6712 in entryFn (fuzz)
                return callFn(f, args_ptr.*);
                             ^
???:?:?: 0x7faa6ef18709 in ??? (libc.so.6)
Unwind information for `libc.so.6:0x7faa6ef18709` was not available, trace may be incomplete

???:?:?: 0x7faa6ef9caab in ??? (libc.so.6)
fuzz
└─ run fuzz failure
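
The assertion in the trace above fires because putAssumeCapacityNoClobber was given a key that is already in the map, and the key here is account_file.slot. A minimal sketch of how such a clash could be reported rather than asserted on, assuming a hypothetical FileEntry type and a findDuplicateSlot helper that are not part of Sig:

```zig
const std = @import("std");

/// Hypothetical stand-in for an account file: a file ID plus the slot it covers.
const FileEntry = struct { file_id: u32, slot: u64 };

/// Returns the first slot that appears under more than one entry, or null if
/// every slot is unique.
fn findDuplicateSlot(allocator: std.mem.Allocator, entries: []const FileEntry) !?u64 {
    var by_slot = std.AutoArrayHashMap(u64, u32).init(allocator);
    defer by_slot.deinit();
    try by_slot.ensureTotalCapacity(entries.len);

    for (entries) |entry| {
        // getOrPutAssumeCapacity reports an existing key through
        // `found_existing`; putAssumeCapacityNoClobber asserts it is absent.
        const gop = by_slot.getOrPutAssumeCapacity(entry.slot);
        if (gop.found_existing) return entry.slot;
        gop.value_ptr.* = entry.file_id;
    }
    return null;
}

test "two file IDs sharing a slot are detected" {
    const entries = [_]FileEntry{
        .{ .file_id = 0, .slot = 10 },
        .{ .file_id = 1, .slot = 11 },
        .{ .file_id = 2, .slot = 10 },
    };
    const duplicate = try findDuplicateSlot(std.testing.allocator, &entries);
    try std.testing.expectEqual(@as(?u64, 10), duplicate);

    const unique = try findDuplicateSlot(std.testing.allocator, entries[0..2]);
    try std.testing.expectEqual(@as(?u64, null), unique);
}
```

A check like this only makes the duplicate visible. It does not explain why two file IDs end up with the same slot, which is still the open question here.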

@Sobeston Sobeston reopened this Feb 26, 2025
@github-project-automation github-project-automation bot moved this from ✅ Done to 🔖 Ready in Sig Feb 26, 2025
@InKryption InKryption self-assigned this Feb 27, 2025
@dnut dnut moved this from 🔖 Ready to 🏗 In progress in Sig Feb 28, 2025