solana-validator crashes during the snapshot unpacking process #35649

Closed
Nowasky opened this issue Sep 5, 2024 · 1 comment
Labels: community (Community contribution)

Comments

Nowasky commented Sep 5, 2024

The Solana validator crashes on my server whenever it tries to unpack a snapshot. What could be causing this error? I suspect either a faulty network connection that corrupts the downloaded snapshot files, or faulty memory modules.

[2024-09-05T18:58:16.359869581Z INFO  solana_entry::poh] Running 800000 hashes...
[2024-09-05T18:58:16.393027370Z INFO  solana_core::validator] PoH speed check: Will sleep 366844311ns per slot.
[2024-09-05T18:58:16.393038070Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 1000000
[2024-09-05T18:58:16.393040310Z INFO  solana_ledger::blockstore] Opening blockstore at "/mnt/solana/ledger/ledger/rocksdb"
[2024-09-05T18:58:16.394367171Z WARN  solana_ledger::blockstore_db] Unable to detect Rocks columns: Error { message: "IO error: No such file or directory: While opening a file for sequentially reading: /mnt/solana/ledger/ledger/rocksdb/CURRENT: No such file or directory" }
[2024-09-05T18:58:16.519222165Z INFO  solana_ledger::blockstore] Opening blockstore done; blockstore open took 126ms
[2024-09-05T18:58:16.519484771Z INFO  solana_ledger::bank_forks_utils] Initializing bank snapshots dir: /mnt/solana/ledger/ledger/snapshot
[2024-09-05T18:58:16.519506131Z INFO  solana_runtime::snapshot_bank_utils] Loading bank from full snapshot archive: /mnt/solana/ledger/ledger/remote/snapshot-287979899-4uCZNUR7ZQwzC6tWJeoJtPEQSwuhui54SbfadBMgVits.tar.zst, and incremental snapshot archive: Some("/mnt/solana/ledger/ledger/remote/incremental-snapshot-287979899-288000931-EEv1fwtT7A16bXFXX2BJMkGZLPPzLjtfud13EHP1e789.tar.zst")
[2024-09-05T18:58:21.360772426Z INFO  solana_metrics::metrics] datapoint: net-stats-validator in_datagrams_delta=0i no_ports_delta=0i in_errors_delta=4i out_datagrams_delta=0i rcvbuf_errors_delta=4i sndbuf_errors_delta=0i in_csum_errors_delta=0i ignored_multi_delta=0i in_errors=11079151i rcvbuf_errors=11079151i sndbuf_errors=0i rx_bytes_delta=16327i rx_packets_delta=217i rx_errs_delta=0i rx_drops_delta=1i rx_fifo_delta=0i rx_frame_delta=0i tx_bytes_delta=20616i tx_packets_delta=42i tx_errs_delta=0i tx_drops_delta=0i tx_fifo_delta=0i tx_colls_delta=0i
[2024-09-05T18:58:21.360826785Z INFO  solana_metrics::metrics] datapoint: memory-stats total=134201765888i swap_total=554050772992i free_percent=24.08272553654223 used_bytes=19885404160i avail_percent=85.18245715440462 buffers_percent=0.1967610711027248 cached_percent=66.62242210629114 swap_free_percent=100
[2024-09-05T18:58:23.361390449Z INFO  solana_metrics::metrics] datapoint: net-stats-validator in_datagrams_delta=0i no_ports_delta=0i in_errors_delta=8i out_datagrams_delta=0i rcvbuf_errors_delta=8i sndbuf_errors_delta=0i in_csum_errors_delta=0i ignored_multi_delta=0i in_errors=11079159i rcvbuf_errors=11079159i sndbuf_errors=0i rx_bytes_delta=10580i rx_packets_delta=165i rx_errs_delta=0i rx_drops_delta=1i rx_fifo_delta=0i rx_frame_delta=0i tx_bytes_delta=7590i tx_packets_delta=11i tx_errs_delta=0i tx_drops_delta=0i tx_fifo_delta=0i tx_colls_delta=0i
thread 'solUnpkSnpsht01' panicked at /home/solana/solana-1.18.23/accounts-db/src/hardened_unpack.rs:338:45:
called `Result::unwrap()` on an `Err` value: "SendError(..)"
thread 'solUnpkSnpsht02' panicked at /home/solana/solana-1.18.23/accounts-db/src/hardened_unpack.rs:338:45:
called `Result::unwrap()` on an `Err` value: "SendError(..)"
thread 'solUnpkSnpsht03' panicked at /home/solana/solana-1.18.23/accounts-db/src/hardened_unpack.rs:338:45:
called `Result::unwrap()` on an `Err` value: "SendError(..)"
thread 'solUnpkSnpsht00' panicked at /home/solana/solana-1.18.23/accounts-db/src/hardened_unpack.rs:338:45:
called `Result::unwrap()` on an `Err` value: "SendError(..)"
stack backtrace:
   0:     0x55f3d56fabdc - std::backtrace_rs::backtrace::libunwind::trace::ha637c64ce894333a
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
   1:     0x55f3d56fabdc - std::backtrace_rs::backtrace::trace_unsynchronized::h47f62dea28e0c88d
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x55f3d56fabdc - std::sys_common::backtrace::_print_fmt::h9eef0abe20ede486
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x55f3d56fabdc - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hed7f999df88cc644
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x55f3d572a290 - core::fmt::rt::Argument::fmt::h1539a9308b8d058d
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/rt.rs:142:9
   5:     0x55f3d572a290 - core::fmt::write::h3a39390d8560d9c9
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/mod.rs:1120:17
   6:     0x55f3d56f5fff - std::io::Write::write_fmt::h5fc9997dfe05f882
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/io/mod.rs:1762:15
   7:     0x55f3d56fa9c4 - std::sys_common::backtrace::_print::h894006fb5c6f3d45
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x55f3d56fa9c4 - std::sys_common::backtrace::print::h23a2d212c6fff936
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x55f3d56fc377 - std::panicking::default_hook::{{closure}}::h8a1d2ee00185001a
  10:     0x55f3d56fc0df - std::panicking::default_hook::h6038f2eba384e475
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:292:9
  11:     0x55f3d4e7da71 - solana_metrics::metrics::set_panic_hook::{{closure}}::{{closure}}::hff8368754425ecb4
  12:     0x55f3d56fc988 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h1f8f335eaa9cfaee
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2021:9
  13:     0x55f3d56fc988 - std::panicking::rust_panic_with_hook::h2b5517d590cab22e
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:783:13
  14:     0x55f3d56fc6de - std::panicking::begin_panic_handler::{{closure}}::h233112c06e0ef43e
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:657:13
  15:     0x55f3d56fb0a6 - std::sys_common::backtrace::__rust_end_short_backtrace::h6e893f24d7ebbff8
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:170:18
  16:     0x55f3d56fc442 - rust_begin_unwind
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
  17:     0x55f3d337c925 - core::panicking::panic_fmt::hbf0e066aabfa482c
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
  18:     0x55f3d337ce63 - core::result::unwrap_failed::hddb4fea594200c52
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/result.rs:1653:5
  19:     0x55f3d49c61e0 - solana_accounts_db::hardened_unpack::unpack_snapshot_with_processors::h34e468b8bc8ef31b
  20:     0x55f3d4984fbd - std::sys_common::backtrace::__rust_begin_short_backtrace::h6e2a975281ea167b
  21:     0x55f3d498c018 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h76d64658762b153a
  22:     0x55f3d5700ac5 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hc7eafaff61e32df9
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2007:9
  23:     0x55f3d5700ac5 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h6ba4a5de48dd2304
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2007:9
  24:     0x55f3d5700ac5 - std::sys::unix::thread::Thread::new::thread_start::he469335aef763e45
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys/unix/thread.rs:108:17
  25:     0x7fe0a2856ac3 - <unknown>
  26:     0x7fe0a28e8850 - <unknown>
  27:                0x0 - <unknown>
stack backtrace:
[2024-09-05T18:58:24.942077750Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solUnpkSnpsht01" one=1i message="panicked at /home/solana/solana-1.18.23/accounts-db/src/hardened_unpack.rs:338:45:
    called `Result::unwrap()` on an `Err` value: \"SendError(..)\"" location="/home/solana/solana-1.18.23/accounts-db/src/hardened_unpack.rs:338:45" version="1.18.23 (src:00000000; feat:4215500110, client:SolanaLabs)"
   0:     0x55f3d56fabdc - std::backtrace_rs::backtrace::libunwind::trace::ha637c64ce894333a
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
   1:     0x55f3d56fabdc - std::backtrace_rs::backtrace::trace_unsynchronized::h47f62dea28e0c88d
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x55f3d56fabdc - std::sys_common::backtrace::_print_fmt::h9eef0abe20ede486
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x55f3d56fabdc - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hed7f999df88cc644
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x55f3d572a290 - core::fmt::rt::Argument::fmt::h1539a9308b8d058d
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/rt.rs:142:9
   5:     0x55f3d572a290 - core::fmt::write::h3a39390d8560d9c9
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/mod.rs:1120:17
   6:     0x55f3d56f5fff - std::io::Write::write_fmt::h5fc9997dfe05f882
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/io/mod.rs:1762:15
   7:     0x55f3d56fa9c4 - std::sys_common::backtrace::_print::h894006fb5c6f3d45
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x55f3d56fa9c4 - std::sys_common::backtrace::print::h23a2d212c6fff936
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x55f3d56fc377 - std::panicking::default_hook::{{closure}}::h8a1d2ee00185001a
  10:     0x55f3d56fc0df - std::panicking::default_hook::h6038f2eba384e475
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:292:9
  11:     0x55f3d4e7da71 - solana_metrics::metrics::set_panic_hook::{{closure}}::{{closure}}::hff8368754425ecb4
  12:     0x55f3d56fc988 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h1f8f335eaa9cfaee
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2021:9
  13:     0x55f3d56fc988 - std::panicking::rust_panic_with_hook::h2b5517d590cab22e
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:783:13
  14:     0x55f3d56fc6de - std::panicking::begin_panic_handler::{{closure}}::h233112c06e0ef43e
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:657:13
  15:     0x55f3d56fb0a6 - std::sys_common::backtrace::__rust_end_short_backtrace::h6e893f24d7ebbff8
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:170:18
  16:     0x55f3d56fc442 - rust_begin_unwind
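
On the panic message itself: in Rust, a channel `send` returns `SendError` only when the receiving side of the channel has already been dropped. The `solUnpkSnpsht` threads panicking on `unwrap()` therefore suggests that whatever consumes the unpacked files stopped first, so the unpacker panics may be a symptom rather than the original failure. The sketch below reproduces that pattern with the standard library's `std::sync::mpsc`. It is an illustration only: the helper name `send_after_consumer_exit` is hypothetical, and it makes no claim about which channel type or consumer the validator actually uses.

```rust
use std::sync::mpsc::{channel, SendError};
use std::thread;

/// Hypothetical helper: spawns a consumer that takes one item and then exits,
/// dropping its receiver, and returns the result of a later send.
fn send_after_consumer_exit() -> Result<(), SendError<u32>> {
    let (tx, rx) = channel::<u32>();

    // Stand-in for a consumer that stops early, for example after its own error.
    let consumer = thread::spawn(move || {
        let _first = rx.recv();
        // `rx` is dropped when this closure returns.
    });

    // The receiver is still alive here, so this send succeeds.
    tx.send(1).expect("receiver is alive for the first item");
    consumer.join().expect("consumer thread should not panic");

    // The receiver is gone now, so this send returns Err(SendError(2)).
    // Calling `.unwrap()` on it would panic, as in the log above.
    tx.send(2)
}

fn main() {
    match send_after_consumer_exit() {
        Ok(()) => println!("send succeeded"),
        Err(err) => println!("send failed: {err:?}"),
    }
}
```

If that reading applies here, the log lines just before the first panic, or output from other threads, would be the place to look for the earlier error.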

These are the command line arguments that I'm using:

#!/bin/bash
exec /home/solana/solana-1.18.23/target/release/solana-validator \
--identity /home/solana/validator-keypair.json \
--entrypoint entrypoint.mainnet-beta.solana.com:8001 \
--entrypoint entrypoint2.mainnet-beta.solana.com:8001 \
--entrypoint entrypoint3.mainnet-beta.solana.com:8001 \
--entrypoint entrypoint4.mainnet-beta.solana.com:8001 \
--entrypoint entrypoint5.mainnet-beta.solana.com:8001 \
--rpc-port 8899 \
--dynamic-port-range 8002-8099 \
--no-port-check \
--halt-on-trusted-validators-accounts-hash-mismatch \
--gossip-port 8001 \
--no-voting \
--private-rpc \
--rpc-bind-address 127.0.0.1 \
--wal-recovery-mode skip_any_corrupted_record \
--log /home/solana/solana-rpc.log \
--accounts /mnt/solana/accounts \
--ledger /mnt/solana/ledger/ledger \
--expected-genesis-hash 5eykt4UsFv8P8NJdTREpY1vzqKqZKvdpKuc147dw2N9d \
--limit-ledger-size 50000000 \
--snapshot-interval-slots 500 \
--rpc-send-default-max-retries 3 \
--rpc-send-service-max-retries 3 \
--rpc-send-retry-ms 2000 \
--full-rpc-api \
--tpu-use-quic \
--known-validator PUmpKiNnSVAZ3w4KaFX6jKSjXUNHFShGkXbERo54xjb \
--known-validator Ninja1spj6n9t5hVYgF3PdnYz2PLnkt7rvaw3firmjs \
--known-validator ChorusmmK7i1AxXeiTtQgQZhQNiXYU84ULeaYF1EH15n \
--known-validator SerGoB2ZUyi9A1uBFTRpGxxaaMtrFwbwBpRytHefSWZ \
--known-validator FLVgaCPvSGFguumN9ao188izB4K4rxSWzkHneQMtkwQJ \
--known-validator qZMH9GWnnBkx7aM1h98iKSv2Lz5N78nwNSocAxDQrbP \
--known-validator LA1NEzryoih6CQW3gwQqJQffK2mKgnXcjSQZSRpM3wc \
--known-validator Certusm1sa411sMpV9FPqU5dXAYhmmhygvxJ23S6hJ24 \
--known-validator 9bkyxgYxRrysC1ijd6iByp9idn112CnYTw243fdH2Uvr \
--known-validator HEL1USMZKAL2odpNBj2oCjffnFGaYwmbGmyewGv1e2TU \
--known-validator CW9C7HBwAMgqNdXkNgFg9Ujr3edR2Ab9ymEuQnVacd1A \
--known-validator Fd7btgySsrjuo25CJCj7oE7VPMyezDhnx7pZkj2v69Nk \
--known-validator DWvDTSh3qfn88UoQTEKRV2JnLt5jtJAVoiCo3ivtMwXP \
--known-validator CXPeim1wQMkcTvEHx9QdhgKREYYJD8bnaCCqPRwJ1to1 \
--geyser-plugin-config /home/solana/yellowstone-grpc/yellowstone-grpc-geyser/config.json

Nowasky added the community (Community contribution) label on Sep 5, 2024

github-actions bot commented Sep 5, 2024

This repository is no longer in use. Please re-open this issue in the agave repo: https://github.com/anza-xyz/agave

github-actions bot closed this as completed on Sep 5, 2024