GCache::RingBuffer initial scan dies at 0.0% #624

businessbean · 2022-08-31T09:36:29Z

Ubuntu 20.04 plus MariaDB Galera packages from the mariadb.org repo
MariaDB 10.5.13 and Galera 26.4.8
gcache file in the MariaDB data partition

With ProxySQL in front of the MariaDB cluster, routing all write traffic to the first database node, I started a database benchmark:

./dbbench mysql --iter 524288 --threads 16 --conns 8 --host mariadb-g-frontend.database.svc.cluster.local --user root --pass pass

After some time the cluster died, and the first data node can no longer start because the bootstrap fails during the gcache file scan. The disks still have plenty of free space, but the gcache seems to have been corrupted by the high load of the benchmark. I later updated to MariaDB 10.5.17 and Galera 26.4.12, but that version cannot read the gcache file either. I can delete the file to bring the (test) cluster up again, but it would be good to be able to validate the gcache before the bootstrap, so I can decide whether deleting it is necessary. It would also be good if MariaDB handled the problem gracefully.

mariadbd --defaults-file=/opt/mariadb/etc/my.cnf --basedir=/usr --wsrep-new-cluster

2022-08-31  8:57:18 0 [Note] mariadbd (mysqld 10.5.17-MariaDB-1:10.5.17+maria~ubu2004-log) starting as process 27 ...
2022-08-31  8:57:18 0 [Note] WSREP: Loading provider /usr/lib/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2022-08-31  8:57:18 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2022-08-31  8:57:18 0 [Note] WSREP: wsrep_load(): Galera 26.4.12(r1eac5b64) by Codership Oy <[email protected]> loaded successfully.
2022-08-31  8:57:18 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2022-08-31  8:57:18 0 [Note] WSREP: /home/buildbot/buildbot/build/galera/src/saved_state.cpp:SavedState():116: Found saved state: cfaa8cb8-1f9d-11ed-8d5e-6fd03ac1bd39:1204230, safe_to_bootstrap: 1
2022-08-31  8:57:18 0 [Note] WSREP: /home/buildbot/buildbot/build/gcache/src/gcache_rb_store.cpp:open_preamble():652: GCache DEBUG: opened preamble:
Version: 2
UUID: cfaa8cb8-1f9d-11ed-8d5e-6fd03ac1bd39
Seqno: -1 - -1
Offset: -1
Synced: 0
2022-08-31  8:57:18 0 [Note] WSREP: /home/buildbot/buildbot/build/gcache/src/gcache_rb_store.cpp:open_preamble():663: Recovering GCache ring buffer: version: 2, UUID: cfaa8cb8-1f9d-11ed-8d5e-6fd03ac1bd39, offset: -1
2022-08-31  8:57:18 0 [Note] WSREP: /home/buildbot/buildbot/build/galerautils/src/gu_progress.hpp:log():52: GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
Killed

echo $?
137
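The exit status 137 is what POSIX shells report for a process terminated by a signal: 128 plus the signal number, here 9 (SIGKILL). So mariadbd did not abort on its own during the scan; something killed it. In a Kubernetes pod a common source of SIGKILL is the kernel OOM killer enforcing the container memory limit, but that is an assumption this log does not confirm; the kernel log (for example `dmesg`) would record an OOM kill. The Python sketch below only decodes the status; `describe_exit_status` is a hypothetical helper, not part of any MariaDB tooling.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell exit status ($?) of a finished command.

    POSIX shells report a process killed by signal N as 128 + N. A program
    can also exit with a value above 128 on purpose, so this is a convention,
    not a guarantee.
    """
    if 128 < status < 256:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # No named signal with this number on this platform.
            name = f"signal {signum}"
        return f"terminated by {name} ({signum})"
    return f"exited with code {status}"


print(describe_exit_status(137))  # terminated by SIGKILL (9)
```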

grastate.dat:

# GALERA saved state
version: 2.1
uuid:    cfaa8cb8-1f9d-11ed-8d5e-6fd03ac1bd39
seqno:   1204230
safe_to_bootstrap: 1
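Before a bootstrap, grastate.dat can at least be checked mechanically. The sketch below parses the `key: value` format shown above and applies a simple pre-check. It is an illustration under assumptions: `parse_grastate` and `looks_bootstrappable` are hypothetical helpers, the seqno rule is this sketch's own policy rather than something Galera enforces, and nothing here validates the contents of galera.cache itself.

```python
def parse_grastate(text: str) -> dict:
    """Parse the 'key: value' lines of a Galera grastate.dat file.

    Blank lines and '#' comment lines are skipped. Values are kept as strings.
    """
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ':' only; the UUID itself contains none.
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def looks_bootstrappable(state: dict) -> bool:
    """Hypothetical pre-check, not a Galera rule.

    safe_to_bootstrap must be 1, and seqno must be a recorded position
    (-1 means the position is unknown and would first have to be
    recovered, e.g. with mariadbd --wsrep-recover).
    """
    if state.get("safe_to_bootstrap") != "1":
        return False
    try:
        return int(state.get("seqno", "-1")) >= 0
    except ValueError:
        return False


sample = """# GALERA saved state
version: 2.1
uuid:    cfaa8cb8-1f9d-11ed-8d5e-6fd03ac1bd39
seqno:   1204230
safe_to_bootstrap: 1
"""
state = parse_grastate(sample)
print(state["seqno"], looks_bootstrappable(state))  # 1204230 True
```

For the grastate.dat in this report the check passes, which matches the "Found saved state ... safe_to_bootstrap: 1" line in the log; the failure happens later, in the gcache scan.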

my.cnf:

[mysqld]
# folders
plugin-dir=/usr/lib/mysql/plugin
datadir=/opt/mariadb/data
tmpdir=/opt/mariadb/tmp
ignore-db-dirs=lost+found
ignore-db-dirs=seqno
# performance monitoring
performance_schema=ON
performance-schema-instrument='stage/%=ON'
performance-schema-consumer-events-stages-current=ON
performance-schema-consumer-events-stages-history=ON
performance-schema-consumer-events-stages-history-long=ON

# process
pid-file=/opt/mariadb/run/mariadbd.pid
socket=/opt/mariadb/run/mariadbd.sock

[mysql_upgrade]
socket=/opt/mariadb/run/mariadbd.sock

[client]
socket=/opt/mariadb/run/mariadbd.sock

[client-server]
socket=/opt/mariadb/run/mariadbd.sock

[mariadb]
plugin_load_add = query_response_time #https://mariadb.com/kb/en/query-response-time-plugin/

# include additional configs
!includedir /opt/mariadb/etc/conf.d

conf.d/my.cnf:

[mariadb]
wsrep-provider=/usr/lib/libgalera_smm.so
binlog_format=ROW
log-bin=/opt/mariadb/log/mysql-bin.log
expire_logs_days=1
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep-cluster-name=eu-de-1.nova
wsrep_cluster_address=gcomm://mariadb-g-0.database.svc.cluster.local:4567,mariadb-g-1.database.svc.cluster.local:4567,mariadb-g-2.database.svc.cluster.local:4567,mariadb-g-backend.database.svc.cluster.local:4567
wsrep_provider_options=cert.log_conflicts=ON;debug=YES;gcache.recover=yes;ist.recv_addr=10.60.3.3:4568;pc.recovery=FALSE;pc.wait_prim_timeout=PT60S;pc.weight=4
wsrep_node_address=10.60.3.3
wsrep_node_name=mariadb-g-0
wsrep-on=1
wsrep_log_conflicts=ON
wsrep_slave_threads=16
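The gcache-related settings are easy to miss inside the single wsrep_provider_options string. Splitting it shows that gcache.recover=yes is set, which is the option that makes Galera try to recover the existing ring buffer at startup (the "Recovering GCache ring buffer" and "initial scan" steps in the log), and that gcache.size is not set, so the provider default applies (128M according to the Galera documentation, which roughly matches the 134217752-byte scan total in the log). `parse_provider_options` is a hypothetical helper for illustration; it assumes plain key=value pairs separated by semicolons, as in this config.

```python
def parse_provider_options(value: str) -> dict:
    """Split a wsrep_provider_options string ("k1=v1;k2=v2") into a dict.

    Assumes simple key=value pairs separated by ';'. Values may contain ':'
    (e.g. host:port), so only the first '=' of each pair is used as separator.
    """
    options = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, val = item.partition("=")
        options[key.strip()] = val.strip()
    return options


opts = parse_provider_options(
    "cert.log_conflicts=ON;debug=YES;gcache.recover=yes;"
    "ist.recv_addr=10.60.3.3:4568;pc.recovery=FALSE;"
    "pc.wait_prim_timeout=PT60S;pc.weight=4"
)
print(opts["gcache.recover"])                              # yes
print(opts.get("gcache.size", "not set (provider default)"))  # not set (provider default)
```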


businessbean · 2022-08-31T13:33:09Z

PR #608 does not seem to fix the problem, because it also fails with 26.4.12.

businessbean · 2022-09-21T07:55:21Z

Attached: the galera.cache.xz file.

businessbean · 2022-11-09T19:17:38Z

This problem is also reported in the MariaDB Jira as "Galera ring buffer cache may get corrupted".
