GCache::RingBuffer initial scan dies at 0.0% #624

Open
businessbean opened this issue Aug 31, 2022 · 3 comments

Comments

@businessbean

  • Ubuntu 20.04 plus MariaDB Galera packages from the mariadb.org repo
  • MariaDB 10.5.13 and Galera 26.4.8
  • gcache file in the MariaDB data partition

With ProxySQL in front of the MariaDB cluster routing all write traffic to the first database node, I started a database benchmark:

./dbbench mysql --iter 524288 --threads 16 --conns 8 --host mariadb-g-frontend.database.svc.cluster.local --user root --pass pass

After some time the cluster died, and the first data node cannot restart because the bootstrap fails during the gcache file scan. The disks still have plenty of free space, but the gcache seems to have been corrupted under the high load of the benchmark. I later updated to MariaDB 10.5.17 and Galera 26.4.12, but that version cannot read the gcache file either. I can delete the file to make the (test) cluster come up again, but it would be good to be able to validate the gcache before the bootstrap, in order to decide whether deleting it is necessary. It would also be good if MariaDB handled the problem gracefully.

mariadbd --defaults-file=/opt/mariadb/etc/my.cnf --basedir=/usr --wsrep-new-cluster

2022-08-31  8:57:18 0 [Note] mariadbd (mysqld 10.5.17-MariaDB-1:10.5.17+maria~ubu2004-log) starting as process 27 ...
2022-08-31  8:57:18 0 [Note] WSREP: Loading provider /usr/lib/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2022-08-31  8:57:18 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2022-08-31  8:57:18 0 [Note] WSREP: wsrep_load(): Galera 26.4.12(r1eac5b64) by Codership Oy <info@codership.com> loaded successfully.
2022-08-31  8:57:18 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2022-08-31  8:57:18 0 [Note] WSREP: /home/buildbot/buildbot/build/galera/src/saved_state.cpp:SavedState():116: Found saved state: cfaa8cb8-1f9d-11ed-8d5e-6fd03ac1bd39:1204230, safe_to_bootstrap: 1
2022-08-31  8:57:18 0 [Note] WSREP: /home/buildbot/buildbot/build/gcache/src/gcache_rb_store.cpp:open_preamble():652: GCache DEBUG: opened preamble:
Version: 2
UUID: cfaa8cb8-1f9d-11ed-8d5e-6fd03ac1bd39
Seqno: -1 - -1
Offset: -1
Synced: 0
2022-08-31  8:57:18 0 [Note] WSREP: /home/buildbot/buildbot/build/gcache/src/gcache_rb_store.cpp:open_preamble():663: Recovering GCache ring buffer: version: 2, UUID: cfaa8cb8-1f9d-11ed-8d5e-6fd03ac1bd39, offset: -1
2022-08-31  8:57:18 0 [Note] WSREP: /home/buildbot/buildbot/build/galerautils/src/gu_progress.hpp:log():52: GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
Killed

echo $?
137
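
The exit status 137 above follows the POSIX shell convention of reporting 128 + N when a process is terminated by signal N, which matches the "Killed" line. A minimal Python sketch for decoding such a status is below; the function name `describe_exit_status` is made up for illustration. What sent the signal is not shown in this report. A kernel or container memory limit is one common source, but that is an assumption.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell exit status (hypothetical helper).

    POSIX shells report 128 + N when a child was terminated by signal N.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


# Status reported by `echo $?` after the failed bootstrap attempt.
print(describe_exit_status(137))
```

On Linux, signal 9 is SIGKILL. It cannot be caught or handled, which would explain why the server log stops at the 0.0% progress line without an error message from mariadbd itself.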

grastate.dat:

# GALERA saved state
version: 2.1
uuid:    cfaa8cb8-1f9d-11ed-8d5e-6fd03ac1bd39
seqno:   1204230
safe_to_bootstrap: 1
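
For scripted checks before a bootstrap, the saved state above can be read with a few lines of Python. This is only a sketch based on the `key: value` layout shown here; the `GraState` class and `parse_grastate` function are illustrative names, not part of Galera or MariaDB.

```python
from dataclasses import dataclass


@dataclass
class GraState:
    version: str
    uuid: str
    seqno: int
    safe_to_bootstrap: bool


def parse_grastate(text: str) -> GraState:
    """Parse the `key: value` lines of a grastate.dat file (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# GALERA saved state" comment
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return GraState(
        version=fields["version"],
        uuid=fields["uuid"],
        seqno=int(fields["seqno"]),
        safe_to_bootstrap=fields.get("safe_to_bootstrap", "0") == "1",
    )


# Contents copied from the grastate.dat shown in this report.
SAMPLE = """\
# GALERA saved state
version: 2.1
uuid:    cfaa8cb8-1f9d-11ed-8d5e-6fd03ac1bd39
seqno:   1204230
safe_to_bootstrap: 1
"""

state = parse_grastate(SAMPLE)
print(state.uuid, state.seqno, state.safe_to_bootstrap)
```

Note that the UUID matches the one in the gcache preamble from the log, while the preamble reports `Seqno: -1 - -1` and `Synced: 0` instead of the seqno 1204230 recorded here.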

my.cnf:

[mysqld]
# folders
plugin-dir=/usr/lib/mysql/plugin
datadir=/opt/mariadb/data
tmpdir=/opt/mariadb/tmp
ignore-db-dirs=lost+found
ignore-db-dirs=seqno
# performance monitoring
performance_schema=ON
performance-schema-instrument='stage/%=ON'
performance-schema-consumer-events-stages-current=ON
performance-schema-consumer-events-stages-history=ON
performance-schema-consumer-events-stages-history-long=ON

# process
pid-file=/opt/mariadb/run/mariadbd.pid
socket=/opt/mariadb/run/mariadbd.sock

[mysql_upgrade]
socket=/opt/mariadb/run/mariadbd.sock

[client]
socket=/opt/mariadb/run/mariadbd.sock

[client-server]
socket=/opt/mariadb/run/mariadbd.sock

[mariadb]
plugin_load_add = query_response_time #https://mariadb.com/kb/en/query-response-time-plugin/

# include additional configs
!includedir /opt/mariadb/etc/conf.d

conf.d/my.cnf:

[mariadb]
wsrep-provider=/usr/lib/libgalera_smm.so
binlog_format=ROW
log-bin=/opt/mariadb/log/mysql-bin.log
expire_logs_days=1
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep-cluster-name=eu-de-1.nova
wsrep_cluster_address=gcomm://mariadb-g-0.database.svc.cluster.local:4567,mariadb-g-1.database.svc.cluster.local:4567,mariadb-g-2.database.svc.cluster.local:4567,mariadb-g-backend.database.svc.cluster.local:4567
wsrep_provider_options=cert.log_conflicts=ON;debug=YES;gcache.recover=yes;ist.recv_addr=10.60.3.3:4568;pc.recovery=FALSE;pc.wait_prim_timeout=PT60S;pc.weight=4
wsrep_node_address=10.60.3.3
wsrep_node_name=mariadb-g-0
wsrep-on=1
wsrep_log_conflicts=ON
wsrep_slave_threads=16
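
To make the gcache-related settings easier to spot, the semicolon-separated `wsrep_provider_options` string above can be split into a dictionary. A small Python sketch follows; `parse_provider_options` is a made-up helper name and it assumes plain `key=value` pairs without quoting or escaping.

```python
def parse_provider_options(value: str) -> dict[str, str]:
    """Split a wsrep_provider_options string into a dict (hypothetical helper)."""
    options = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        key, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        options[key.strip()] = val.strip()
    return options


# Value copied from conf.d/my.cnf in this report.
OPTIONS = (
    "cert.log_conflicts=ON;debug=YES;gcache.recover=yes;"
    "ist.recv_addr=10.60.3.3:4568;pc.recovery=FALSE;"
    "pc.wait_prim_timeout=PT60S;pc.weight=4"
)

opts = parse_provider_options(OPTIONS)
gcache_opts = {k: v for k, v in opts.items() if k.startswith("gcache.")}
print(gcache_opts)
```

Only `gcache.recover=yes` is set explicitly, so the ring buffer size and location are left at their defaults. The 134217752 bytes reported by the initial scan are roughly 128 MiB, which appears consistent with a default-sized cache.
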
@businessbean
Author

PR #608 does not seem to fix the problem, because it also fails with 26.4.12.

@businessbean
Author

The galera.cache.xz file

@businessbean
Author

This is also reported in the MariaDB Jira as "Galera ring buffer cache may get corrupted".
