Troubleshooting Galera Cluster – Quick Guide

How do you troubleshoot Galera Cluster efficiently?


Basics: To troubleshoot Galera Cluster successfully, you should first understand how it works.

Galera Cluster is built on top of a proprietary group communication system layer, which implements a virtual synchrony QoS (Quality of Service). Virtual synchrony unifies the data delivery and cluster membership services, providing clear formalism for message delivery semantics. While virtual synchrony guarantees consistency, it does not guarantee temporal synchrony, which is necessary for smooth multi-master operations. To address this, Galera Cluster implements its own runtime-configurable temporal flow control. Flow control keeps nodes synchronized to a fraction of a second. A minimal Galera cluster consists of three nodes, and it is recommended to run with an odd number of nodes. The reason is that, should there be a problem applying a transaction on one node (e.g., a network problem or the machine becoming unresponsive), the two other nodes still have a quorum (i.e. a majority) and can proceed with the transaction commit.
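As a minimal configuration sketch of such a three-node cluster (the IP addresses, cluster name and provider path below are assumptions; adjust them to your environment), the wsrep section of my.cnf on one node might look like this:

# my.cnf -- minimal wsrep sketch; addresses and paths are hypothetical
[mysqld]
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name=example_cluster
wsrep_cluster_address=gcomm://10.0.0.1,10.0.0.2,10.0.0.3
wsrep_node_address=10.0.0.1
wsrep_sst_method=rsync

The other two nodes use the same gcomm:// list and differ only in wsrep_node_address.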

Galera Cluster and Certification-Based Replication

When a transaction issues COMMIT, but before the actual commit happens, all the changes made at the database node (INSERTs, UPDATEs, DELETEs), together with the primary keys of the modified rows, are collected into a write-set. The originating node then broadcasts this write-set to all nodes in the cluster, including itself, and each node performs a deterministic certification test on the write-set, using the primary keys in the write-set and the actual primary key values on the node. This test determines whether the write-set can be applied without violating key constraint integrity. If the test fails, the originating node drops the write-set and the cluster rolls back the original transaction. If the certification test succeeds, the transaction commits and the write-set is applied on all nodes in the cluster, thus completing the replication.

In Galera Cluster, all nodes must reach a consensus about the outcome of the transaction before replication happens; only then does the originating node notify the client of a successful commit. Each transaction is assigned a global ordinal sequence number. During the deterministic certification test on a write-set, the cluster checks the current transaction against the last successful transaction. If any conflicting transaction has modified the same rows in between these two globally ordered transactions, a primary key conflict occurs and the test fails.
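From the application's point of view, the transaction that loses certification is rolled back and the client usually sees a deadlock error, so applications should be prepared to retry. A minimal sketch, assuming a hypothetical accounts table updated on two nodes at the same time:

-- node 1:
START TRANSACTION;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
COMMIT;   -- wins certification and commits cluster-wide

-- node 2, concurrently against the same row:
START TRANSACTION;
UPDATE accounts SET balance = balance + 10 WHERE id = 1;
COMMIT;
-- loses certification and is rolled back; the client typically receives
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction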

Three layers of Galera Cluster Replication:

1. Certification layer that prepares the write-sets and performs the certification tests.
2. Replication layer that manages the replication process and global ordering.
3. Group communication framework that provides a plugin architecture for the group communication systems used in the cluster.

Collecting Galera Cluster Operations Forensics Data to troubleshoot efficiently

Every great technology solution needs forensics and diagnostics tools to troubleshoot confidently. It is technically impossible for an operations team to recommend or fix an issue without sufficient evidence and facts. Galera Cluster errors are also logged in the MySQL, MariaDB or Percona XtraDB error log (by default the hostname.err file in the data directory), depending on which you use. The following Galera Cluster system variables can be configured to enable error logging specific to the replication process (a configuration sketch follows the list):

  • wsrep_log_conflicts: Enables conflict logging in the error log, for example transaction conflicts between two nodes on the same row of the same table at the same time
  • cert.log_conflicts: Logs certification failures during replication
  • wsrep_debug: Enables debugging information in the database server logs (this parameter also logs authentication details / passwords, so we do not recommend configuring this system variable in production)
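A minimal my.cnf sketch enabling the first two (note that cert.log_conflicts is a Galera provider option and is therefore set through wsrep_provider_options):

[mysqld]
wsrep_log_conflicts=ON
wsrep_provider_options="cert.log_conflicts=YES"
# wsrep_debug is deliberately left disabled here; enabling it in production
# can expose authentication details in the server log
# wsrep_debug=ON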

Galera Cluster Status Variables and Interpretation: This is often our first step when auditing your Galera Cluster infrastructure (a quick query sketch follows the list below)

  1. wsrep_ready – Node status; is the node ready to accept queries? The expected (ideal scenario) value is "ON", which means the node can accept write-sets from the cluster. If it is "OFF", almost all queries fail with ERROR 1047 (08S01) Unknown command.
  2. wsrep_connected – Confirms the node's network connectivity with other nodes in the cluster. If the output is "ON", the node has a network connection to one or more other nodes, forming a cluster component. If "OFF", the node does not have a connection to any cluster component.
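A minimal sketch of how to check these on each node:

-- Both should report ON on a healthy node
SHOW GLOBAL STATUS LIKE 'wsrep_ready';
SHOW GLOBAL STATUS LIKE 'wsrep_connected';

-- Or pull all wsrep status variables at once for a broader audit
SHOW GLOBAL STATUS LIKE 'wsrep_%';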

Using status variables for Galera Cluster Replication Health-Check

wsrep_local_recv_queue_avg: The average size of the local receive queue since the last status query. If the value is higher than 0.0, that node cannot apply write-sets as fast as it receives them (indicating replication throttling or network throughput issues), which can lead to replication latency. To annotate the range, you can also log the status variables wsrep_local_recv_queue_min and wsrep_local_recv_queue_max.
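A minimal sketch for sampling the receive-queue statistics on a suspect node:

-- Average, minimum and maximum sizes of the local receive queue
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_local_recv_queue_avg',
   'wsrep_local_recv_queue_min',
   'wsrep_local_recv_queue_max');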

wsrep_flow_control_paused: When you are processing a very large volume of data, you may see a node with unusually high values of wsrep_flow_control_paused. For example, a value of 0.2 indicates that flow control was in effect for 20% of the time since the last FLUSH STATUS. On its own this is not very useful, because you may not know when that was, so we recommend issuing FLUSH STATUS occasionally and closely monitoring that node until you narrow down the epicenter. If it does not settle down, please consider increasing the number of applier threads (wsrep_slave_threads).
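A minimal sketch for monitoring flow control and, if needed, adding applier threads (the thread count of 8 below is only an illustration; wsrep_slave_threads is dynamic in most builds, but size it to your CPU cores and workload):

-- Reset the counters, let the workload run for a while, then check
FLUSH STATUS;
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';

-- If the node keeps triggering flow control, try more applier threads
SET GLOBAL wsrep_slave_threads = 8;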

Restarting Galera Cluster Safely

When restarting Galera Cluster, always identify the most advanced node. Restarting a cluster by starting nodes in the wrong order, or starting the wrong node first, can be devastating and lead to loss of data. Attempting to bootstrap using any other node will cause the following error message:

#source - Galera Cluster
2019-12-11 03:27:31 5572 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node.
It was not the last one to leave the cluster and may not contain all the updates.
To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
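Once the most advanced node has been identified (see the next section), bootstrap it first and then start the remaining nodes normally. A minimal sketch, assuming a systemd-managed MariaDB installation; the exact commands differ by distribution (for example, mysqld --wsrep-new-cluster or a dedicated bootstrap service on other builds):

# On the most advanced node only -- start it as the new primary component
galera_new_cluster

# On every other node, start normally; each will rejoin via IST or SST
systemctl start mariadb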

Identifying the Most Advanced Node

The most advanced node is identified by comparing the Global Transaction ID values on each node in your cluster. You can find this in the grastate.dat file, located in the data directory of your database.

The grastate.dat file looks like the example copied below:

#source - Galera Cluster
# GALERA saved state
version: 2.1
uuid:    6rc24719-ct3q-11e2-f5w1-61sg152r2h81
seqno:   8157271294261
cert_index:

To find the sequence number of the last committed transaction, run mysqld with the --wsrep-recover option. This recovers the InnoDB table space to a consistent state, prints the corresponding Global Transaction ID value into the error log, and then exits. Here’s an example of this:

#source - Galera Cluster
130514 18:39:13 [Note] WSREP: Recovered position: 6rc24719-ct3q-11e2-f5w1-61sg152r2h81:8157271294261
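A minimal sketch of running the recovery and extracting the position (the user and error log path are assumptions; adjust them to your installation):

# Run recovery; mysqld prints the recovered position and exits
mysqld --user=mysql --wsrep-recover

# The position is also written to the error log
grep "Recovered position" /var/log/mysql/error.log | tail -1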

How to identify crashed Galera nodes?

You can identify a crashed node by reading the contents of the grastate.dat file. If it looks like the example below, the node has either crashed during the execution of a non-transactional operation (e.g., ALTER TABLE), or aborted due to a database inconsistency.

# GALERA saved state
version: 2.1
uuid:   6rc24719-ct3q-11e2-f5w1-61sg152r2h81
seqno:   -1
cert_index:

‘Safe to Bootstrap’ Protection (Galera Cluster 3.19 onwards)

Starting with provider version 3.19, Galera has additional protection against attempting to bootstrap the cluster using a node that may not have been the last node remaining in the cluster prior to cluster shutdown. If Galera can conclusively determine which node was the last node standing, it will be marked as ‘safe to bootstrap’, as seen in this example grastate.dat:

#source - Galera Cluster 
# GALERA saved state
version: 2.1
uuid:    5981f182-a4cc-11e6-98cc-77fabedd360d
seqno:   1234
safe_to_bootstrap: 1

In the case when all nodes crashed simultaneously, no node will be considered safe to bootstrap until the grastate.dat file is edited manually. To override this protection, set safe_to_bootstrap to 1 in the grastate.dat file of the node you intend to use as the first node.
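A minimal sketch of the manual override on the chosen node (the path assumes the default data directory /var/lib/mysql; stop mysqld first and be certain this node really has the highest seqno):

# Only on the node chosen to bootstrap, after verifying its seqno is the highest
sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
galera_new_cluster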

Galera Cluster Recovery with Gcache

Starting with provider version 3.19, Galera provides the gcache.recover parameter. If it is set to yes, Galera will attempt to recover the gcache on node startup, and the node will be in a position to provide IST to other joining nodes, which can speed up the overall restart time for the entire cluster. Gcache recovery requires the entire gcache file to be read twice, so expect some latency on larger databases with slower disks. It is a “best effort” operation for your database infrastructure: if the recovery is not successful, the node will continue to operate normally, but other nodes will fall back to SST when attempting to join.
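A minimal my.cnf sketch enabling gcache recovery (append it to any provider options you already set; the gcache size shown is only an illustration):

[mysqld]
wsrep_provider_options="gcache.recover=yes;gcache.size=1G"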

Conclusion

If you do not restart Galera Cluster nodes carefully after a crash, you may end up with a cluster of corrupted data that may not prove functional. If there are no other nodes in the cluster with a well-defined state, there is no need to preserve the node state ID. You must perform a thorough database recovery procedure, similar to that used on stand-alone database servers. Once you recover one node, use it as the first node in a new cluster.
