Troubleshooting Galera Cluster – Quick Guide

How do you troubleshoot Galera Cluster efficiently?


Basics: To troubleshoot Galera Cluster successfully, you should first understand how it works.

Galera Cluster is built on top of a proprietary group communication system layer, which implements a virtual synchrony QoS (Quality of Service). Virtual synchrony unifies the data delivery and cluster membership services, providing clear formalism for message delivery semantics. While virtual synchrony guarantees consistency, it does not guarantee temporal synchrony, which is necessary for smooth multi-master operations. To address this, Galera Cluster implements its own runtime-configurable temporal flow control. Flow control keeps nodes synchronized to a fraction of a second. A minimal Galera cluster consists of three nodes, and it is recommended to run with an odd number of nodes. The reason is that, should there be a problem applying a transaction on one node (e.g., a network problem or the machine becoming unresponsive), the two other nodes still have a quorum (i.e. a majority) and can proceed with the transaction commit.
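As a minimal configuration sketch of such a three-node cluster (the IP addresses, cluster name and provider path below are assumptions; adjust them to your environment), the wsrep section of my.cnf on one node might look like this:

# my.cnf -- minimal wsrep sketch; addresses and paths are hypothetical
[mysqld]
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name=example_cluster
wsrep_cluster_address=gcomm://10.0.0.1,10.0.0.2,10.0.0.3
wsrep_node_address=10.0.0.1
wsrep_sst_method=rsync

The other two nodes use the same gcomm:// list and differ only in wsrep_node_address.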

Galera Cluster and Certification-Based Replication

When a transaction issues COMMIT, but before the actual commit happens, all the changes made at the database node (INSERTs, UPDATEs, DELETEs), together with the primary keys of the modified rows, are collected into a write-set. The originating node then broadcasts this write-set to all nodes in the cluster, including itself, and each node performs a deterministic certification test on the write-set, using the primary keys in the write-set and the actual primary key values on the node. This test determines whether the write-set can be applied without violating key constraint integrity. If the test fails, the originating node drops the write-set and the cluster rolls back the original transaction. If the certification test succeeds, the transaction commits and the write-set is applied on all nodes in the cluster, thus completing the replication.

In Galera Cluster, all nodes must reach a consensus about the outcome of the transaction before replication happens; only then does the originating node notify the client of a successful commit. Each transaction is assigned a global ordinal sequence number. During the deterministic certification test on a write-set, the cluster checks the current transaction against the last successful transaction. If any conflicting transaction has modified the same rows in between these two globally ordered transactions, a primary key conflict occurs and the test fails.
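From the application's point of view, the transaction that loses certification is rolled back and the client usually sees a deadlock error, so applications should be prepared to retry. A minimal sketch, assuming a hypothetical accounts table updated on two nodes at the same time:

-- node 1:
START TRANSACTION;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
COMMIT;   -- wins certification and commits cluster-wide

-- node 2, concurrently against the same row:
START TRANSACTION;
UPDATE accounts SET balance = balance + 10 WHERE id = 1;
COMMIT;
-- loses certification and is rolled back; the client typically receives
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction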

Three layers of Galera Cluster Replication:

1. Certification layer that prepares the write-sets and performs the certification tests.
2. Replication layer that manages the replication process and global ordering.
3. Group communication framework that provides a plugin architecture for the group communication systems used in the cluster.

Collecting Galera Cluster Operations Forensics Data to troubleshoot efficiently

Every great technology solution needs forensics and diagnostics tools to troubleshoot confidently. It is technically impossible for an operations team to recommend or fix an issue without sufficient evidence and facts. Galera Cluster errors are also logged in the MySQL, MariaDB or Percona XtraDB error log (by default the hostname.err file in the data directory), depending on which you use. The following Galera Cluster system variables can be configured to enable error logging specific to the replication process (a configuration sketch follows the list):

  • wsrep_log_conflicts: Enables conflict logging in the error log, for example transaction conflicts between two nodes on the same row of the same table at the same time
  • cert.log_conflicts: Logs certification failures during replication
  • wsrep_debug: Enables debugging information in the database server logs (this parameter also logs authentication details / passwords, so we do not recommend configuring this system variable in production)
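A minimal my.cnf sketch enabling the first two (note that cert.log_conflicts is a Galera provider option and is therefore set through wsrep_provider_options):

[mysqld]
wsrep_log_conflicts=ON
wsrep_provider_options="cert.log_conflicts=YES"
# wsrep_debug is deliberately left disabled here; enabling it in production
# can expose authentication details in the server log
# wsrep_debug=ON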

Galera Cluster Status Variables and Interpretation: This is often our first step when auditing your Galera Cluster infrastructure (a quick query sketch follows the list below)

  1. wsrep_ready – Node status; is the node ready to accept queries? The expected (ideal scenario) value is "ON", which means the node can accept write-sets from the cluster. If it is "OFF", almost all queries fail with ERROR 1047 (08S01) Unknown command.
  2. wsrep_connected – Confirms the node's network connectivity with other nodes in the cluster. If the output is "ON", the node has a network connection to one or more other nodes, forming a cluster component. If "OFF", the node does not have a connection to any cluster component.
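A minimal sketch of how to check these on each node:

-- Both should report ON on a healthy node
SHOW GLOBAL STATUS LIKE 'wsrep_ready';
SHOW GLOBAL STATUS LIKE 'wsrep_connected';

-- Or pull all wsrep status variables at once for a broader audit
SHOW GLOBAL STATUS LIKE 'wsrep_%';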

Using status variables for Galera Cluster Replication Health-Check

wsrep_local_recv_queue_avg: The average size of the local receive queue since the last status query. If the value is higher than 0.0, that node cannot apply write-sets as fast as it receives them (indicating replication throttling or network throughput issues), which can lead to replication latency. To annotate the range, you can also log the status variables wsrep_local_recv_queue_min and wsrep_local_recv_queue_max.
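A minimal sketch for sampling the receive-queue statistics on a suspect node:

-- Average, minimum and maximum sizes of the local receive queue
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_local_recv_queue_avg',
   'wsrep_local_recv_queue_min',
   'wsrep_local_recv_queue_max');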

wsrep_flow_control_paused: When you are processing a very large volume of data, you may see a node with unusually high values of wsrep_flow_control_paused. For example, a value of 0.2 indicates that flow control was in effect for 20% of the time since the last FLUSH STATUS. On its own this is not very useful, because you may not know when that was, so we recommend issuing FLUSH STATUS occasionally and closely monitoring that node until you narrow down the epicenter. If it does not settle down, please consider increasing the number of applier threads (wsrep_slave_threads).
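A minimal sketch for monitoring flow control and, if needed, adding applier threads (the thread count of 8 below is only an illustration; wsrep_slave_threads is dynamic in most builds, but size it to your CPU cores and workload):

-- Reset the counters, let the workload run for a while, then check
FLUSH STATUS;
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';

-- If the node keeps triggering flow control, try more applier threads
SET GLOBAL wsrep_slave_threads = 8;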

Restarting Galera Cluster Safely

When restarting Galera Cluster, always identify the most advanced node. Restarting a cluster by starting nodes in the wrong order, or starting the wrong node first, can be devastating and lead to loss of data. Attempting to bootstrap using any other node will cause the following error message:

#source - Galera Cluster
2019-12-11 03:27:31 5572 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node.
It was not the last one to leave the cluster and may not contain all the updates.
To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
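Once the most advanced node has been identified (see the next section), bootstrap it first and then start the remaining nodes normally. A minimal sketch, assuming a systemd-managed MariaDB installation; the exact commands differ by distribution (for example, mysqld --wsrep-new-cluster or a dedicated bootstrap service on other builds):

# On the most advanced node only -- start it as the new primary component
galera_new_cluster

# On every other node, start normally; each will rejoin via IST or SST
systemctl start mariadb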

Identifying the Most Advanced Node

The most advanced node is identified by comparing the Global Transaction ID values on each node in your cluster. You can find this in the grastate.dat file, located in the data directory of your database.

The grastate.dat file looks like the example copied below:

#source - Galera Cluster
# GALERA saved state
version: 2.1
uuid:    6rc24719-ct3q-11e2-f5w1-61sg152r2h81
seqno:   8157271294261
cert_index:

To find the sequence number of the last committed transaction, run mysqld with the --wsrep-recover option. This recovers the InnoDB table space to a consistent state, prints the corresponding Global Transaction ID value into the error log, and then exits. Here’s an example of this:

#source - Galera Cluster
130514 18:39:13 [Note] WSREP: Recovered position: 6rc24719-ct3q-11e2-f5w1-61sg152r2h81:8157271294261
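A minimal sketch of running the recovery and extracting the position (the user and error log path are assumptions; adjust them to your installation):

# Run recovery; mysqld prints the recovered position and exits
mysqld --user=mysql --wsrep-recover

# The position is also written to the error log
grep "Recovered position" /var/log/mysql/error.log | tail -1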

How to identify crashed Galera nodes?

You can identify a crashed node by reading the contents of the grastate.dat file. If it looks like the example below, the node has either crashed during the execution of a non-transactional operation (e.g., ALTER TABLE), or aborted due to a database inconsistency.

# GALERA saved state
version: 2.1
uuid:   6rc24719-ct3q-11e2-f5w1-61sg152r2h81
seqno:   -1
cert_index:

‘Safe to Bootstrap’ Protection (Galera Cluster 3.19 onwards)

Starting with provider version 3.19, Galera has additional protection against attempting to bootstrap the cluster using a node that may not have been the last node remaining in the cluster prior to cluster shutdown. If Galera can conclusively determine which node was the last node standing, it will be marked as ‘safe to bootstrap’, as seen in this example grastate.dat:

#source - Galera Cluster 
# GALERA saved state
version: 2.1
uuid:    5981f182-a4cc-11e6-98cc-77fabedd360d
seqno:   1234
safe_to_bootstrap: 1

In the case when all nodes crashed simultaneously, no node will be considered safe to bootstrap until the grastate.dat file is edited manually. To override this protection, set safe_to_bootstrap to 1 in the grastate.dat file of the node you intend to use as the first node.
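A minimal sketch of the manual override on the chosen node (the path assumes the default data directory /var/lib/mysql; stop mysqld first and be certain this node really has the highest seqno):

# Only on the node chosen to bootstrap, after verifying its seqno is the highest
sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
galera_new_cluster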

Galera Cluster Recovery with Gcache

Starting with provider version 3.19, Galera provides the gcache.recover parameter. If it is set to yes, Galera will attempt to recover the gcache on node startup, and the node will be in a position to provide IST to other joining nodes, which can speed up the overall restart time for the entire cluster. Gcache recovery requires the entire gcache file to be read twice, so expect some latency on larger databases with slower disks. It is a “best effort” operation for your database infrastructure: if the recovery is not successful, the node will continue to operate normally, but other nodes will fall back to SST when attempting to join.
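A minimal my.cnf sketch enabling gcache recovery (append it to any provider options you already set; the gcache size shown is only an illustration):

[mysqld]
wsrep_provider_options="gcache.recover=yes;gcache.size=1G"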

Conclusion

If you do not restart Galera Cluster nodes carefully after a crash, you may end up with a cluster of corrupted data that may not prove functional. If there are no other nodes in the cluster with a well-defined state, there is no need to preserve the node state ID. You must perform a thorough database recovery procedure, similar to that used on stand-alone database servers. Once you recover one node, use it as the first node in a new cluster.
