ProxySQL Query Cache

ProxySQL Query Cache – How it works ? 


In this blog post we have explained about ProxySQL query cache and how it creates value for MySQL query performance. Before we get deep in to ProxySQL query cache, Let us explain how MySQL implemented query cache for performance, The MySQL query cache (deprecated as of MySQL 5.7.20 and removed in MySQL 8.0 ) caches the text of a SELECT statement with corresponding results which was sent to the client. In the future when identical SELECT statement is received,  MySQL server retrieves the results from query cache rather than parsing and executing the statement again. Technically, MySQL query cache is a query results cache. It compare every incoming queries that start with SEL to a hash table, and if there is a match the results are returned directly from the query cache. A classic use case for a MySQL query cache is SELECT queries on e-commerce site with millions of articles in the inventory with complex listing rules for optimal SORT/SEARCH operations. But, MySQL 8.0 retired query cache due to following limitations:

  • The query cache works only if query match to byte-for-byte.
  • MySQL query cache doesn’t support stale results. i.e. Any modifications to table(s) will result in all cache invalidated for those tables
  • MySQL query cache conflicts with very MVCC implementation in InnoDB. If you are continuously updating records in InnoDB, then you are invalidating the query cache for ALL queries cached for that table. In such cases, You have no direct or indirect benefits with MySQL query cache.

How ProxySQL implemented Query Cache ?

In MySQL query cache, There exists common caching space for all the tables and in ProxySQL we have to configure which traffic should be cached. You can enable query cache in ProxySQL by defining query rules that match the traffic which eventually will be cached by setting cache_ttl for the cached results. You can define matches for the incoming traffic either by query or ProxySQL query digest patterns. Whenever the query successfully passes the matching criteria, the resultset is cached for the future queries to be replied directly from the ProxySQL instance than from DB instance of the hostgroup (this is possible only if the cache_ttl is not expired)

ProxySQL Query Cache is a in-memory key-value storage which retains a metadata in the pattern explained below:

  • Key: The “Key” is a combination of actually username, schema and query text which is derived from the actual query itself, This will ensure users access only their respective resultsets for the appropriate schema.
  • Value: The “Value” is the result returned from the backend, This could be mysqld or some other proxy.

Configuring ProxySQL for query caching

If you are first timer with ProxySQL and want to lean installation (it’s out of the scope of this blog so we haven’t explained about ProxySQL installation here), We strongly recommend you to follow steps here:  https://github.com/sysown/proxysql/wiki#getting-started  . Before doing any benchmarks, I will try to give you a short description of how ProxySQL query cache gets enabled. Unlike MySQL query cache, where a common caching space exists for all tables, in ProxySQL query cache we have to define what traffic gets cached. This is done by defining query rules that match traffic that is going to be cached and setting a “cache_ttl” for the cached results. There are many ways to define matches for incoming traffic, either by query or digest using patterns. All we need to cache the resultset is to define the matching criteria and the TTL. If a query passed the matching criteria, the resultset is cached so the next requests are served directly from the ProxySQL instance instead of querying the hostgroup DB nodes (if cache_ttl has not expired).

Step-by-step configuration of ProxySQL query cache

We have a full operational ProxySQL step with 3 instances of Percona Server for MySQL 5.7 and a ProxySQL instance with Sysbench instance installed. Database instances are configured in same hostgroup with traffic load-balanced equally with same priority. Let’s confirm all the servers are going good by querying from ProxySQL admin:

ProxySQLAdmin> select * from mysql_servers;
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname  | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 2            | 88.8.6.32 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | 88.8.6.31 | 3306 | ONLINE | 1      | 0       	  | 1000            | 0               	  | 0       | 0              |     	|
| 2            | 88.8.6.33 | 3306 | ONLINE | 1      | 0       	  | 1000            | 0               	  | 0       | 0              |     	|
| 2            | 88.8.6.31 | 3306 | ONLINE | 1      | 0       	  | 1000            | 0               	  | 0       | 0              |     	|
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+

If you notice above results, You can see there are two host groups configured:

  • Hostgroup “1”  is for WRITES
  • Hostgroup “2” is for READS

Before using sysbench, Let’s check the “stats_mysql_query_digest” table:

ProxySQLAdmin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC;
Empty set (0.00 sec)

ProxySQL query digests are stored in “stats_mysql_query_digest” table.

To explain the ProxySQL query cache, We have created some data with Sysbench and this benchmarking activity will do selects by Primary Key or RANGE:

sysbench --threads=64 --max-requests=0 --time=180 --mysql-user=mdblab --mysql-password=mdblabinfra --mysql-db=sbtest --mysql-host=127.0.0.1 --mysql-port=6033 --oltp-table-size=250000000 --oltp-read-only=on --oltp-skip-trx=on --oltp-test-mode=simple --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-point-selects=1 --oltp-simple-ranges=0 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua run

sysbench --threads=64 --max-requests=0 --time=180 --mysql-user=mdblab --mysql-password=mdblabinfra --mysql-db=sbtest --mysql-host=127.0.0.1 --mysql-port=6033 --oltp-table-size=250000000 --oltp-read-only=on --oltp-skip-trx=on --oltp-test-mode=simple --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-point-selects=0 --oltp-simple-ranges=1 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua run

The benchmark above is done with ProxySQL query cache disabled. Now let us show you what is recorded in “stats_mysql_query_digest” table: 

ProxySQLAdmin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC;
+------------+-------------+-----------+--------------------+------------------------------------------------+
| count_star | sum_time    | hostgroup | digest             | digest_text                                    |
+------------+-------------+-----------+--------------------+------------------------------------------------+
| 517314     | 51813612616 | 2         | 0x7721D69250CB40   | SELECT c FROM sbtest1 WHERE id=?               |
| 8173       | 71351732    | 2         | 0x3BC2F7549D058B6F | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
+------------+-------------+-----------+--------------------+------------------------------------------------+
2 row in set (0.01 sec)

Update MySQL queries to be cached

We are updating table mysql_query_rules to match the results that should be cached. In this example, We have used matching criteria and a cache_ttl of 15000ms, So we have added rules below:

ProxySQLAdmin> INSERT INTO mysql_query_rules (rule_id,active,digest,cache_ttl,apply) VALUES (1,1,'0x7721D69250CB40',15000,1);
Query OK, 1 row affected (0.00 sec)

ProxySQLAdmin> INSERT INTO mysql_query_rules (rule_id,active,digest,cache_ttl,apply) VALUES (2,1,'0x3BC2F7549D058B6F',15000,1);
Query OK, 1 row affected (0.00 sec

P.S.: Please don’t forget to load the query rules at runtime. If you miss that step, ProxySQL will lose these rules with a restart so we should also this to disk:

ProxySQLAdmin> LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.00 sec)

We will reset the stats_mysql_query_digest table to evaluate ProxySQL query cache:

ProxySQLAdmin> SELECT 1 FROM stats_mysql_query_digest_reset LIMIT 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)
----------

Now we are re-running the benchmarking with query cache enabled, We can confirm the traffic was cached querying the stats_mysql_query_digest table:

ProxySQLAdmin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC;
+------------+------------+-----------+--------------------+------------------------------------------------+
| count_star | sum_time   | hostgroup | digest             | digest_text                                    |
+------------+------------+-----------+--------------------+------------------------------------------------+
| 715815     | 9513175314 | 2         | 0x7721D69250CB40   | SELECT c FROM sbtest1 WHERE id=?               |
| 835193     | 0          | -1        | 0x7721D69250CB40   | SELECT c FROM sbtest1 WHERE id=?               |
+------------+------------+-----------+--------------------+------------------------------------------------+
2 rows in set (0.00 sec)
ProxySQLAdmin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC;
+------------+-----------+-----------+--------------------+------------------------------------------------+
| count_star | sum_time  | hostgroup | digest             | digest_text                                    |
+------------+-----------+-----------+--------------------+------------------------------------------------+
| 58136      | 857050510 | 2         | 0x3BC2F7549D058B6F | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
| 182175     | 0         | -1        | 0x3BC2F7549D058B6F | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
+------------+-----------+-----------+--------------------+------------------------------------------------+
2 rows in set (0.00 sec)

If you notice above the results, the cached queries will be marked with a special hostgroup “-1” (these queries were not sent to any hostgroup) and total execution time for the cached queries is “0

Conclusion

ProxySQL query cache is definitely a instant gratification performance booster for MySQL query performance (though it doesn’t support prepared statements). You can tune the total amount of memory used by the ProxySQL query cache by setting variable –mysql-query_cache_size_MB and please note, the current implementation of mysql-query_cache_size_MB doesn’t impose any hard limit, the system variable –mysql-threshold_resultset_size controls the maximum resultset size the ProxySQL will buffer before it start sending to the client, We recommend to carefully tweak the variable –mysql-threshold_resultset_size for performance because setting it too low will arrest the reattempt of queries failed while retrieving the resultset from the backend DB node and setting it too high will make ProxySQL a memory intensive infrastructure to manage.

References:

UA-155183614-1