MySQL Consulting

ClickHouse and ProxySQL queries rewrite (Cross-post from ProxySQL)

MinervaDB Corporation — Thu, 26 Dec 2019 05:06:03 +0000

MySQL query rewrite for ClickHouse using ProxySQL

Introduction

ProxySQL in September 2017 announced support for ClickHouse as backend. ProxySQL is a popular open source, high performance and protocol-aware proxy server for MySQL and its forks. ClickHouse is an open source column-oriented database management system capable of real time generation of analytical data reports using SQL queries. To support ClickHouse as a backend, ProxySQL acts as a data bridge between MySQL protocol and ClickHouse protocol, allowing MySQL clients to execute queries in ClickHouse through it. ClickHouse’s SQL query syntax is different than MySQL’s syntax, and migrating application from MySQL to ClickHouse isn’t just a matter of changing connections endpoint but it also requires modifying some queries. This needs development time, but not always possible. One of ProxySQL most widely used feature is indeed the ability of rewriting queries, so often it is just a matter of writing the right query rules. In this blog post we have explained step-by-step MySQL query rewrite for ClickHouse using ProxySQL:

How to implement ProxySQL query rewrite for ClickHouse ?

The below is MySQL query we are considering for query rewrite:

SELECT COUNT(`id`), FROM_UNIXTIME(`created`, '%Y-%m') AS `date` FROM `tablename` GROUP BY FROM_UNIXTIME(`created`, '%Y-%m')

ClickHouse doesn’t support FROM_UNIXTIME, but it supports toDate and toTime.
ClickHouse also supports toYear and toMonth , useful to format the date the same FROM_UNIXTIME does.

Therefore, it is possible to rewrite the query as:

SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY toYear(toDate(created)), toMonth(toDate(created));

To perform the above rewrite, we will need two rules, one for the first FROM_UNIXTIME, and one for the second one. Or we can just use one rewrite rules to replace FROM_UNIXTIME(created, ‘%Y-%m’) no matter if on the retrieved fields or in the GROUP BY clause, generating the following query:

SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created))));

Does it look great? No, not yet!
For the month of March, concat(toString(toYear(toDate(created))), ‘-‘, toString(toMonth(toDate(created)))) will return 2018-3 : not what the application was expecting, as MySQL would return 2018-03 . The same applies for all the first 9 months of each year.
Finally, we rewrote the query as the follow, and the application was happy:

SELECT COUNT(`id`), substring(toString(toDate(created)),1,7) AS `date`
FROM `tablename`
GROUP BY substring(toString(toDate(created)),1,7);

Note: because of the datatypes conversions that ClickHouse needs to perform in order to execute the above query, its execution time is about 50% slower than executing the following query:

SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY toYear(toDate(created)), toMonth(toDate(created));

Architecting using two ProxySQL

Great, we now know how to rewrite the query!
Although, the ClickHouse module in ProxySQL doesn’t support query rewrite. The ClickHouse module in ProxySQL is only responsible to transform data between MySQL and ClickHouse protocol, and viceversa.

Therefore the right way of achieving this solution is to configure two ProxySQL layers, one instance responsible for rewriting the query and sending the rewritten query to the second ProxySQL instance, this one responsible for executing the query (already modified) on ClickHouse.

Architecting using only one ProxySQL

Does the above architecture seems complex? Not really, it is reasonable straightforward.
Can it be improved?
As you can see from the previous chart, the ClickHouse module and the MySQL module listen on different ports. The first ProxySQL instance is receiving traffic on port 6033, and sending traffic on the second PorxySQL instance on port 6090.
Are two instances really required? The answer is no.
In fact, a single instance can receive MySQL traffic on port 6033, rewrite the query, and send the rewritten query to itself on port 6090, to finally execute the rewritten query on ClickHouse.

This diagram describes the architecture:

Configuration

For reference, below is the step to configure one single ProxySQL to send traffic to ClickHouse, and use itself as a backend.

Create ClickHouse user:

INSERT INTO clickhouse_users (username,password) VALUES ('clicku','clickp');
LOAD CLICKHOUSE USERS TO RUNTIME;
SAVE CLICKHOUSE USERS TO DISK;

Create MySQL user (same as ClickHouse):

INSERT INTO mysql_users(username,password) SELECT username, password FROM clickhouse_users;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;

Configure ProxySQL itself as a backend for MySQL traffic:

INSERT INTO mysql_servers(hostname,port) VALUES ('127.0.0.1',6090);
SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL SERVERS TO RUNTIME;

Create a query rule for rewriting queries:

INSERT INTO mysql_query_rules (active,match_pattern,replace_pattern,re_modifiers) VALUES 
(1,"FROM_UNIXTIME\(`created`, '%Y-%m'\)", 'substring(toString(toDate(created)),1,7)',"CASELESS,GLOBAL");
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

This is a very simple example to demonstrate how to perform query rewrite from MySQL to ClickHouse using just one ProxySQL instance. In a real world scenarios you will need to create more rules based on your own queries.

Conclusion

Not only ProxySQL allows to send queries to ClickHouse, but it also allows to rewrite queries to solve issues related to different SQL syntax and available functions.
To achieve this, ProxySQL uses its ability to use itself as a backend: rewrite the query in the MySQL module, and execute it in the ClickHouse module.

References

The post ClickHouse and ProxySQL queries rewrite (Cross-post from ProxySQL) appeared first on The WebScale Database Infrastructure Operations Experts.

How to use ProxySQL to work on ClickHouse like MySQL ?

MinervaDB Corporation — Wed, 25 Dec 2019 19:45:42 +0000

Use ClickHouse like MySQL with ProxySQL

Introduction

We have several customers on ClickHouse now for both columnar database analytics and archiving MySQL data, You can access data from ClickHouse with clickhouse-client but this involves some learning and also limitations technically. Our customers are very comfortable using MySQL so they always preferred a MySQL client for ClickHouse query analysis and reporting, Thankfully ProxySQL works as a optimal bridge between ClickHouse and MySQL client, This indeed was a great news for us and our customers worldwide. This blog post is about how we can use MySQL client with ClickHouse.

Installation

https://github.com/sysown/proxysql/releases/ (**Download the package starting with clickhouse )
Dependencies installation:
- yum -y install perl-DBD-MySQL

Start ProxySQL once completed installation successfully.

  # The default configuration file is this: 
  /etc/proxysql.cnf 
  # There is no such data directory by default: 
  mkdir / var / lib / proxysql 
  # start up 
  proxysql --clickhouse-server 
  # ProxySQL will default to daemon mode in the background

Creating ClickHouse user

Create a user for ClickHouse in the ProxySQL with password, The password is not configured for ClickHouse but for accessing ProxySQL:

# ProxySQL port is 6032, the default username and password are written in the configuration file 
 root@10.xxxx: / root # mysql -h 127.0.0.1 -P 6032 -uadmin -padmin 
 Welcome to the MariaDB monitor. Commands end with; or \ g. 
 Your MySQL connection id is 3 
  Server version: 5.6.81 (ProxySQL Admin Module) 
  Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. 
  Type 'help;' or '\ h' for help. Type '\ c' to clear the current input statement. 
  MySQL [(none)]> INSERT INTO clickhouse_users VALUES ('chuser', 'chpasswd', 1,100); 
  Query OK, 1 row affected (0.00 sec) 
  MySQL [(none)] > select * from clickhouse_users; 
  + ---------- + ---------- + -------- + ----------------- + 
  | username | password | active | max_connections | 
  + ---------- + ---------- + -------- + ----------------- + 
  | chuser | chpasswd | 1 | 100 | 
  + ---------- + ---------- + -------- + ----------------- + 
  1 row in set (0.00 sec) 
  MySQL [(none)]> LOAD CLICKHOUSE USERS TO RUNTIME; 
  Query OK, 0 rows affected (0.00 sec) 
  MySQL [(none)]> SAVE CLICKHOUSE USERS TO DISK; 
  Query OK, 0 rows affected (0.00 sec)

Connecting to ClickHouse from MySQL Client

By default ProxySQL opens the port 6090 to receive user access to ClickHouse:

  # Use username and password above 
  # If it is a different machine, remember to change the IP 
  root@10.xxxx: / root # mysql -h 127.0.0.1 -P 6090 -uclicku -pclickp --prompt "ProxySQL-To-ClickHouse>" 
  Welcome to the MariaDB monitor. Commands end with; or \ g. 
  Your MySQL connection id is 64 
  Server version: 5.6.81 (ProxySQL ClickHouse Module) 
  Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. 
  Type 'help;' or '\ h' for help. Type '\ c' to clear the current input statement. 
  ProxySQL-To-ClickHouse >

Querying ClickHouse like MySQL

 MySQL [(none)] > select version (); 
+ ------------------- + 
| version | 
+ ------------------- + 
| 5.6.81-clickhouse | 
+ ------------------- + 
1 row in set (0.00 sec) 
MySQL [(none)] > select now (); 
+ --------------------- + 
| now () | 
+ --------------------- + 
| 2019-12-25 20:17:14 | 
+ --------------------- + 
1 row in set (0.00 sec) 
MySQL [(none)] > select today (); 
+ ------------ + 
| today () | 
+ ------------ + 
| 2019-12-25 | 
+ ------------ + 
1 row in set (0.00 sec) 
# Our table is over 55 billion 
ProxySQL-To-ClickHouse > select count (*) from mysql_audit_log_data; 
+ ------------- + 
| count () | 
+ ------------- + 
| 539124837571 | 
+ ------------- + 
1 row in set (8.31 sec)

Limitations

This ProxySQL solution works only when it is on the local ClickHouse (Note.- ClickHouse instance cannot have password in this ecosystem / recommended solution)
ProxySQL query rewrite limitations – The simple queries work seamless, The complex query rewrite are quite expense and there might some levels of SQL semantics limitations

Conclusion – ProxySQL Version 2.0.8 new features and enhancements

Changed default max_allowed_packet from 4M to 64M
Added support for mysqldump 8.0 and Admin #2340
Added new variable mysql-aurora_max_lag_ms_only_read_from_replicas : if max_lag_ms is used and the writer is in the reader hostgroup, the writer will be excluded if at least N replicas are good candidates.
Added support for unknown character set , and for collation id greater than 255 #1273
Added new variable mysql-log_unhealthy_connections to suppress messages related to unhealty clients connections being closed
Reimplemented rules_fast_routing using khash
Added support for SET CHARACTERSET #1692
Added support for same node into multiple Galera clusters #2290
Added more verbose output for error 2019 (Can’t initialize character set) #2273
Added more possible values for mysql_replication_hostgroups.check_type #2186
- read_only | innodb_read_only
- read_only & innodb_read_only
Added support and packages for RHEL / CentOS 8

References:

The post How to use ProxySQL to work on ClickHouse like MySQL ? appeared first on The WebScale Database Infrastructure Operations Experts.