How to migrate from Galera Cluster to MySQL Group Replication

In this article, I will show you how it’s possible to perform an online migration from a 3 members Galera cluster setup (in this case I’m using PXC 5.7.14) to a 3 members MySQL Group Replication cluster setup (MySQL Community 5.7.17).

Don’t forget that before adopting Group Replication as database backend, you should validate that your application do match GR requirements and limitations. When this is validated, you can start !

So first, let’s have a look at the current situation:

 

We have an application (sysbench 0.5), reading and writing to a Galera Cluster (Percona XtraDB Cluster 5.7.14) via ProxySQL. We write on all the nodes (Multi-Master) and we will do the same on our MySQL Group Replication Cluster, we will use a Multi-Primary Group.

This is the command used to simulate our application:

while true; do sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
     --mysql-host=127.0.0.1 --mysql-port=6033 --mysql-password=fred \
     --mysql-table-engine=innodb --mysql-user=fred --max-requests=0 \
     --tx-rate=20 --num-threads=2  --report-interval=1 run ; done;

And this an overview of the machines used:

Hostname OS Software IP Server_id
mysql1 CentOS 7 PXC 5.7.14 192.168.90.10 1
mysql2 CentOS 7 PXC 5.7.14 192.168.90.11 2
mysql3 CentOS 7 PXC 5.7.14 192.168.90.12 3
app CentOS 7 ProxySQL
sysbench 0.5
192.168.90.13 n/a

So the goal will be to replace all those PXC nodes by MySQL 5.7.17 with Group Replication one by one and avoid downtime.

For those familiar with ProxySQL this is how we see the Galera nodes in the proxy:

ProxySQL Admin> select hostname, status, hostgroup_id from runtime_mysql_servers;
+---------------+--------+--------------+
| hostname      | status | hostgroup_id |
+---------------+--------+--------------+
| 192.168.90.12 | ONLINE | 1            |
| 192.168.90.10 | ONLINE | 1            |
| 192.168.90.11 | ONLINE | 1            |
+---------------+--------+--------------+

For the other ones, you can find more info in the previous post: HA with MySQL Group Replication and ProxySQL.

To be able to proceed as we planned, we need to have binary logs enabled on every PXC nodes and also use MySQL GTIDs.
So in my.cnf you must have:

enforce_gtid_consistency = on
gtid_mode  = on
log_bin 
log_slave_updates

First Step: remove one node and migrate it to MySQL 5.7.17

Our first step in this section will be to stop mysqld and remove the PXC packages on mysql3:

[root@mysql3]# systemctl stop mysql

ProxySQL Admin> select hostname, status from runtime_mysql_servers;
+---------------+---------+
| hostname      | status  |
+---------------+---------+
| 192.168.90.11 | ONLINE  |
| 192.168.90.12 | SHUNNED |
| 192.168.90.10 | ONLINE  |
+---------------+---------+

Our application is of course still running (it might of course be disconnected), so in this case sysbench runs in a loop.

As all our nodes are running CentOS 7, we will use the mysql57 community repo for el7.

[root@mysql3 ~]# yum install http://dev.mysql.com/get/mysql57-community-release-el7-9.noarch.rpm

Now we can change the packages:

[root@mysql3 ~]# yum -y swap Percona-XtraDB-Cluster* mysql-community-server mysql-community-libs-compat
...
=========================================================================================================
 Package                                   Arch     Version              Repository                 Size
=========================================================================================================
Installing:
 mysql-community-libs-compat               x86_64   5.7.17-1.el7         mysql57-community         2.0 M
 mysql-community-server                    x86_64   5.7.17-1.el7         mysql57-community         162 M
Removing:
 Percona-XtraDB-Cluster-57                 x86_64   5.7.14-26.17.1.el7   @percona-release-x86_64   0.0  
 Percona-XtraDB-Cluster-client-57          x86_64   5.7.14-26.17.1.el7   @percona-release-x86_64    37 M
 Percona-XtraDB-Cluster-server-57          x86_64   5.7.14-26.17.1.el7   @percona-release-x86_64   227 M
 Percona-XtraDB-Cluster-shared-57          x86_64   5.7.14-26.17.1.el7   @percona-release-x86_64   3.7 M
 Percona-XtraDB-Cluster-shared-compat-57   x86_64   5.7.14-26.17.1.el7   @percona-release-x86_64   6.7 M
Installing for dependencies:
 mysql-community-client                    x86_64   5.7.17-1.el7         mysql57-community          24 M
 mysql-community-common                    x86_64   5.7.17-1.el7         mysql57-community         271 k
 mysql-community-libs                      x86_64   5.7.17-1.el7         mysql57-community         2.1 M

Transaction Summary
=========================================================================================================
Install  2 Packages (+3 Dependent packages)
Remove   5 Packages

After that step, it’s time to modify my.cnf and comment all wsrep and pxc related variables and add some extra that are mandatory:

binlog_checksum = none
master_info_repository = TABLE
relay_log_info_repository = TABLE
transaction_write_set_extraction = XXHASH64
loose-group_replication_group_name="afb80f36-2bff-11e6-84e0-0800277dd3bf"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "192.168.90.12:3406"
loose-group_replication_group_seeds= "192.168.90.10:3406,192.168.90.11:3406"
loose-group_replication_bootstrap_group= off
loose-group_replication_single_primary_mode= off

Then we move that server to another hostgroup in ProxySQL:

ProxySQL Admin> update mysql_servers set hostgroup_id =2 where hostname ="192.168.90.12";
ProxySQL Admin> load mysql servers to runtime;
ProxySQL Admin> select hostname, status, hostgroup_id from runtime_mysql_servers;
+---------------+---------+--------------+
| hostname      | status  | hostgroup_id |
+---------------+---------+--------------+
| 192.168.90.11 | ONLINE  | 1            |
| 192.168.90.10 | ONLINE  | 1            |
| 192.168.90.12 | SHUNNED | 2            |
+---------------+---------+--------------+

It’s time now to start mysqld:

[root@mysql3 ~]# systemctl start mysqld
[root@mysql3 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.17-log MySQL Community Server (GPL)
...

Step 2: create Group Replication Cluster of 1 node

Now we need to bootstrap our group.

This is a very easy step:

mysql3> INSTALL PLUGIN group_replication SONAME 'group_replication.so';
mysql3> SET GLOBAL group_replication_bootstrap_group=ON;
mysql3> START GROUP_REPLICATION;
mysql3> SET GLOBAL group_replication_bootstrap_group=OFF;

Now the Group is started:

mysql3> select * from performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
   MEMBER_ID: 9e8416d7-b1c6-11e6-bc10-08002718d305
 MEMBER_HOST: mysql3.localdomain
 MEMBER_PORT: 3306
MEMBER_STATE: ONLINE

It’s not needed now, but it’s the best time to also setup the credentials for the recovery process. You might forget it.

mysql1> CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='password' 
             FOR CHANNEL 'group_replication_recovery';

This user is not yet created but we will created directly during the next step (and it will be replicated to all nodes).

Step 3: make this MySQL 5.7.17 replicate from PXC

So now we need to create a replication user on the Galera cluster that we will use for this new MySQL 5.7 server (and later for the Group Replication’s recovery process):

mysql1> CREATE USER 'repl'@'192.168.90.%' IDENTIFIED BY 'password';
mysql1> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.90.%';

And we can start asynchronous replication from the Galera Cluster to this new MySQL server:

mysql3> CHANGE MASTER TO MASTER_HOST="192.168.90.10", MASTER_USER="repl", 
        MASTER_PASSWORD='password', MASTER_AUTO_POSITION=1;
mysql3> START SLAVE;

Now we have the following environment:

 

Step 4: migrate an extra node to the Group

Now we will almost do the same with mysql2:

  1. stop mysql
  2. install mysql community repository
  3. swap the packages
  4. modify my.cnf
  5. put mysql2 in hostgroup_id 2 in ProxySQL
  6. start mysqld
  7. join the group

Let’s skip points 1 to 4.

Unlike Galera, it’s mandatory with MySQL Group Replication that all the nodes have a unique server_id. We must then be careful, in this case we will set it to 2.

Don’t forget to also swap the addresses for mysql3 and mysql2 between group_replication_local_address and group_replication_group_seeds:

loose-group_replication_local_address= "192.168.90.11:3406"
loose-group_replication_group_seeds= "192.168.90.10:3406,192.168.90.12:3406"

Put mysql2 in hostgroup_id 2 in ProxySQL, so you have:

ProxySQL Admin> select hostname, status, hostgroup_id from runtime_mysql_servers;
+---------------+---------+--------------+
| hostname      | status  | hostgroup_id |
+---------------+---------+--------------+
| 192.168.90.10 | ONLINE  | 1            |
| 192.168.90.12 | ONLINE  | 2            |
| 192.168.90.11 | SHUNNED | 2            |
+---------------+---------+--------------+

Start mysqld and it’s time to join the group (point 7) !

mysql2> INSTALL PLUGIN group_replication SONAME 'group_replication.so';
mysql2> CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='password' 
        FOR CHANNEL 'group_replication_recovery';
mysql2> START GROUP_REPLICATION;

So this is the new situation:

mysql3> select * from performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
   MEMBER_ID: 5221ffcf-c1e0-11e6-b1f5-08002718d305
 MEMBER_HOST: pxc2
 MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
   MEMBER_ID: 5a2d38db-c1e0-11e6-8bf6-08002718d305
 MEMBER_HOST: pxc3
 MEMBER_PORT: 3306
MEMBER_STATE: ONLINE

Step 5: move the application to our new Cluster

It’s time now to let the application connect to our new MySQL Group Replication Cluster. In ProxySQL, we change the hostgroup_id for mysql2 and mysql3 to 1 and to 2 for mysql1, then we load it runtime and we stop mysql on mysql1:

ProxySQL Admin> select hostname, status, hostgroup_id from runtime_mysql_servers;
+---------------+--------+--------------+
| hostname      | status | hostgroup_id |
+---------------+--------+--------------+
| 192.168.90.10 | ONLINE | 1            |
| 192.168.90.12 | ONLINE | 2            |
| 192.168.90.11 | ONLINE | 2            |
+---------------+--------+--------------+
ProxySQL Admin> update mysql_servers set hostgroup_id =2 where hostname ="192.168.90.10";
ProxySQL Admin> update mysql_servers set hostgroup_id =1 where hostname ="192.168.90.11";
ProxySQL Admin> update mysql_servers set hostgroup_id =1 where hostname ="192.168.90.12";

ProxySQL Admin> load mysql servers to runtime;

ProxySQL Admin> select hostname, status, hostgroup_id from runtime_mysql_servers;
+---------------+--------+--------------+
| hostname      | status | hostgroup_id |
+---------------+--------+--------------+
| 192.168.90.12 | ONLINE | 1            |
| 192.168.90.11 | ONLINE | 1            |
| 192.168.90.10 | ONLINE | 2            |
+---------------+--------+--------------+

In this case as we are using ProxySQL, as soon as mysql1 (192.168.90.10) changes group, all the connection to it are killed and they will reconnect to the new nodes that are now part of the MySQL Group Replication.

[root@mysql1 ~]# systemctl stop mysql

To finish, we have two options, or we configure the remaining PXC node to be slave for a while, so we can still decide to rollback the migration (I would then consider to add an extra node to the current MySQL Group Replication Cluster, as with 2 nodes, the cluster is not tolerant to any failure). Or we can directly migrate the last Galera node to Group Replication.

Conclusion

As you could see, migrate your current Galera environment to MySQL Group Replication is not complicated and can be done with really minimal impact.

Don’t hesitate to leave your comments or questions like usual.

 

Note:

It’s also possible to swap Setp 2 and Step3, this means that the asynchronous replication is started before the bootstrap of the Group Replication. In that case, it might happen that the asynchronous replication fails while starting Group Replication, as the replication recovery is started and therefore no transaction can be executed.

You can see the following in SHOW SLAVE STATUS:

Last_SQL_Error: Error in Xid_log_event: Commit could not be completed, 
'Error on observer while running replication hook 'before_commit'.'

The error log also gives you information about it:

[Note] Plugin group_replication reported: 'Starting group replication recovery 
       with view_id 14817109352506883:1'
[Note] Plugin group_replication reported: 'Only one server alive. 
       Declaring this server as online within the replication group'
[ERROR] Plugin group_replication reported: 'Transaction cannot be executed while Group Replication 
        is recovering. Try again when the server is ONLINE.'
[ERROR] Run function 'before_commit' in plugin 'group_replication' failed
[ERROR] Slave SQL for channel '': Error in Xid_log_event: Commit could not be completed, 
        'Error on observer while running replication hook 'before_commit'.', Error_code: 3100
[Warning] Slave: Error on observer while running replication hook 'before_commit'. Error_code: 3100
[ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart 
        the slave SQL thread with "SLAVE START". We stopped at log 'pxc1-bin.000009' position 76100764
[Note] Plugin group_replication reported: 'This server was declared online within the replication group'

Restarting again replication solves the problem:

mysql3> STOP SLAVE;
mysql3> START SLAVE;

 

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

8 Comments

  1. Hi frederic,
    The article is great. But I have concern at the value of loose-group_replication_group_seeds. When you set the node 3(192.168.90.12, first node to MGR), 192.168.90.10 and 192.168.90.11 was still PXC. I thought the loose-group_replication_group_seeds should be “192.168.90.12:3406”.
    loose-group_replication_local_address= “192.168.90.12:3406”
    loose-group_replication_group_seeds= “192.168.90.10:3406,192.168.90.11:3406”
    Looking forward to your feedback on this.

    • Hi Janet,
      It doesn’t matter, at that time the setting is not really used as we bootstrap the Group from that node. And in case it reboots, it will try to connect to the other nodes in that list, none will be available and a bootstrap will be necessary anyway. The setting it’s used at boot to join a Group.
      Cheers,

  2. Hi, Frederic
    I’m now migrating my pxc to mgr , but one problem is that mgr can’t catch up with master(pxc).
    my pxc version is 5.7.14. I found a [blog](https://www.percona.com/blog/2017/04/19/how-we-made-percona-xtradb-cluster-scale/) it seems like that pxc5.7.14 is not support group commit very well. is there any way to optimize replication latency? here is my my.cnf

    ###BASIC SETTINGS
    server-id = 0040263307
    user = mysql
    port = 3307
    basedir = /usr/local/mysql
    datadir = /data/mysql_3307/data
    socket = /data/mysql_3307/tmp/mysql.sock
    pid-file = /data/mysql_3307/tmp/node004026.pid
    character-set-server = utf8mb4
    transaction_isolation = READ-COMMITTED
    explicit_defaults_for_timestamp = 1
    max_allowed_packet = 64M
    open_files_limit = 65535
    sql_mode = NO_ENGINE_SUBSTITUTION

    ###CONNECTION SETTINGS
    interactive_timeout = 30
    wait_timeout = 30
    lock_wait_timeout = 1800
    skip_name_resolve = 1
    max_connections = 2560
    max_user_connections=2048
    max_connect_errors = 1000000
    back_log = 1024

    ###TABLE CACHE PERFORMANCE SETTINGS
    table_open_cache = 2048
    table_definition_cache = 1024
    table_open_cache_instances = 64

    ###SESSION MEMORY SETTINGS
    read_buffer_size = 16M
    read_rnd_buffer_size = 32M
    sort_buffer_size = 32M
    join_buffer_size = 32M
    tmp_table_size = 64M
    max_heap_table_size = 64M
    thread_cache_size = 64
    query_cache_size = 0
    query_cache_type = 0
    key_buffer_size = 16M
    max_length_for_sort_data = 8096
    bulk_insert_buffer_size = 64M
    myisam_sort_buffer_size = 128M
    myisam_max_sort_file_size = 10G
    myisam_repair_threads = 1

    ###LOG SETTINGS
    log_error = /data/mysql_3307/logs/node004026.err
    general_log_file = /data/mysql_3307/logs/node004026.log
    slow_query_log = 1
    slow_query_log_file = /data/mysql_3307/logs/slow-queries.log
    long_query_time = 0.2
    log_queries_not_using_indexes = 1
    log_slow_admin_statements = 1
    log_slow_slave_statements = 1
    log_throttle_queries_not_using_indexes = 60
    min_examined_row_limit = 100
    expire_logs_days = 3
    #log_timestamps = system

    ###INNODB SETTINGS
    innodb_buffer_pool_size = 50G
    innodb_buffer_pool_instances = 8
    innodb_buffer_pool_load_at_startup = 1
    innodb_buffer_pool_dump_at_shutdown = 1
    innodb_lru_scan_depth = 4096
    innodb_lock_wait_timeout = 20
    innodb_io_capacity = 2000
    innodb_io_capacity_max = 4000
    innodb_flush_method = O_DIRECT
    #innodb_undo_logs = 128
    #innodb_undo_tablespaces = 3
    #innodb_max_undo_log_size = 4G
    #innodb_undo_log_truncate = 1
    innodb_flush_neighbors = 0
    innodb_log_file_size = 1G
    innodb_log_files_in_group = 5
    innodb_log_buffer_size = 32M
    innodb_large_prefix = 1
    innodb_thread_concurrency = 64
    innodb_print_all_deadlocks = 1
    innodb_strict_mode = 1
    innodb_purge_threads = 4
    innodb_write_io_threads = 8
    innodb_read_io_threads = 8
    innodb_page_cleaners = 8
    innodb_sort_buffer_size = 32M
    innodb_file_per_table = 1
    innodb_stats_persistent_sample_pages = 64
    innodb_autoinc_lock_mode = 2
    innodb_online_alter_log_max_size = 4G
    innodb_open_files = 65535
    #innodb_data_file_path = ibdata1:1G:autoextend
    innodb_flush_log_at_trx_commit = 1
    innodb_checksums = 1
    innodb_checksum_algorithm = crc32
    innodb_rollback_on_timeout = 1
    internal_tmp_disk_storage_engine = InnoDB
    innodb_status_file = 1
    innodb_status_output_locks = 1

    ###REPLICATION SETTINGS
    sync_binlog = 1
    master_info_repository = TABLE
    relay_log_info_repository = TABLE
    gtid_mode = on
    enforce_gtid_consistency = 1
    log_slave_updates
    binlog_format = row
    binlog_row_image = full
    binlog_cache_size = 4M
    binlog_gtid_simple_recovery = 1
    max_binlog_cache_size = 2G
    max_binlog_size = 1G
    relay_log_recovery = 1
    relay_log = /data/mysql_3307/logs/0040263307-relay-bin
    relay_log_purge = 1
    log_bin = /data/mysql_3307/logs/0040263307-mysql-bin
    #binlog_transaction_dependency_history_size = 25000
    slave_parallel_type = LOGICAL_CLOCK
    slave_parallel_workers = 8
    slave_preserve_commit_order = on

    ###PERFORMANCE_SCHEMA SETTINGS
    performance_schema = 1
    performance_schema_instrument = ‘%=on’
    performance_schema_digests_size = 40000
    performance_schema_max_table_instances = 40000
    performance_schema_max_sql_text_length = 4096
    performance_schema_max_digest_length = 4096
    performance-schema-instrument=’stage/%=ON’
    performance-schema-consumer-events-stages-current=ON
    performance-schema-consumer-events-stages-history=ON
    performance-schema-consumer-events-stages-history-long=ON
    performance-schema-consumer-events-transactions-history-long=ON

    ###INNODB MONITOR SETTINGS
    innodb_monitor_enable=module_innodb,module_dml,module_ddl,module_trx,module_os,module_purge,module_log,module_lock,module_buffer,module_index,module_ibuf_system,module_buffer_page,module_adaptive_hash

    ###ADDITIONAL SETTINGS
    # BEGIN ANSIBLE MANAGED BLOCK FOR MGR
    ###MGR SETTINGS
    read_only=1
    super_read_only=1
    binlog_transaction_dependency_tracking = WRITESET
    transaction-write-set-extraction = XXHASH64
    binlog_transaction_dependency_history_size = 25000
    # report_host = 127.0.0.1 # optional for group replication
    binlog_checksum = NONE # only for group replication
    loose-group_replication_group_name = ’38f8425e-9182-5934-b32a-7e4317fe4a04′
    loose-group_replication_start_on_boot = off
    loose-group_replication_local_address = 192.168.4.26:23307
    loose-group_replication_group_seeds = “192.168.4.26:23307,192.168.4.31:23307,192.168.4.41:23307”
    loose-group_replication_bootstrap_group = off
    loose_group_replication_single_primary_mode = 0
    loose_group_replication_enforce_update_everywhere_checks = 1
    loose_group_replication_unreachable_majority_timeout = 120
    loose_group_replication_start_on_boot = 0
    # END ANSIBLE MANAGED BLOCK FOR MGR

Leave a Reply

Your email address will not be published. Required fields are marked *

As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.