部署MySQL InnoDB集群

快来打我* 2022-11-27 07:11 430阅读 0赞

MySQL InnoDB Cluster简介

MySQL InnoDB Cluster 是最新GA的MySQL高可用方案,利用MySQL Group Replication和MySQL Shell、MySQL Router可以轻松搭建强壮的高可用方案。

MySQL Shell 是新的mysql 客户端工具支持x protocol和mysql protocol,具备JavaScript和python可编程能力,作为搭建InnoDB Cluster管理工具。

MySQL Router 是访问路由转发中间件,提供应用程序访问的failover能力。

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-CwKDEo6u-1597371597755)(https://jeremyxu2010.github.io/images/20190526/innodb\_cluster\_overview.png)\]

上面这张图看着比较清楚,通过MySQL Shell可以配置出一个高可用自动进行故障转移的MySQL InnoDB Cluster,在后续运维过程中也可以通过MySQL Shell对集群进行状态监控及管理维护。通过MySQL Router向应用层屏蔽底层集群的细节,以应用层将普通的MySQL协议访问集群。

MySQL Group Replication 是最新GA的同步复制方式,具有以下特点:

  • 支持单主和多主模式
  • 基于Paxos算法,实现数据复制的一致性
  • 插件化设计,支持插件检测,新增节点小于集群当前节点主版本号,拒绝加入集群,大于则加入,但无法作为主节点
  • 没有第三方组件依赖
  • 支持全链路SSL通讯
  • 支持IP白名单
  • 不依赖网络多播

搭建MySQL InnoDB Cluster

这里准备了3台虚拟机mysql0mysql1mysql2,IP分别为192.168.7.30192.168.7.31192.168.7.32

安装软件包

第一步是在三台虚拟机上均安装mysql-community-server、mysql-shell、mysql-router软件包。

  1. # 配置mysql的yum源
  2. $ yum install -y https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
  3. # 安装
  4. $ yum install -y mysql-community-server mysql-shell mysql-router

配置主机映射

为保证三台虚拟机上可正常通过名称解析到对方的IP,配置DNS,或将主机名称映射写入hosts文件中。

  1. $ cat << EOF >> /etc/hosts
  2. 192.168.7.30 mysql0
  3. 192.168.7.31 mysql1
  4. 192.168.7.32 mysql2
  5. EOF

修改root密码

为了后续操作方便,这里修改三台虚拟机上MySQL的root密码。

  1. # 首先得到初始的root密码
  2. $ systemctl start mysqld
  3. $ ORIGINAL_ROOT_PASSWORD=$(awk '/temporary password/{print $NF}' /var/log/mysqld.log)
  4. # 这里将mysql的root密码修改为R00T@mysql,这个密码符合复杂度要求
  5. $ MYSQL_PWD="$ORIGINAL_ROOT_PASSWORD" mysql --connect-expired-password -e "ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'R00T@mysql';"
  6. # 顺便允许mysql可在其它主机登录过来
  7. $ MYSQL_PWD="R00T@mysql" mysql --connect-expired-password -e "CREATE USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'R00T@mysql';"
  8. $ MYSQL_PWD="R00T@mysql" mysql --connect-expired-password -e "GRANT ALL ON *.* TO 'root'@'%' WITH GRANT OPTION; FLUSH PRIVILEGES;"

配置本地实例

MySQL InnoDB Cluster底层依赖Group Replication模式,而配置Group Replication模式首先要通过dba.configureLocalInstance设置每台虚拟机上的本地实例必要参数并持久化配置。

  1. # 通过mysqlsh即可轻松完成本机实例的配置
  2. $ cat << EOF > config_local_instance.js
  3. dba.configureLocalInstance('root@localhost:3306', {'password': 'R00T@mysql', 'interactive': false, 'mycnfPath': '/etc/my.cnf'})
  4. EOF
  5. $ mysqlsh --no-password --js --file=config_local_instance.js
  6. # 重启后才能生效
  7. $ systemctl restart mysqld
  8. # 再检查一下本地实例配置的状况
  9. $ cat << EOF > check_instance_configuration.js
  10. dba.checkInstanceConfiguration('root@localhost:3306', {'password': 'R00T@mysql', 'interactive': false})
  11. EOF
  12. $ mysqlsh --no-password --js --file=check_instance_configuration.js

初始化MySQL InnoDB Cluster

首先需要在所有节点的防火墙开通接口330633061(用于建立Cluster)。

  1. firewall-cmd --zone=public --add-port=3306/tcp --permanent
  2. firewall-cmd --zone=public --add-port=33061/tcp --permanent
  3. firewall-cmd --reload
  4. # 查看是否添加成功
  5. firewall-cmd --zone=public --list-port

然后在mysql0这台虚拟机上进行以下操作就可以了。

  1. $ cat << EOF > init_cluster.js
  2. shell.connect('root@localhost:3306', 'R00T@mysql')
  3. dba.createCluster('mycluster', {'localAddress': 'mysql0'})
  4. var cluster=dba.getCluster('mycluster')
  5. cluster.addInstance('root@mysql1:3306', {'localAddress': 'mysql1', 'password': 'R00T@mysql'})
  6. cluster.addInstance('root@mysql2:3306', {'localAddress': 'mysql2', 'password': 'R00T@mysql'})
  7. EOF
  8. $ mysqlsh --no-password --js --file=init_cluster.js

如果在mysqlsh中查看集群状态,会类似如下输出:

  1. $ mysqlsh
  2. > shell.connect('root@localhost:3306', 'R00T@mysql')
  3. > var cluster=dba.getCluster('mycluster')
  4. > cluster.status()
  5. {
  6. "clusterName": "mycluster",
  7. "defaultReplicaSet": {
  8. "name": "default",
  9. "primary": "mysql0:3306",
  10. "ssl": "REQUIRED",
  11. "status": "OK",
  12. "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
  13. "topology": {
  14. "mysql0:3306": {
  15. "address": "mysql0:3306",
  16. "mode": "R/W",
  17. "readReplicas": {},
  18. "role": "HA",
  19. "status": "ONLINE"
  20. },
  21. "mysql1:3306": {
  22. "address": "mysql1:3306",
  23. "mode": "R/O",
  24. "readReplicas": {},
  25. "role": "HA",
  26. "status": "ONLINE"
  27. },
  28. "mysql2:3306": {
  29. "address": "mysql2:3306",
  30. "mode": "R/O",
  31. "readReplicas": {},
  32. "role": "HA",
  33. "status": "ONLINE"
  34. }
  35. },
  36. "topologyMode": "Single-Primary"
  37. },
  38. "groupInformationSourceMember": "mysql0:3306"
  39. }

可见是一个读写(R/W),其余为只读(R/O)。

集群故障处理

模拟故障一:mysql0节点故障

mysql0节点停止mysqld服务:systemctl stop mysqld

mysql1节点上查看集群状态:

  1. ...
  2. "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
  3. "topology": {
  4. "mysql0:3306": {
  5. "address": "mysql0:3306",
  6. "mode": "n/a",
  7. "readReplicas": {},
  8. "role": "HA",
  9. "shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql0' (111)",
  10. "status": "(MISSING)"
  11. },
  12. "mysql1:3306": {
  13. "address": "mysql1:3306",
  14. "mode": "R/W",
  15. "readReplicas": {},
  16. "role": "HA",
  17. "status": "ONLINE"
  18. },
  19. ...

此时mysql被置为可读写的主节点,mysql2为只读节点。

下面启动mysql0mysqld服务:systemctl start mysqld

此时对于故障恢复的节点需要执行rejoinInstance操作:

  1. $ mysqlsh
  2. > shell.connect('root@localhost:3306', 'R00T@mysql')
  3. > var cluster=dba.getCluster('mycluster')
  4. > cluster.status()
  5. ...
  6. "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
  7. "topology": {
  8. "mysql0:3306": {
  9. "address": "mysql0:3306",
  10. "mode": "R/W",
  11. "readReplicas": {},
  12. "role": "HA",
  13. "status": "(MISSING)"
  14. },
  15. ...
  16. > cluster.rejoinInstance('root@mysql0:3306', {'password': 'R00T@mysql'})
  17. > cluster.status()
  18. ...
  19. "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
  20. "topology": {
  21. "mysql0:3306": {
  22. "address": "mysql0:3306",
  23. "mode": "R/O",
  24. "readReplicas": {},
  25. "role": "HA",
  26. "status": "ONLINE"
  27. },
  28. "mysql1:3306": {
  29. "address": "mysql1:3306",
  30. "mode": "R/W",
  31. "readReplicas": {},
  32. "role": "HA",
  33. "status": "ONLINE"
  34. },
  35. ...

可见mysql0作为从节点加入了。

模拟故障二:所有节点宕机

当集群的所有节点都offline,直接获取集群信息失败,如何重新恢复集群。

  1. > var cluster=dba.getCluster('mycluster')
  2. Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active) (RuntimeError)

执行rebootClusterFromCompleteOutage命令,可恢复集群(下面是在mysql2上执行的)。

  1. > dba.rebootClusterFromCompleteOutage('mycluster')
  2. Restoring the cluster 'mycluster' from complete outage...
  3. The instance 'mysql0:3306' was part of the cluster configuration.
  4. Would you like to rejoin it to the cluster? [y/N]: y
  5. The instance 'mysql1:3306' was part of the cluster configuration.
  6. Would you like to rejoin it to the cluster? [y/N]: y
  7. NOTE: Group Replication will communicate with other members using 'mysql2:33061'. Use the localAddress option to override.
  8. ...

模拟故障三:节点个数不足Quorum

当集群中有部分节点出现UNREACHABLE状态,此时集群无法做出决策,会出现以下局面,此时只剩下一个活跃节点,此节点只能提供查询,无法写入,执行写入操作会hang住。

  1. {
  2. "clusterName": "mycluster",
  3. "defaultReplicaSet": {
  4. "name": "default",
  5. "primary": "mysql2:3306",
  6. "ssl": "DISABLED",
  7. "status": "NO_QUORUM",
  8. "statusText": "Cluster has no quorum as visible from 'mysql2:3306' and cannot process write transactions. 2 members are not active",
  9. "topology": {
  10. "mysql0:3306": {
  11. "address": "mysql0:3306",
  12. "mode": "n/a",
  13. "readReplicas": {},
  14. "role": "HA",
  15. "shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql0' (111)",
  16. "status": "(MISSING)"
  17. },
  18. "mysql1:3306": {
  19. "address": "mysql1:3306",
  20. "mode": "n/a",
  21. "readReplicas": {},
  22. "role": "HA",
  23. "shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql1' (110)",
  24. "status": "UNREACHABLE"
  25. },
  26. "mysql2:3306": {
  27. "address": "mysql2:3306",
  28. "mode": "R/O",
  29. "readReplicas": {},
  30. "role": "HA",
  31. "status": "ONLINE"
  32. }
  33. },
  34. ...

修复这种状态,需要执行forceQuorumUsingPartitionOf指定当前活跃节点(如果是多个则选择primary node),此时活跃节点可以提供读写操作,然后将其他节点加入此集群。

  1. > cluster.forceQuorumUsingPartitionOf('root@mysql2:3306')
  2. Restoring cluster 'mycluster' from loss of quorum, by using the partition composed of [mysql2:3306]
  3. Restoring the InnoDB cluster ...
  4. Please provide the password for 'root@mysql2:3306': **********
  5. Save password for 'root@mysql2:3306'? [Y]es/[N]o/Ne[v]er (default No): y
  6. The InnoDB cluster was successfully restored using the partition from the instance 'root@mysql2:3306'.
  7. WARNING: To avoid a split-brain scenario, ensure that all other members of the cluster are removed or joined back to the group that was restored.
  8. > cluster.status()
  9. ...
  10. "status": "OK_NO_TOLERANCE",
  11. "statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
  12. "topology": {
  13. "mysql0:3306": {
  14. "address": "mysql0:3306",
  15. "mode": "n/a",
  16. "readReplicas": {},
  17. "role": "HA",
  18. "shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql0' (111)",
  19. "status": "(MISSING)"
  20. },
  21. "mysql1:3306": {
  22. "address": "mysql1:3306",
  23. "mode": "n/a",
  24. "readReplicas": {},
  25. "role": "HA",
  26. "shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql1' (113)",
  27. "status": "(MISSING)"
  28. },
  29. "mysql2:3306": {
  30. "address": "mysql2:3306",
  31. "mode": "R/W",
  32. "readReplicas": {},
  33. "role": "HA",
  34. "status": "ONLINE"
  35. }
  36. ...

可以看到节点状态从UNREACHABLE变成了(MISSING),集群状态从NO_QUORUM变为OK_NO_TOLERANCE。此时就可以进行写操作了。

故障修复后把节点加进来:

  1. > cluster.rejoinInstance('root@mysql0:3306', {'password': 'R00T@mysql'})
  2. > cluster.rejoinInstance('root@mysql1:3306', {'password': 'R00T@mysql'})
  3. # 这里最初UNREACHABLE的节点在加入回来会报错,似乎是一个BUG,可以先尝试删除再加入
  4. > cluster.removeInstance('root@mysql1:3306', {'password': 'R00T@mysql', 'force': true})
  5. > cluster.addInstance('root@mysql1:3306', {'localAddress': 'mysql1', 'password': 'R00T@mysql'})

MySQL Router

初始化mysql-router

首先需要在所有节点的防火墙开通接口64466447(用于建立Cluster)。

  1. firewall-cmd --zone=public --add-port=6446/tcp --permanent
  2. firewall-cmd --zone=public --add-port=6447/tcp --permanent
  3. firewall-cmd --reload
  4. # 查看是否添加成功
  5. firewall-cmd --zone=public --list-port

为了向应用层屏蔽底层集群的细节,我们还可以在三台虚拟机上均部署mysql-router。

  1. # 以当前集群信息创建mysql-router的配置信息,注意这里密码R00T@mysql被编码为R00T%40mysql
  2. $ mysqlrouter --bootstrap root:R00T%40mysql@mysql2:3306 --user=mysqlrouter
  3. # 重启mysqlrouter服务
  4. $ systemctl enable mysqlrouter
  5. $ systemctl restart mysqlrouter

mysqlrouter --bootstrap的时候会根据提供的URI获取集群元数据,如果该URI是只读的,那会自动找到读写节点。

配置高可用

利用keepalived配置高可用。

安装keepalived:

  1. yum install keepalived
  2. # 备份配置文件
  3. cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak

编辑配置文件/etc/keepalived/keepalived.conf

  1. # MASTER节点
  2. global_defs {
  3. router_id MYSQL_ROUTER # 各节点统一ID
  4. vrrp_skip_check_adv_addr
  5. vrrp_strict
  6. vrrp_garp_interval 0
  7. vrrp_gna_interval 0
  8. }
  9. vrrp_script check_mysqlrouter {
  10. script "/usr/bin/killall -0 /usr/bin/mysqlrouter" # 检测mysqlrouter是否在运行
  11. interval 2
  12. weight 2
  13. fall 2
  14. }
  15. vrrp_instance VI_1 {
  16. state MASTER # 主节点
  17. interface ens192 # VIP绑定的网卡
  18. virtual_router_id 33 # 各节点统一的虚拟ID
  19. priority 102 # 数越高优先级越高
  20. advert_int 1 # 检测间隔 1s
  21. authentication {
  22. auth_type PASS
  23. auth_pass 1111
  24. }
  25. virtual_ipaddress {
  26. 192.168.7.33 # VIP
  27. }
  28. track_script {
  29. check_mysqlrouter # 检测脚本
  30. }
  31. }
  32. # BACKUP节点(不同的配置)
  33. state BACKUP # 备节点
  34. priority 101 # 数值低于MASTER

配置防火墙:

  1. firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
  2. firewall-cmd --reload

重启keepalived:systemctl restart keepalived,然后ip a查看VIP绑定情况。

发表评论

表情:
评论列表 (有 0 条评论,430人围观)

还没有评论,来说两句吧...

相关阅读

    相关 Docker 部署 MySQL 服务

    一、什么是集群 计算机集群(computer cluster)是一组松散或紧密连接在一起工作的计算机。由于这些计算机协同工作,在许多方面它们可以被视为单个系统。与网格计算