部署MySQL InnoDB集群-蒲公英云

MySQL InnoDB Cluster简介

MySQL InnoDB Cluster 是最新GA的MySQL高可用方案，利用MySQL Group Replication和MySQL Shell、MySQL Router可以轻松搭建强壮的高可用方案。

MySQL Shell 是新的mysql 客户端工具支持x protocol和mysql protocol，具备JavaScript和python可编程能力，作为搭建InnoDB Cluster管理工具。

MySQL Router 是访问路由转发中间件，提供应用程序访问的failover能力。

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-CwKDEo6u-1597371597755)(https://jeremyxu2010.github.io/images/20190526/innodb\_cluster\_overview.png)\]

上面这张图看着比较清楚，通过MySQL Shell可以配置出一个高可用自动进行故障转移的MySQL InnoDB Cluster，在后续运维过程中也可以通过MySQL Shell对集群进行状态监控及管理维护。通过MySQL Router向应用层屏蔽底层集群的细节，以应用层将普通的MySQL协议访问集群。

MySQL Group Replication 是最新GA的同步复制方式，具有以下特点：

支持单主和多主模式
基于Paxos算法，实现数据复制的一致性
插件化设计，支持插件检测，新增节点小于集群当前节点主版本号，拒绝加入集群，大于则加入，但无法作为主节点
没有第三方组件依赖
支持全链路SSL通讯
支持IP白名单
不依赖网络多播

搭建MySQL InnoDB Cluster

这里准备了3台虚拟机mysql0、mysql1、mysql2，IP分别为192.168.7.30、192.168.7.31、192.168.7.32。

安装软件包

第一步是在三台虚拟机上均安装mysql-community-server、mysql-shell、mysql-router软件包。

# 配置mysql的yum源
$ yum install -y https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
# 安装
$ yum install -y mysql-community-server mysql-shell mysql-router

配置主机映射

为保证三台虚拟机上可正常通过名称解析到对方的IP，配置DNS，或将主机名称映射写入hosts文件中。

$ cat << EOF >> /etc/hosts
192.168.7.30 mysql0
192.168.7.31 mysql1
192.168.7.32 mysql2
EOF

修改root密码

为了后续操作方便，这里修改三台虚拟机上MySQL的root密码。

# 首先得到初始的root密码
$ systemctl start mysqld
$ ORIGINAL_ROOT_PASSWORD=$(awk  '/temporary password/{print $NF}' /var/log/mysqld.log)
# 这里将mysql的root密码修改为R00T@mysql，这个密码符合复杂度要求
$ MYSQL_PWD="$ORIGINAL_ROOT_PASSWORD" mysql --connect-expired-password -e "ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'R00T@mysql';"
# 顺便允许mysql可在其它主机登录过来
$ MYSQL_PWD="R00T@mysql" mysql --connect-expired-password -e "CREATE USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'R00T@mysql';"
$ MYSQL_PWD="R00T@mysql" mysql --connect-expired-password -e "GRANT ALL ON *.* TO 'root'@'%' WITH GRANT OPTION; FLUSH PRIVILEGES;"

配置本地实例

MySQL InnoDB Cluster底层依赖Group Replication模式，而配置Group Replication模式首先要通过dba.configureLocalInstance设置每台虚拟机上的本地实例必要参数并持久化配置。

# 通过mysqlsh即可轻松完成本机实例的配置
$ cat << EOF > config_local_instance.js
dba.configureLocalInstance('root@localhost:3306', {'password': 'R00T@mysql', 'interactive': false, 'mycnfPath': '/etc/my.cnf'})
EOF
$ mysqlsh --no-password --js --file=config_local_instance.js
# 重启后才能生效
$ systemctl restart mysqld
# 再检查一下本地实例配置的状况
$ cat << EOF > check_instance_configuration.js
dba.checkInstanceConfiguration('root@localhost:3306', {'password': 'R00T@mysql', 'interactive': false})
EOF
$ mysqlsh --no-password --js --file=check_instance_configuration.js

初始化MySQL InnoDB Cluster

首先需要在所有节点的防火墙开通接口3306和33061（用于建立Cluster）。

firewall-cmd --zone=public --add-port=3306/tcp --permanent
firewall-cmd --zone=public --add-port=33061/tcp --permanent
firewall-cmd --reload
# 查看是否添加成功
firewall-cmd --zone=public --list-port

然后在mysql0这台虚拟机上进行以下操作就可以了。

$ cat << EOF > init_cluster.js
shell.connect('root@localhost:3306', 'R00T@mysql')
dba.createCluster('mycluster', {'localAddress': 'mysql0'})
var cluster=dba.getCluster('mycluster')
cluster.addInstance('root@mysql1:3306', {'localAddress': 'mysql1', 'password': 'R00T@mysql'})
cluster.addInstance('root@mysql2:3306', {'localAddress': 'mysql2', 'password': 'R00T@mysql'})
EOF
$ mysqlsh --no-password --js --file=init_cluster.js

如果在mysqlsh中查看集群状态，会类似如下输出：

$ mysqlsh
> shell.connect('root@localhost:3306', 'R00T@mysql') 
> var cluster=dba.getCluster('mycluster')
> cluster.status()
{
    "clusterName": "mycluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql0:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysql0:3306": {
                "address": "mysql0:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql1:3306": {
                "address": "mysql1:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql2:3306": {
                "address": "mysql2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "mysql0:3306"
}

可见是一个读写（R/W），其余为只读（R/O）。

集群故障处理

模拟故障一：`mysql0`节点故障

在mysql0节点停止mysqld服务：systemctl stop mysqld。

在mysql1节点上查看集群状态：

...
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "mysql0:3306": {
                "address": "mysql0:3306",
                "mode": "n/a",
                "readReplicas": {},
                "role": "HA",
                "shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql0' (111)",
                "status": "(MISSING)"
            },
            "mysql1:3306": {
                "address": "mysql1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            ...

此时mysql被置为可读写的主节点，mysql2为只读节点。

下面启动mysql0的mysqld服务：systemctl start mysqld。

此时对于故障恢复的节点需要执行rejoinInstance操作：

$ mysqlsh
> shell.connect('root@localhost:3306', 'R00T@mysql') 
> var cluster=dba.getCluster('mycluster')
> cluster.status()
...
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "mysql0:3306": {
                "address": "mysql0:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
...
> cluster.rejoinInstance('root@mysql0:3306', {'password': 'R00T@mysql'})
> cluster.status()
...
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysql0:3306": {
                "address": "mysql0:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql1:3306": {
                "address": "mysql1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
...

可见mysql0作为从节点加入了。

模拟故障二：所有节点宕机

当集群的所有节点都offline，直接获取集群信息失败，如何重新恢复集群。

> var cluster=dba.getCluster('mycluster')
Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active) (RuntimeError)

执行rebootClusterFromCompleteOutage命令，可恢复集群（下面是在mysql2上执行的）。

> dba.rebootClusterFromCompleteOutage('mycluster')
Restoring the cluster 'mycluster' from complete outage...
The instance 'mysql0:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y
The instance 'mysql1:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y
NOTE: Group Replication will communicate with other members using 'mysql2:33061'. Use the localAddress option to override.
...

模拟故障三：节点个数不足Quorum

当集群中有部分节点出现UNREACHABLE状态，此时集群无法做出决策，会出现以下局面，此时只剩下一个活跃节点，此节点只能提供查询，无法写入，执行写入操作会hang住。

{
    "clusterName": "mycluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql2:3306",
        "ssl": "DISABLED",
        "status": "NO_QUORUM",
        "statusText": "Cluster has no quorum as visible from 'mysql2:3306' and cannot process write transactions. 2 members are not active",
        "topology": {
            "mysql0:3306": {
                "address": "mysql0:3306",
                "mode": "n/a",
                "readReplicas": {},
                "role": "HA",
                "shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql0' (111)",
                "status": "(MISSING)"
            },
            "mysql1:3306": {
                "address": "mysql1:3306",
                "mode": "n/a",
                "readReplicas": {},
                "role": "HA",
                "shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql1' (110)",
                "status": "UNREACHABLE"
            },
            "mysql2:3306": {
                "address": "mysql2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        },
        ...

修复这种状态，需要执行forceQuorumUsingPartitionOf指定当前活跃节点(如果是多个则选择primary node)，此时活跃节点可以提供读写操作，然后将其他节点加入此集群。

> cluster.forceQuorumUsingPartitionOf('root@mysql2:3306')
Restoring cluster 'mycluster' from loss of quorum, by using the partition composed of [mysql2:3306]
Restoring the InnoDB cluster ...
Please provide the password for 'root@mysql2:3306': **********
Save password for 'root@mysql2:3306'? [Y]es/[N]o/Ne[v]er (default No): y
The InnoDB cluster was successfully restored using the partition from the instance 'root@mysql2:3306'.
WARNING: To avoid a split-brain scenario, ensure that all other members of the cluster are removed or joined back to the group that was restored.
> cluster.status()
...
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
        "topology": {
            "mysql0:3306": {
                "address": "mysql0:3306",
                "mode": "n/a",
                "readReplicas": {},
                "role": "HA",
                "shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql0' (111)",
                "status": "(MISSING)"
            },
            "mysql1:3306": {
                "address": "mysql1:3306",
                "mode": "n/a",
                "readReplicas": {},
                "role": "HA",
                "shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql1' (113)",
                "status": "(MISSING)"
            },
            "mysql2:3306": {
                "address": "mysql2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
...

可以看到节点状态从UNREACHABLE变成了(MISSING)，集群状态从NO_QUORUM变为OK_NO_TOLERANCE。此时就可以进行写操作了。

故障修复后把节点加进来：

> cluster.rejoinInstance('root@mysql0:3306', {'password': 'R00T@mysql'})
> cluster.rejoinInstance('root@mysql1:3306', {'password': 'R00T@mysql'})
# 这里最初UNREACHABLE的节点在加入回来会报错，似乎是一个BUG，可以先尝试删除再加入
> cluster.removeInstance('root@mysql1:3306', {'password': 'R00T@mysql', 'force': true})
> cluster.addInstance('root@mysql1:3306', {'localAddress': 'mysql1', 'password': 'R00T@mysql'})

MySQL Router

初始化mysql-router

首先需要在所有节点的防火墙开通接口6446和6447（用于建立Cluster）。

firewall-cmd --zone=public --add-port=6446/tcp --permanent
firewall-cmd --zone=public --add-port=6447/tcp --permanent
firewall-cmd --reload
# 查看是否添加成功
firewall-cmd --zone=public --list-port

为了向应用层屏蔽底层集群的细节，我们还可以在三台虚拟机上均部署mysql-router。

# 以当前集群信息创建mysql-router的配置信息，注意这里密码R00T@mysql被编码为R00T%40mysql
$ mysqlrouter --bootstrap root:R00T%40mysql@mysql2:3306 --user=mysqlrouter
# 重启mysqlrouter服务
$ systemctl enable mysqlrouter
$ systemctl restart mysqlrouter

mysqlrouter --bootstrap的时候会根据提供的URI获取集群元数据，如果该URI是只读的，那会自动找到读写节点。

配置高可用

利用keepalived配置高可用。

安装keepalived：

yum install keepalived
# 备份配置文件
cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak

编辑配置文件/etc/keepalived/keepalived.conf

# MASTER节点
global_defs {
   router_id MYSQL_ROUTER                    # 各节点统一ID
   vrrp_skip_check_adv_addr
   vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}
vrrp_script check_mysqlrouter {
    script "/usr/bin/killall -0 /usr/bin/mysqlrouter"    # 检测mysqlrouter是否在运行
    interval 2
    weight 2
    fall 2
}
vrrp_instance VI_1 {
    state MASTER                    # 主节点
    interface ens192            # VIP绑定的网卡
    virtual_router_id 33    # 各节点统一的虚拟ID
    priority 102                    # 数越高优先级越高
    advert_int 1                    # 检测间隔 1s
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.7.33            # VIP
    }
    track_script {
        check_mysqlrouter    # 检测脚本
    }
}
# BACKUP节点（不同的配置）
    state BACKUP                    # 备节点
    priority 101                    # 数值低于MASTER

配置防火墙：

firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
firewall-cmd --reload

重启keepalived：systemctl restart keepalived，然后ip a查看VIP绑定情况。

MySQL InnoDB Cluster简介

搭建MySQL InnoDB Cluster

安装软件包

配置主机映射

修改root密码

配置本地实例

初始化MySQL InnoDB Cluster

集群故障处理

模拟故障一：mysql0节点故障

模拟故障二：所有节点宕机

模拟故障三：节点个数不足Quorum

MySQL Router

初始化mysql-router

配置高可用

发表评论取消回复

还没有评论，来说两句吧...

相关阅读

随便看看

教程文章

热评文章

1江湖小白之一起学Python （二）爬取数据的保存

2Java Shiro：简化身份验证和授权的安全框架

3Java中try()catch{}的使用方法

4Swagger注解-@ApiModel 和 @ApiModelProperty

5windows下强制杀死tomcat进程

6uni-app 条形码(一维码)/二维码生成实现

标签列表

模拟故障一：`mysql0`节点故障