MySQL MHA 搭建&测试 不指定

jackxiang 2016-6-22 15:02 | |
引子:目前做Mysql的中间件很多,做网站缺了东西就缺了,但不缺太多就成,所以易地备份还是有它的意义,Mysql这块MHA能在心理上有一定安慰,而真是要真金白银来讲,这种做法也是很难以让人信服,听群里兄弟提起关于银行这块的一些实践基本上是不敢切的,MHA Manager可以单独部署在一台独立机器上管理多个master-slave集群,也可以部署在一台slave上。MHA Manager探测集群的node节点,当发现master出现故障的时它可以自动将具有最新数据的slave提升为新的master,然后将所有其它的slave导向新的master上。整个故障转移过程对应用程序是透明的。对于银行来说最新数据的slave, 也可能不完整呀,容灾只是为了备份,所以业务数据完整性重要的话,异地热切换这种事情想想就好了,工行当年升级db2所有业务挂了一整天都没敢切异地,他们号称技术上实现了异地热切换的国内四大行都有这个最后真出问题谁都不敢切。我问:比如uber这样的,出了问题,岂不是一天不打车了?宇宙行一整天没办业务,uber算啥,异地容灾.除非强数据同步.不然都可能会丢失数据.MHA这东西成本很高的,对于我们来说,做好备份就行了,冷备,热备,主从复制。rsync,mysqldump备份,物理文件备份,主从复制。差不多就行了,rsync我主要用于备份git,而mysqldump啥的主要用来备份mysql到不同机器。


一:背景介绍

      MHA(Master HA)是一款开源的MySQL的高可用工具,能在MySQL主从复制的基础上,实现自动化主服务器故障转移。虽然MHA试图从宕机的主服务器上保存二进制日志,但并不是总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失最新数据。

      MHA监控复制架构的主服务器,一旦检测到主服务器故障,就会自动进行故障转移。即使有些从服务器没有收到最新的relay log,MHA自动从最新的从服务器上识别差异的relay log并把这些日志应用到其他从服务器上,因此所有的从服务器保持一致性了。还可以设置优先级指定其中的一台slave作为master的候选,由于MHA在slaves之间修复一致性,因此可以将slave变成新的master,其他的slave都以其作为新master。

MHA有两部分组成,MHA Manager(管理节点)和MHA Node(数据节点):
1:MHA Manager可以单独部署在一台独立机器上管理多个master-slave集群,也可以部署在一台slave上。MHA Manager探测集群的node节点,当发现master出现故障的时它可以自动将具有最新数据的slave提升为新的master,然后将所有其它的slave导向新的master上。整个故障转移过程对应用程序是透明的。
2:MHA node运行在每台MySQL服务器上(master/slave/manager),它通过监控具备解析和清理logs功能的脚本来加快故障转移的。

复制代码
优点:
1 找出同步最成功的一台从服务器(也就是与主服务器数据最接近的那台从服务器)自动切换成主服务器。
2 如果主机还能够访问,从主服务器上找回最新从机与主机间的数据差异。
3 在每一台从服务器上操作,确定他们缺少哪些events,并分别进行补充。
4 将最新的一台从服务器提升为主服务器后,将其它从服务器重新指向新的主服务器。

缺点:
1 当群集内的数据库进行故障转移时,对外提供服务的虚拟IP也进行转移。
2 MHA管理进程需要以后台守护进程的方式运行,并有监控机制保证MHA管理进程的正常运行。
3 有监控机制保证当主机出现故障时,MHA能确定进行成功的Failover。
4 当故障主机恢复后,能重新回到群集中,并成为新的Slave,自动实现重新同步。
5 由于主机和从机上备份策略不同,进行故障转移后,自动调整cron中的调度(例如全备份)。
复制代码
二:下载

https://code.google.com/p/mysql-master-ha/downloads/list?q=label:Featured
https://code.google.com/p/mysql-master-ha/wiki/Downloads?tm=2   #(5.6)
wget https://mysql-master-ha.googlecode.com/files/mha4mysql-manager-0.55.tar.gz
wget https://mysql-master-ha.googlecode.com/files/mha4mysql-node-0.54.tar.gz
版本说明:https://code.google.com/p/mysql-master-ha/wiki/ReleaseNotes

三:安装

https://code.google.com/p/mysql-master-ha/wiki/Installation#Installing_MHA_Manager
安装完成之后会产生一些相关工具:

复制代码
Manager工具:
- masterha_check_ssh : 检查MHA的SSH配置。
- masterha_check_repl : 检查MySQL复制。
- masterha_manager : 启动MHA。
- masterha_check_status : 检测当前MHA运行状态。
- masterha_master_monitor : 监测master是否宕机。
- masterha_master_switch : 控制故障转移(自动或手动)。
- masterha_conf_host : 添加或删除配置的server信息。

Node工具:
- save_binary_logs : 保存和复制master的二进制日志。
- apply_diff_relay_logs : 识别差异的中继日志事件并应用于其它slave。
- filter_mysqlbinlog : 去除不必要的ROLLBACK事件(MHA已不再使用这个工具)。
- purge_relay_logs : 清除中继日志(不会阻塞SQL线程)。
复制代码
【本文是通过源码安装】

1,MHA Manager 安装

复制代码
  ## Install dependent Perl modules
  # Config::Tiny
  ## perl -MCPAN -e "install Config::Tiny"
  # Log::Dispatch
  ## perl -MCPAN -e "install Log::Dispatch"
  # Parallel::ForkManager
  ## perl -MCPAN -e "install Parallel::ForkManager"
  ## Installing MHA Manager
  $ tar -zxf mha4mysql-manager-X.Y.tar.gz
  $ perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.616)
- DBD::mysql            ...loaded. (4.020)
- Time::HiRes           ...loaded. (1.972101)
- Config::Tiny          ...loaded. (2.20)
- Log::Dispatch         ...loaded. (2.41)
- Parallel::ForkManager ...loaded. (1.06)
- MHA::NodeConst        ...loaded. (0.54)  先装Node,这里才能通过。
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for mha4mysql::manager
Writing MYMETA.yml and MYMETA.json
  $ make
  $ sudo make install
复制代码
2,MHA Node 安装

复制代码
## Install DBD::mysql if not installed
  $ tar -zxf mha4mysql-node-X.Y.tar.gz
  $ perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI        ...loaded. (1.616)
- DBD::mysql ...loaded. (4.020)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for mha4mysql::node
Writing MYMETA.yml and MYMETA.json
  $ make
  $ sudo make install
复制代码
上面红色标明的不能出现miss,否则后面运行就会有问题。在安装时候要设置好locale,否则安装也会有问题,可以先运行:

export LC_ALL="zh_CN.UTF-8"
在安装之前需要安装一些依赖的包,如:gcc、iputils-arping等。到此,MHA安装结束了。

三:测试

必要条件:

1)三台主机:201(M/node)、202(S/node)、204(S/manager)。

2)MHA按照上面的方法在三台上已经安装完毕,MySQL的Replication已经搭建完成,复制帐号三台都已经创建。

MySQL上需要设置:set global relay_log_purge=0,或则配置文件里修改。

--relay-log-purge={0|1}

禁用或启用不再需要中继日志时是否自动清空它们。默认值为1(启用)。这是一个全局变量,可以用SET GLOBAL Relay_log_purge动态更改。
3)MHA环境需要三台主机需要相互信任:实现三台主机之间相互免密钥登录。

复制代码
[204]
用ssh-keygen创建公钥,一直默认回车,最后会在.ssh/下面生成id_rsa.pub
ssh-keygen -t rsa  

把id_rsa.pub 文件复制到另外2台机子并导入公钥:
cat id_rsa.pub >> /root/.ssh/authorized_keys

当然也可以直接复制里面的内容再粘帖到authorized_keys中。 这样204可以直接访问到其他2台机子了
ssh 192.168.200.201
ssh 192.168.200.202

同理在另外2台机子上面操作,让他们之间实现无密码登入。

要是root登陆的话,需要修改ssh的配置文件:
  vi /etc/ssh/sshd_config  中的 PermitRootLogin ,设置为YES,再重启SSH。

注意:Manager 要是装到某一台MySQL上,则需要自己和自己无密码登入:ssh 192.168.200.204,单独到一台服务器则不需要。

复制代码
4)MHA的配置

Manager上的配置

mkdir -p /usr/local/masterha/app1        #创建数据文件目录
mkdir /etc/masterha                      #创建配置文件目录

安装时解压出来的samples中有2个文件:
conf 是存放配置文件的模块
scripts是存放运行时的一些命令,可以把这几个名字移到/user/bin 中。用户下面灰色背景的几个参数中。
添加MHA的配置文件(Manager)

样本:(51-M,52-S,53-S,VIP:50)

View Code
复制代码
修改后的配置文件:[vi /etc/masterha/app1.cnf]
[server default]
user=root                                 #mysql用戶名,
#需要的权限:Super,select,create,insert,update,delete,drop,reload
password=111111                           #mysql密码
ssh_user=root                    #ssh免密钥登录的帐号名
repl_user=rep                    #mysql复制帐号
repl_password=123456             #mysql复制密码
ping_interval=1                  #ping间隔,用来检测master是否正常,默认是3秒,尝试三次没有回应的时候自动进行failover

manager_workdir=/usr/local/masterha/app1       #数据目录,主要该目录的权限,需要有创建的权限
manager_log=/usr/local/masterha/manager.log    #日志文件
remote_workdir=/usr/local/masterha/app1        #另外2台机子在运行时候需要创建的目录,注意ssh-keygen帐号的权限问题

#master_ip_failover_script=/usr/bin/master_ip_failover
#master_ip_online_change_script=/usr/bin/master_ip_online_change
#shutdown_script=/usr/bin/power_manager
#report_script=/usr/bin/send_report
#secondary_check_script= masterha_secondary_check -s remote_host1 -s remote_host2
[server1]
hostname=192.168.200.201
master_binlog_dir=/var/lib/mysql #binlog目录
candidate_master=1 #master机宕掉后,优先启用这台作为新master
check_repl_delay=0            #默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master,因为对于这个slave的恢复需要花费很长时间,通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master
[server2]
hostname=192.168.200.202
master_binlog_dir=/var/log/mysql
candidate_master=1
[server3] hostname=192.168.200.204
master_binlog_dir=/var/log/mysql
#candidate_master=1
复制代码
进行测试:

◆ masterha_check_ssh工具验证ssh信任登录是否成功

复制代码
root@db4:/usr/local/masterha# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Wed Jun 25 17:08:40 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 25 17:08:40 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Jun 25 17:08:40 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Jun 25 17:08:40 2014 - [info] Starting SSH connection tests..
Wed Jun 25 17:08:41 2014 - [debug]
Wed Jun 25 17:08:40 2014 - [debug]  Connecting via SSH from root@192.168.200.201(192.168.200.201:22) to root@192.168.200.202(192.168.200.202:22)..
Wed Jun 25 17:08:40 2014 - [debug]   ok.
Wed Jun 25 17:08:40 2014 - [debug]  Connecting via SSH from root@192.168.200.201(192.168.200.201:22) to root@192.168.200.204(192.168.200.204:22)..
Wed Jun 25 17:08:40 2014 - [debug]   ok.
Wed Jun 25 17:08:41 2014 - [debug]
Wed Jun 25 17:08:40 2014 - [debug]  Connecting via SSH from root@192.168.200.202(192.168.200.202:22) to root@192.168.200.201(192.168.200.201:22)..
Wed Jun 25 17:08:40 2014 - [debug]   ok.
Wed Jun 25 17:08:40 2014 - [debug]  Connecting via SSH from root@192.168.200.202(192.168.200.202:22) to root@192.168.200.204(192.168.200.204:22)..
Wed Jun 25 17:08:41 2014 - [debug]   ok.
Wed Jun 25 17:08:41 2014 - [debug]
Wed Jun 25 17:08:41 2014 - [debug]  Connecting via SSH from root@192.168.200.204(192.168.200.204:22) to root@192.168.200.201(192.168.200.201:22)..
Wed Jun 25 17:08:41 2014 - [debug]   ok.
Wed Jun 25 17:08:41 2014 - [debug]  Connecting via SSH from root@192.168.200.204(192.168.200.204:22) to root@192.168.200.202(192.168.200.202:22)..
Wed Jun 25 17:08:41 2014 - [debug]   ok.
Wed Jun 25 17:08:41 2014 - [info] All SSH connection tests passed successfully.
复制代码
◆ masterha_check_repl工具验证mysql复制是否成功

复制代码
  root@db4:/usr/local/masterha# masterha_check_repl --conf=/etc/masterha/app1.cnf

...
...
192.168.200.201 (current master)
+--192.168.200.202
+--192.168.200.204

Wed Jun 25 17:27:17 2014 - [info] Checking replication health on 192.168.200.202..
Wed Jun 25 17:27:17 2014 - [info]  ok.
Wed Jun 25 17:27:17 2014 - [info] Checking replication health on 192.168.200.204..
Wed Jun 25 17:27:17 2014 - [info]  ok.
Wed Jun 25 17:27:17 2014 - [warning] master_ip_failover_script is not defined.
Wed Jun 25 17:27:17 2014 - [warning] shutdown_script is not defined.
Wed Jun 25 17:27:17 2014 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
复制代码
◆ masterha_manager 启动MHA manager

复制代码
root@db4:~# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1
查看日志文件(/masterha/app1/manager.log)

    Cleaning up test file(s).. done.
Wed Jun 25 17:36:07 2014 - [info] Slaves settings check done.
Wed Jun 25 17:36:07 2014 - [info]
192.168.200.201 (current master)
+--192.168.200.202
+--192.168.200.204

Wed Jun 25 17:36:07 2014 - [warning] master_ip_failover_script is not defined.
Wed Jun 25 17:36:07 2014 - [warning] shutdown_script is not defined.
Wed Jun 25 17:36:07 2014 - [info] Set master ping interval 1 seconds.
Wed Jun 25 17:36:07 2014 - [info] Set secondary check script: masterha_secondary_check -s remote_host1 -s remote_host2
Wed Jun 25 17:36:07 2014 - [info] Starting ping health check on 192.168.200.201(192.168.200.201:3306)..
Wed Jun 25 17:36:07 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
复制代码
参数:

  --ignore_last_failover:在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。该参数代表忽略上次MHA触发切换产生的文件,默认情况下,MHA发生切换后会在日志目录,也就是上面我设置的/data产生app1.failover.complete文件,下次再次切换的时候如果发现该目录下存在该文件将不允许触发切换,除非在第一次切换后收到删除该文件

  --remove_dead_master_conf: 该参数代表当发生主从切换后,老的主库将会从配置文件中移除,一般情况下不需要开启。即:

复制代码
开启程序
nohup masterha_manager --ignore_last_failover --remove_dead_master_conf --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1

+++
故障切换
From:
192.168.200.202 (current master)
+--192.168.200.201
+--192.168.200.204

To:
192.168.200.201 (new master)
+--192.168.200.204
+++

不先处理故障的机子,可直接运行程序,不加该参数会直接报错:
[error][/usr/local/share/perl/5.14.2/MHA/ServerManager.pm, ln443]  Server 192.168.200.201(192.168.200.201:3306) is dead, but must be alive! Check server settings.

nohup masterha_manager --ignore_last_failover --remove_dead_master_conf --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1

可以正常运行,但是配置文件的[server2]选项组被删除
复制代码
◆ 测试master(201)宕机后,是否会自动切换?这里注意的是确保 /usr/local/masterha/app1/app1.failover.complete 没有这类文件存在。否则报:

remove /usr/local/masterha/app1/app1.failover.complete and run this script again,需要加上参数:--ignore_last_failover

nohup masterha_manager --ignore_last_failover --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1
复制代码
...
...
Wed Jun 25 23:45:49 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Jun 25 23:45:50 2014 - [info] Dead Servers:
Wed Jun 25 23:45:50 2014 - [info]   192.168.200.201(192.168.200.201:3306)
Wed Jun 25 23:45:50 2014 - [info] Alive Servers:
Wed Jun 25 23:45:50 2014 - [info]   192.168.200.202(192.168.200.202:3306)
Wed Jun 25 23:45:50 2014 - [info]   192.168.200.204(192.168.200.204:3306)
Wed Jun 25 23:45:50 2014 - [info] Alive Slaves:
Wed Jun 25 23:45:50 2014 - [info]   192.168.200.202(192.168.200.202:3306)  Version=5.5.35-0ubuntu0.12.04.2-log (oldest major version between slaves) log-bin:enabled
Wed Jun 25 23:45:50 2014 - [info]     Replicating from 192.168.200.201(192.168.200.201:3306)
....
....
Wed Jun 25 23:45:50 2014 - [info] Master is down!
Wed Jun 25 23:45:50 2014 - [info] Starting master failover.
....
....
进行5个阶段
Creating /usr/local/masterha/app1 if not exists..    ok.
...
Dumping binlog format description event, from position 0 to 107.. ok.
Wed Jun 25 23:45:51 2014 - [info] scp from root@192.168.200.201:/usr/local/masterha/app1/saved_master_binlog_from_192.168.200.201_3306_20140625234550.binlog to local:/usr/local/masterha/app1/saved_master_binlog_from_192.168.200.201_3306_20140625234550.binlog succeeded.
...
Wed Jun 25 23:45:52 2014 - [info] New master is 192.168.200.202(192.168.200.202:3306)
Wed Jun 25 23:45:52 2014 - [info] Starting master failover..
Wed Jun 25 23:45:52 2014 - [info]
From:
192.168.200.201 (current master)
+--192.168.200.202
+--192.168.200.204

To:
192.168.200.202 (new master)
+--192.168.200.204
...
Applying log files succeeded.
Wed Jun 25 23:45:52 2014 - [info]  All relay logs were successfully applied.
Wed Jun 25 23:45:52 2014 - [info] Getting new master's binlog name and position..
Wed Jun 25 23:45:52 2014 - [info]  mysql-bin.000001:107
Wed Jun 25 23:45:52 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.200.202', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107, MASTER_USER='rep', MASTER_PASSWORD='xxx';
Wed Jun 25 23:45:52 2014 - [info] ** Finished master recovery successfully.
...
Wed Jun 25 23:45:54 2014 - [info]  Executed CHANGE MASTER.
Wed Jun 25 23:45:54 2014 - [info]  Slave started.
Wed Jun 25 23:45:54 2014 - [info] All new slave servers recovered successfully.
...
----- Failover Report -----

app1: MySQL Master failover 192.168.200.201 to 192.168.200.202 succeeded

Master 192.168.200.201 is down!

Check MHA Manager logs at db4:/usr/local/masterha/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 192.168.200.202(192.168.200.202:3306) has all relay logs for recovery.
Selected 192.168.200.202 as a new master.
192.168.200.202: OK: Applying all logs succeeded.
192.168.200.204: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.200.204: OK: Applying all logs succeeded. Slave started, replicating from 192.168.200.202.
192.168.200.202: Resetting slave info succeeded.
Master failover to 192.168.200.202(192.168.200.202:3306) completed successfully.
复制代码
上面只给出了一些比较重要的信息,更多的信息见配置文件指定的日志文件。大致的信息:

主(201)关闭后,查找最新的slave,通过scp把二进制日志应用过去,然后最近的从202执行了reset slave,成了新主,204从202开始同步。

老的主(201)开启之后,需要手动重新同步到新主。通过日志里的:

CHANGE MASTER TO MASTER_HOST='192.168.200.202', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107, MASTER_USER='rep', MASTER_PASSWORD='xxx';
再重新开启MHA程序,这里需要注意:

一旦发生切换管理进程(Manager)将会退出,无法进行再次测试,需将故障数据库解决掉之后,重新change加入到MHA环境中来,并且要保证app1.failover.complete不存在或则加上--ignore_last_failover参数忽略,才能再次开启管理进程。
◆ masterha_master_switch工具进行主从切换,在手动切换的同时需要保证没有启用MHA自动切换功能。

复制代码
root@db4:/usr/local/masterha# masterha_master_switch --master_state=alive --conf=/etc/masterha/app1.cnf
Thu Jun 26 00:19:24 2014 - [info] MHA::MasterRotate version 0.55.
Thu Jun 26 00:19:24 2014 - [info] Starting online master switch..
...
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.200.202(192.168.200.202:3306)? (YES/no): yes
...
From:
192.168.200.202 (current master)
+--192.168.200.201
+--192.168.200.204

To:202消失
192.168.200.201 (new master)
+--192.168.200.204

Starting master switch from 192.168.200.202(192.168.200.202:3306) to 192.168.200.201(192.168.200.201:3306)? (yes/NO): yes
...
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
...
Thu Jun 26 00:19:55 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.200.201', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin-3306.000001', MASTER_LOG_POS=107, MASTER_USER='rep', MASTER_PASSWORD='xxx';
...
Thu Jun 26 00:19:56 2014 - [info] All new slave servers switched successfully.
Thu Jun 26 00:19:56 2014 - [info]  192.168.200.201: Resetting slave info succeeded.
Thu Jun 26 00:19:56 2014 - [info] Switching master to 192.168.200.201(192.168.200.201:3306) completed successfully.
复制代码
加粗部分说明主从切换后的架构变化,也会在命令行提示你是否CHANGE到新主。切换之后老主(202)将会被独立出来。想要把202重新加入到复制中,则只需要执行打印出来的CHANGE MASTER 语句。也可以指定一个IP作为主:

masterha_master_switch --master_state=alive --conf=/etc/masterha/app1.cnf --new_master_host=192.168.200.201
指定201为新主,切换完之后,老主被独立。

◆ masterha_check_status 工具 查看MHA程序的状态:

root@db4:/usr/local/masterha# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
root@db4:/usr/local/masterha# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 monitoring program is now on initialization phase(10:INITIALIZING_MONITOR). Wait for a while and try checking again.
root@db4:/usr/local/masterha# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:29232) is running(0:PING_OK), master:192.168.200.201
通过上面的信息可以看到MHA程序是否正在运行。

◆ masterha_stop 工具关闭Manager进程。

root@db4:/usr/local/masterha# masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
◆ 配置VIP:master_ip_failover_script=/usr/bin/master_ip_failover。在用虚拟IP的时候,需要在开启MHA程序之前要把虚拟IP先设置到主上去,否则MHA是不会自己的去设置VIP,第一次设置VIP之后,后续脚本的故障转移等功能会自动的对VIP进行切换。

设置:
ifconfig eth0:0 192.168.100.xxx
移除:
ifconfig eth0:0 down
脚本样本:(51-M,52-S,53-S,VIP:50)

复制代码
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '192.168.200.50/24';
my $key = '0';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth0:$key down";
my $ssh_Bcast_arp = "/usr/bin/arping -c 3 -A 192.168.200.50";   #ARP回复模式,更新邻居。要是不加则服务器会自动等到vip缓存失效,期间VIP会有一定时间的不可用。

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            &start_arp();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }

    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub start_arp() {
    `ssh $ssh_user\@$new_master_host \" $ssh_Bcast_arp \"`;
}
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
复制代码
在使用上面的arping 时候,需要安装(3台都需要安装):

apt-get install iputils-arping

hash -r  更新缓存路径
关于arping 请参考:http://blog.sina.com.cn/s/blog_88b1c39501011sr1.html

◆ 邮件、短信通知:report_script=/usr/bin/send_report

脚本样本:(51-M,52-S,53-S,VIP:50)

复制代码
#!/usr/bin/env python
#-*- encoding:utf-8 -*-
#-------------------------------------------------------------------------------
# Name:        send_report.py
# Author:      zhoujy
#----------------------------------------------
import os
import sys
import time
import datetime
import smtplib
import subprocess
import fileinput
import getopt
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.Utils import COMMASPACE, formatdate

reload(sys)
sys.setdefaultencoding('utf8')

def send_mail(to, subject, text, from_mail, server="localhost"):
    message = MIMEMultipart()
    message['From'] = from_mail
    message['To'] = COMMASPACE.join(to)
    message['Date'] = formatdate(localtime=True)
    message['Subject'] = subject
    message.attach(MIMEText(text,_charset='utf-8'))
    smtp = smtplib.SMTP(server)
    smtp.sendmail(from_mail, to, message.as_string())
    smtp.close()

if __name__ == "__main__":
    opts,args = getopt.getopt(sys.argv[1:],"h",["orig_master_host=","new_master_host=","new_slave_hosts=","conf=","subject=","body=","app_vip=","new_master_ssh_port=","ssh_user="])
#    print opts,args
    for lines in opts:
        key,values = lines
        if key == '--orig_master_host':
            orig_master_host = values
        if key == '--new_master_host':
            new_master_host = values
        if key == '--new_slave_hosts':
            new_slave_hosts = values
        if key == '--subject':
            subject = values
        if key == '--body':
            body = values
#    text = sys.stdin.read()
    mail_list = ['zjy@xxx.com']
    send_mail(mail_list, subject.encode("utf8"), body, "MHA_Monitor@smtp.dxy.cn", server="192.168.220.251")
复制代码
关于getopt 参数请参考:http://blog.sina.com.cn/s/blog_88b1c39501011sr1.html

到此MySQL HA已经搭建完成,下面开始针对其进行崩溃转移测试。

◆ 测试:

Master:51,Slave:52,53,VIP:50

工具:sysbench

1:生成测试数据:

sysbench --test=oltp --mysql-table-engine=innodb --mysql-host=192.168.200.50 --mysql-db=mha_test --oltp-table-size=5000000 --mysql-user=zjy --mysql-password=123# prepare
2:在生成测试数据期间,在52上执行:关闭其IO线程,模拟延迟。53保持正常。

stop slave io_thread;
3:测试数据生成完毕之后执行:模拟主的并发等压力测试

sysbench --num-threads=16 --test=oltp --mysql-table-engine=innodb --mysql-host=192.168.200.50 --mysql-db=mha_test --oltp-table-size=5000000 --mysql-user=zjy --mysql-password=aaaaa# run
4:同时开启52的IO线程:继续同步M上的数据。

start slave io_thread;
5:关闭主库mysql进程,模拟主库发生故障,进行自动failover操作:

/etc/init.d/mysql stop
6:最后查看日志:manager.log

View Code
从上面的输出可以看出整个MHA的切换过程,共包括以下的步骤:

1.配置文件检查阶段,这个阶段会检查整个集群配置文件配置

2.宕机的master处理,这个阶段包括虚拟ip摘除操作,主机关机操作等

3.复制dead maste和最新slave相差的relay log,并保存到MHA Manger具体的目录下

4.识别含有最新更新的slave

5.应用从master保存的二进制日志事件(binlog events)

6.提升一个slave为新的master进行复制

7.使其他的slave连接新的master进行复制

      可以通过Failover Report可以看到 Failover 成功,这里需要注意的是:在51宕机之后,52(延迟很大)进行追赶的时候,虚拟IP会暂时保留在原先的地方,只要追赶上去之后VIP就会切到最新的Slave上(53)。并且进行一次切换,masterha_manager 就自动停止,需要再次手动开启才能再进行监控。上面的测试说明:延迟越久,切换也越久。即使没有延迟也至少要3s以上的切换时间,因为在M发生宕机的时候需要3s的重连验证。

更多信息:

https://code.google.com/p/mysql-master-ha/w/list

http://www.cnblogs.com/gomysql/p/3675429.html

参数:

http://wubx.net/mha-parameters/

http://isadba.com/upload/mha_Parameters.htm

来自:http://www.cnblogs.com/zhoujinyi/p/3808673.html?utm_source=tuicool&utm_medium=referral

作者:jackxiang@向东博客 专注WEB应用 构架之美 --- 构架之美,在于尽态极妍 | 应用之美,在于药到病除
地址:http://jackxiang.com/post/8757/
版权所有。转载时必须以链接形式注明作者和原始出处及本声明!


最后编辑: jackxiang 编辑于2016-6-22 15:19
评论列表
发表评论

昵称

网址

电邮

打开HTML 打开UBB 打开表情 隐藏 记住我 [登入] [注册]