MySQL Master/Slave Management with mysql-master-ha


I decided to take care of my blog. I lost my previous blog backup. How feckless I am! So, I decided to write my first post about High Availability.. ;)
At first sight, you may say “I can do master/slave configuration with 4-5 easy step.” Then you may ask “What is the benefit of mysql-master-ha? ”

Developer of mysql-master-ha says;
“A primary objective of MHA is automating master failover and slave promotion within short (usually 10-30 seconds) downtime, without suffering from replication consistency problems, without spending money for lots of new servers, without performance penalty, without complexity (easy-to-install), and without changing existing deployments.”

Also some featured topics about it.
* Automated master monitoring and failover
* Interactive (manual) Master Failover
* Non-interactive master failover
* Online switching master to a different host

Sounds nice , right?
You may find HowTos, FAQ, Guides etc at mysql-master-ha site. Now, I will show you in practice example of it.

I assume that you have successfully setup your master/slave configuration.

MySQL HA Manager
Ubuntu Server 12.04 – Default setup( VM )
IP Address 192.168.99.103

MySQL Node1
Ubuntu Server 12.04 – Default setup( VM )
IP Address 192.168.99.101
MySQL Server 5.5

MySQL Node2
Ubuntu Server 12.04 – Default Setup ( VM )
IP Address 192.168.99.102
MySQL Server 5.5

Installation
STEP 1
first install dependencies
apt-get install libdbd-mysql-perl
apt-get install libconfig-tiny-perl
apt-get install liblog-dispatch-perl
apt-get install libparallel-forkmanager-perl

then
cd /tmp
wget http://mysql-master-ha.googlecode.com/files/mha4mysql-manager_0.53_all.deb
wget http://mysql-master-ha.googlecode.com/files/mha4mysql-node_0.53_all.deb
dpkg -i mha4mysql-node_0.53_all.deb ( I dont know why manager needs node package. otherwise gives error. )
dpkg -i mha4mysql-manager_0.53_all.deb

STEP 2
Now if everything is fine we can setup ha manager configuration. you may have 10s of master/slave configurations. Each configuration is set under
/etc/app1.cnf
/etc/app2.cnf
/etc/app3.cnf
..
.
.

We have only one setup with 2 two servers. so edit /etc/app1.cnf with your favorite editor.
before you edit be sure that you have the directories created.

[server default]
# mysql user and password
user=root
password=1q
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# manager log file
manager_log=/var/log/masterha/app1/app1.log
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.99.101

[server2]
hostname=192.168.99.102

If you have created ssh keys skip this step Otherwise you have you create ssh keys and copy them to all nodes.
ssh-keygen -t dsa -f ~/.ssh/id_dsa -N “”
cp ~/.ssh/id_dsa.pub ~/.ssh/manager_keys
scp -r ~/.ssh “node IP address”: ( without quotes )

Now check ssh is working properly.

root@Manager:~# masterha_check_ssh –conf=/etc/app1.cnf
Wed Sep 26 09:42:33 2012 – [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Sep 26 09:42:33 2012 – [info] Reading application default configurations from /etc/app1.cnf..
Wed Sep 26 09:42:33 2012 – [info] Reading server configurations from /etc/app1.cnf..
Wed Sep 26 09:42:33 2012 – [info] Starting SSH connection tests..
Wed Sep 26 09:42:40 2012 – [debug]
Wed Sep 26 09:42:33 2012 – [debug] Connecting via SSH from root@192.168.99.101(192.168.99.101:22) to root@192.168.99.102(192.168.99.102:22)..
Wed Sep 26 09:42:40 2012 – [debug] ok.
Wed Sep 26 09:42:40 2012 – [debug]
Wed Sep 26 09:42:34 2012 – [debug] Connecting via SSH from root@192.168.99.102(192.168.99.102:22) to root@192.168.99.101(192.168.99.101:22)..
Wed Sep 26 09:42:40 2012 – [debug] ok.
Wed Sep 26 09:42:40 2012 – [info] All SSH connection tests passed successfully.

Now We should test if the replication i ok or not

root@Manager:~# masterha_check_repl –conf=/etc/app1.cnf
Wed Sep 26 10:11:44 2012 – [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Sep 26 10:11:44 2012 – [info] Reading application default configurations from /etc/app1.cnf..
Wed Sep 26 10:11:44 2012 – [info] Reading server configurations from /etc/app1.cnf..
Wed Sep 26 10:11:44 2012 – [info] MHA::MasterMonitor version 0.53.
Wed Sep 26 10:11:44 2012 – [info] Dead Servers:
Wed Sep 26 10:11:44 2012 – [info] Alive Servers:
Wed Sep 26 10:11:44 2012 – [info] 192.168.99.101(192.168.99.101:3306)
Wed Sep 26 10:11:44 2012 – [info] 192.168.99.102(192.168.99.102:3306)
Wed Sep 26 10:11:44 2012 – [info] Alive Slaves:
Wed Sep 26 10:11:44 2012 – [info] 192.168.99.102(192.168.99.102:3306) Version=5.5.24-0ubuntu0.12.04.1-log (oldest major version between slaves) log-bin:enabled
Wed Sep 26 10:11:44 2012 – [info] Replicating from 192.168.99.101(192.168.99.101:3306)
Wed Sep 26 10:11:44 2012 – [info] Current Alive Master: 192.168.99.101(192.168.99.101:3306)
Wed Sep 26 10:11:44 2012 – [info] Checking slave configurations..
Wed Sep 26 10:11:44 2012 – [info] Checking replication filtering settings..
Wed Sep 26 10:11:44 2012 – [info] binlog_do_db= test, binlog_ignore_db=
Wed Sep 26 10:11:44 2012 – [info] Replication filtering check ok.
Wed Sep 26 10:11:44 2012 – [info] Starting SSH connection tests..
Wed Sep 26 10:11:53 2012 – [info] All SSH connection tests passed successfully.
Wed Sep 26 10:11:53 2012 – [info] Checking MHA Node version..
Wed Sep 26 10:11:56 2012 – [info] Version check ok.
Wed Sep 26 10:11:56 2012 – [info] Checking SSH publickey authentication settings on the current master..
Wed Sep 26 10:11:58 2012 – [info] HealthCheck: SSH to 192.168.99.101 is reachable.
Wed Sep 26 10:12:01 2012 – [info] Master MHA Node version is 0.53.
Wed Sep 26 10:12:01 2012 – [info] Checking recovery script configurations on the current master..
Wed Sep 26 10:12:01 2012 – [info] Executing command: save_binary_logs –command=test –start_pos=4 –binlog_dir=/var/lib/mysql,/var/log/mysql –output_file=/var/log/masterha/app1/save_binary_logs_test –manager_version=0.53 –start_file=mysql-bin.000006
Wed Sep 26 10:12:01 2012 – [info] Connecting to root@192.168.99.101(192.168.99.101)..
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/log/mysql, up to mysql-bin.000006
Wed Sep 26 10:12:03 2012 – [info] Master setting check done.
Wed Sep 26 10:12:03 2012 – [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Sep 26 10:12:03 2012 – [info] Executing command : apply_diff_relay_logs –command=test –slave_user=root –slave_host=192.168.99.102 –slave_ip=192.168.99.102 –slave_port=3306 –workdir=/var/log/masterha/app1 –target_version=5.5.24-0ubuntu0.12.04.1-log –manager_version=0.53 –relay_log_info=/var/lib/mysql/relay-log.info –relay_dir=/var/lib/mysql/ –slave_pass=xxx
Wed Sep 26 10:12:03 2012 – [info] Connecting to root@192.168.99.102(192.168.99.102:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info … ok.
Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000012
Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000012
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Sep 26 10:12:06 2012 – [info] Slaves settings check done.
Wed Sep 26 10:12:06 2012 – [info]
192.168.99.101 (current master)
+–192.168.99.102

Wed Sep 26 10:12:06 2012 – [info] Checking replication health on 192.168.99.102..
Wed Sep 26 10:12:06 2012 – [info] ok.
Wed Sep 26 10:12:06 2012 – [warning] master_ip_failover_script is not defined.
Wed Sep 26 10:12:06 2012 – [warning] shutdown_script is not defined.
Wed Sep 26 10:12:06 2012 – [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

Now setup seems ok.
run as deamon
nohup masterha_manager –conf=/etc/app1.cnf < /dev/null > /var/log/masterha/app1/app1.log 2>&1 &

now masterha is monitoring mysql replication with default options.
lets test what happens if master is down
on manager host
tail -f /var/log/masterha/app1/app1.log

now go to master host and stop mysql

Now lets see what happens on masterha side..
( I am skipping the debug logs.. just show you if everything is ok or not

Wed Sep 26 11:38:04 2012 – [info] scp from root@192.168.99.101:/var/log/masterha/app1/saved_master_binlog_from_192.168.99.101_3306_20120926113752.binlog to local:/var/$
Wed Sep 26 11:38:07 2012 – [info] HealthCheck: SSH to 192.168.99.102 is reachable.
Wed Sep 26 11:38:10 2012 – [info]
Wed Sep 26 11:38:10 2012 – [info] * Phase 3.3: Determining New Master Phase..
Wed Sep 26 11:38:10 2012 – [info]
Wed Sep 26 11:38:10 2012 – [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Sep 26 11:38:10 2012 – [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Sep 26 11:38:10 2012 – [info] Searching new master from slaves..
Wed Sep 26 11:38:10 2012 – [info] Candidate masters from the configuration file:
Wed Sep 26 11:38:10 2012 – [info] Non-candidate masters:
Wed Sep 26 11:38:10 2012 – [info] New master is 192.168.99.102(192.168.99.102:3306)
Wed Sep 26 11:38:10 2012 – [info] Starting master failover..
Wed Sep 26 11:38:10 2012 – [info]
From:
192.168.99.101 (current master)
+–192.168.99.102
To:
192.168.99.102 (new master)

Wed Sep 26 11:38:10 2012 – [info]
Wed Sep 26 11:38:10 2012 – [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Sep 26 11:38:10 2012 – [info]
Wed Sep 26 11:38:10 2012 – [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Sep 26 11:38:10 2012 – [info] Sending binlog..
Wed Sep 26 11:38:16 2012 – [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.99.101_3306_20120926113752.binlog to root@192.168.99.102:/var/$
Wed Sep 26 11:38:16 2012 – [info]
Wed Sep 26 11:38:16 2012 – [info] * Phase 3.4: Master Log Apply Phase..
Wed Sep 26 11:38:16 2012 – [info]
Wed Sep 26 11:38:16 2012 – [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Sep 26 11:38:16 2012 – [info] Starting recovery on 192.168.99.102(192.168.99.102:3306)..
Wed Sep 26 11:38:16 2012 – [info] Generating diffs succeeded.
Wed Sep 26 11:38:16 2012 – [info] Waiting until all relay logs are applied.
Wed Sep 26 11:38:16 2012 – [info] done.
Wed Sep 26 11:38:16 2012 – [info] Getting slave status..
Wed Sep 26 11:38:16 2012 – [info] This slave(192.168.99.102)’s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000011:107). No need to recover from Exec_Ma$
Wed Sep 26 11:38:16 2012 – [info] Connecting to the target slave host 192.168.99.102, running recover script..
Wed Sep 26 11:38:16 2012 – [info] Executing command: apply_diff_relay_logs –command=apply –slave_user=root –slave_host=192.168.99.102 –slave_ip=192.168.99.102 –s$
Wed Sep 26 11:38:19 2012 – [info]
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_192.168.99.101_3306_20120926113752.binlog on 192.168.99.102:3306. This may$
Applying log files succeeded.
Wed Sep 26 11:38:19 2012 – [info] All relay logs were successfully applied.
Wed Sep 26 11:38:19 2012 – [info] Getting new master’s binlog name and position..
Wed Sep 26 11:38:19 2012 – [info] mysql-bin.000003:281
Wed Sep 26 11:38:19 2012 – [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=’192.168.99.102′, MASTER_PORT$
Wed Sep 26 11:38:19 2012 – [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Wed Sep 26 11:38:19 2012 – [info] ** Finished master recovery successfully.
Wed Sep 26 11:38:19 2012 – [info] * Phase 3: Master Recovery Phase completed.
Wed Sep 26 11:38:19 2012 – [info]
Wed Sep 26 11:38:19 2012 – [info] * Phase 4: Slaves Recovery Phase..
Wed Sep 26 11:38:19 2012 – [info]
Wed Sep 26 11:38:19 2012 – [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Sep 26 11:38:19 2012 – [info]
Wed Sep 26 11:38:19 2012 – [info] Generating relay diff files from the latest slave succeeded.
Wed Sep 26 11:38:19 2012 – [info]
Wed Sep 26 11:38:19 2012 – [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Sep 26 11:38:19 2012 – [info]
Wed Sep 26 11:38:19 2012 – [info] All new slave servers recovered successfully.
Wed Sep 26 11:38:19 2012 – [info]
Wed Sep 26 11:38:19 2012 – [info] * Phase 5: New master cleanup phease..
Wed Sep 26 11:38:19 2012 – [info]
Wed Sep 26 11:38:19 2012 – [info] Resetting slave info on the new master..
Wed Sep 26 11:38:19 2012 – [info] 192.168.99.102: Resetting slave info succeeded.
Wed Sep 26 11:38:19 2012 – [info] Master failover to 192.168.99.102(192.168.99.102:3306) completed successfully.
Wed Sep 26 11:38:19 2012 – [info]

—– Failover Report —–

app1: MySQL Master failover 192.168.99.101 to 192.168.99.102 succeeded

Master 192.168.99.101 is down!

Check MHA Manager logs at Manager:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
The latest slave 192.168.99.102(192.168.99.102:3306) has all relay logs for recovery.
Selected 192.168.99.102 as a new master.
192.168.99.102: OK: Applying all logs succeeded.
Generating relay diff files from the latest slave succeeded.
192.168.99.102: Resetting slave info succeeded.
Master failover to 192.168.99.102(192.168.99.102:3306) completed successfully.

woo everthing is done in seconds. our slave is master now. it was done automaticially with no data loss.

Now let us swtich master manually.
I started my old master and made it slave. now I want to switch it back as master again. This process will be online with no data loss too.

root@Manager:/var/log/masterha/app1# masterha_master_switch –master_state=alive –conf=/etc/app1.cnf –new_master_host=192.168.99.101
Wed Sep 26 13:49:23 2012 – [info] MHA::MasterRotate version 0.53.
Wed Sep 26 13:49:23 2012 – [info] Starting online master switch..
Wed Sep 26 13:49:23 2012 – [info]
Wed Sep 26 13:49:23 2012 – [info] * Phase 1: Configuration Check Phase..
Wed Sep 26 13:49:23 2012 – [info]
Wed Sep 26 13:49:23 2012 – [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Sep 26 13:49:23 2012 – [info] Reading application default configurations from /etc/app1.cnf..
Wed Sep 26 13:49:23 2012 – [info] Reading server configurations from /etc/app1.cnf..
Wed Sep 26 13:49:23 2012 – [info] Current Alive Master: 192.168.99.102(192.168.99.102:3306)
Wed Sep 26 13:49:23 2012 – [info] Alive Slaves:
Wed Sep 26 13:49:23 2012 – [info] 192.168.99.101(192.168.99.101:3306) Version=5.5.24-0ubuntu0.12.04.1-log (oldest major version between slaves) log-bin:enabled
Wed Sep 26 13:49:23 2012 – [info] Replicating from 192.168.99.102(192.168.99.102:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.99.102(192.168.99.102:3306)? (YES/no): YES
Wed Sep 26 13:49:26 2012 – [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Wed Sep 26 13:49:26 2012 – [info] ok.
Wed Sep 26 13:49:26 2012 – [info] Checking MHA is not monitoring or doing failover..
Wed Sep 26 13:49:26 2012 – [info] Checking replication health on 192.168.99.101..
Wed Sep 26 13:49:26 2012 – [info] ok.
Wed Sep 26 13:49:26 2012 – [info] 192.168.99.101 can be new master.
Wed Sep 26 13:49:26 2012 – [info]
From:
192.168.99.102 (current master)
+–192.168.99.101

To:
192.168.99.101 (new master)

Starting master switch from 192.168.99.102(192.168.99.102:3306) to 192.168.99.101(192.168.99.101:3306)? (yes/NO): yes
Wed Sep 26 13:49:39 2012 – [info] Checking whether 192.168.99.101(192.168.99.101:3306) is ok for the new master..
Wed Sep 26 13:49:39 2012 – [info] ok.
Wed Sep 26 13:49:39 2012 – [info] ** Phase 1: Configuration Check Phase completed.
Wed Sep 26 13:49:39 2012 – [info]
Wed Sep 26 13:49:39 2012 – [info] * Phase 2: Rejecting updates Phase..
Wed Sep 26 13:49:39 2012 – [info]
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Wed Sep 26 13:49:42 2012 – [info] Locking all tables on the orig master to reject updates from everybody (including root):
Wed Sep 26 13:49:42 2012 – [info] Executing FLUSH TABLES WITH READ LOCK..
Wed Sep 26 13:49:42 2012 – [info] ok.
Wed Sep 26 13:49:42 2012 – [info] Orig master binlog:pos is mysql-bin.000003:455.
Wed Sep 26 13:49:42 2012 – [info] Waiting to execute all relay logs on 192.168.99.101(192.168.99.101:3306)..
Wed Sep 26 13:49:42 2012 – [info] master_pos_wait(mysql-bin.000003:455) completed on 192.168.99.101(192.168.99.101:3306). Executed 0 events.
Wed Sep 26 13:49:42 2012 – [info] done.
Wed Sep 26 13:49:42 2012 – [info] Getting new master’s binlog name and position..
Wed Sep 26 13:49:42 2012 – [info] mysql-bin.000013:107
Wed Sep 26 13:49:42 2012 – [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=’192.168.99.101′, MASTER_PORT=3306, MASTER_LOG_FILE=’mysql-bin.000013′, MASTER_LOG_POS=107, MASTER_USER=’root’, MASTER_PASSWORD=’xxx’;
Wed Sep 26 13:49:42 2012 – [info]
Wed Sep 26 13:49:42 2012 – [info] * Switching slaves in parallel..
Wed Sep 26 13:49:42 2012 – [info]
Wed Sep 26 13:49:42 2012 – [info] Unlocking all tables on the orig master:
Wed Sep 26 13:49:42 2012 – [info] Executing UNLOCK TABLES..
Wed Sep 26 13:49:42 2012 – [info] ok.
Wed Sep 26 13:49:42 2012 – [info] All new slave servers switched successfully.
Wed Sep 26 13:49:42 2012 – [info]
Wed Sep 26 13:49:42 2012 – [info] * Phase 5: New master cleanup phease..
Wed Sep 26 13:49:42 2012 – [info]
Wed Sep 26 13:49:42 2012 – [info] 192.168.99.101: Resetting slave info succeeded.
Wed Sep 26 13:49:42 2012 – [info] Switching master to 192.168.99.101(192.168.99.101:3306) completed successfully.

Thats all.. feel free to comment your problems..

1 thought on “MySQL Master/Slave Management with mysql-master-ha”

Leave a Reply