Shutdown/Startup Procedure - Redundancy - Controlled Failover and Failback

Do not attempt this procedure if the system's current state is unknown. Before work can begin, all cluster functions must be validated. If any part of the cluster is not functioning properly, for example the DRBD replication on acd-store, a failover might not work and a system-wide failure of functions could result.

Information can be found here: System Health Check

Introduction

A controlled failover may be required when one or more VM-Hosts in the cluster must be taken down for maintenance. After the first VM-Host has been started again, a failback to the original machines may then also be required, for example to take the second VM-Host down for maintenance.

Prerequisites

Your VM cluster must have at least two nodes on which a redundant jtel cluster is running. Before shutting down, choose an active VM-Host and the active jtel VMs on it. All activity on the jtel ACD is moved to the active VM-Host. You can then shut down the inactive VM-Host, which contains only standby machines of your redundant jtel ACD cluster.

A redundant jtel cluster within your VM-Hosts may look like the example architecture described in Shutdown/Startup Procedure - Large V3 - Redundant Databases + Load Balancing + Storage.

Your VM-Hosts should be configured so that none of the jtel VMs start automatically after the VM-Host is started. If any startup automation is in place, deactivate it beforehand or configure it to fit within the parameters of this procedure.

Example

Explanation

To begin the shutdown procedure, choose the active side. In the table below, the active side is VM-Host 1. To illustrate a controlled failover, we move it to VM-Host 2 and then fail back to VM-Host 1.

To illustrate the procedure on this page, acd-telN and acd-jbN are represented by multiple machines. The jtel acd-chat and acd-api servers are also active, as are a presence aggregator and the IMAP and Exchange mail connectors.

Expected Normal Operation Status

This table describes the system's expected status during fully redundant operation. The active/standby components (acd-dbm, acd-lb and acd-store) are active on VM-Host 1 and on standby on VM-Host 2. All other machines are active on both VM-Hosts.

VM-Host	machine	Active
1	acd-tel1	Yes
2	acd-tel2	Yes
1	acd-jb1	Yes
1	acd-jb2	Yes
1	acd-jb3	Yes
2	acd-jb4	Yes
2	acd-jb5	Yes
2	acd-jb6	Yes
1	acd-chat1	Yes
2	acd-chat2	Yes
1	acd-api1	Yes
2	acd-api2	Yes
1	acd-dbs1/dbr1	Yes
2	acd-dbs2/dbr2	Yes
1	acd-dbm1	Yes
2	acd-dbm2	Standby
1	acd-lb1	Yes
2	acd-lb2	Standby
1	acd-store1	Yes
2	acd-store2	Standby

Shutdown

Various steps are required before the virtual machines can be shut down.

Step 1 - Backups

At a minimum, a backup of the active database (in this case acd-dbm1) is required. If your VM-Hosts have enough capacity, snapshots of the critical machines add further safety, but they are not required. The critical machines are:

acd-dbm1
acd-dbm2
acd-lb1
acd-lb2
acd-store1
acd-store2
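The database backup can be scripted. The following is a minimal sketch, not a jtel tool. It assumes that mysqldump and gzip are available, that credentials are read from a MySQL option file such as ~/.my.cnf, and that the target directory exists. The helper names backup_file and run_backup are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: logical backup of the active database master (acd-dbm1 here).
# Assumptions: mysqldump and gzip are installed where this runs, credentials
# come from an option file such as ~/.my.cnf, and DIR is writable.

# backup_file HOST -> prints a timestamped name of the form HOST-YYYYmmdd-HHMMSS.sql.gz
backup_file() {
  printf '%s-%s.sql.gz\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# run_backup HOST DIR -> dumps all databases of HOST into DIR and prints the file path
run_backup() {
  local host=$1 dir=$2 file
  file="$dir/$(backup_file "$host")"
  # --single-transaction: consistent InnoDB snapshot without locking the tables
  # --routines/--events: also dump stored routines and events (skipped by default)
  mysqldump -h "$host" --single-transaction --routines --events \
    --all-databases > "${file%.gz}" &&
    gzip "${file%.gz}" &&
    echo "$file"
}
```

For example, run_backup acd-dbm1 /var/backups prints the path of the compressed dump. --single-transaction only gives a consistent snapshot of transactional (InnoDB) tables, not of MyISAM tables.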

Step 2 - Deactivate Monitoring

If monitoring is installed on your system, schedule a downtime of 2 hours for the machines on VM-Host 2. This is approximately how long the procedure takes. Extend the downtime if needed. For the machines on VM-Host 1, set the downtime to cover the entire time they will be switched off.

If your monitoring server is currently running on VM-Host 1, it has to be moved to a different VM-Host.

Step 3 - Shutting down all software

Machine(s)	Stop what	Installed if you are using	How to stop
acd-tel1+2	8-Server	ACD / IVR	Close the cmd file starter window. Shut down the 8-Server via the GUI. For service installations, stop the robot5 service.
	Platform UDP Listener	ACD / IVR	Close the cmd file starter window. For service installations, stop the jtel Platform UDP Listener service.
	Presence Aggregator	A PBX or presence connector that uses the presence aggregator: Cisco, NFON, Teams	Close the cmd file starter window. For service installations, stop the jtel Presence Aggregator service.
	Telephony Connector	A PBX that uses a custom connector: Avaya JTAPI, Innovaphone, TAPI (all)	Close the cmd file starter window. For service installations, stop the service, for example the jtel TAPI service or the jtel Innovaphone Service.
	Exchange Connector	E-Mail with an Exchange or Office 365 server	Stop the jtelEWSMailService service.
	IMAP Connector	E-Mail with an IMAP(S) server	Stop the jtelIMAPMailService service.
acd-jb1-6	Wildfly	Anything	sudo systemctl stop wildfly (for installations not using systemctl: sudo service wildfly stop)
acd-chat1+2	Chat Server	CHAT	sudo systemctl stop jtel-clientmessenger (for installations not using systemctl: sudo service jtel-clientmessenger stop)
acd-api1+2	REST API	REST	sudo systemctl stop jtelrest (for installations not using systemctl: sudo service jtelrest stop)
acd-dbm1+2	Platform UDP Listener	SOAP	sudo systemctl stop jtel-listener (for installations not using systemctl: sudo service jtel-listener stop)
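For the Linux machines, the service names in the table follow directly from the machine names. The sketch below encodes this mapping. It assumes the naming of this example (acd-jbN, acd-chatN, acd-apiN, acd-dbmN), SSH access to each machine and passwordless sudo, all of which are assumptions. It does not cover the Windows services on acd-tel1+2. The helpers services_for and stop_software are hypothetical.

```shell
#!/usr/bin/env bash
# services_for HOST -> prints the systemd unit from the table above for a Linux machine.
# Assumption: host names follow the naming used in this example.
services_for() {
  case "$1" in
    acd-jb*)   echo "wildfly" ;;
    acd-chat*) echo "jtel-clientmessenger" ;;
    acd-api*)  echo "jtelrest" ;;
    acd-dbm*)  echo "jtel-listener" ;;   # only installed when SOAP is used
    *)         return 1 ;;               # not a Linux machine from the table
  esac
}

# stop_software HOST... -> stops the jtel service on each host via SSH (hypothetical helper)
stop_software() {
  local host unit
  for host in "$@"; do
    unit=$(services_for "$host") || { echo "no service known for $host" >&2; continue; }
    ssh "$host" sudo systemctl stop "$unit"
  done
}
```

For example: stop_software acd-jb1 acd-jb2 acd-jb3 acd-jb4 acd-jb5 acd-jb6 acd-chat1 acd-chat2 acd-api1 acd-api2 acd-dbm1 acd-dbm2. Include acd-dbmN only if the Platform UDP Listener (SOAP) is installed there. For installations not using systemctl, replace systemctl stop "$unit" with service "$unit" stop.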

Step 4 - Check for active sessions

Checking for active database sessions on the database master is a precaution, but a necessary one. It confirms that all services are stopped and that there is no activity anywhere on the system.

You can check either on the HaProxy admin page on acd-lb1, by looking at the current session count for acd-dbm1, or on acd-dbm1 in the MySQL terminal, with the following command:

# The output should contain only replication threads and this SHOW PROCESSLIST session itself
SHOW PROCESSLIST \G

Only continue when no sessions are active. If steps 3 and 4 are not executed properly, this procedure might fail.
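The MySQL variant of the check is less error-prone if you count only the sessions that are not replication threads. A minimal sketch follows. It assumes the mysql command-line client in batch mode (-N -B), whose SHOW PROCESSLIST output starts with the columns Id, User, Host, db, Command, Time, State and Info. It treats the following as expected: threads of the user "system user" (replication on this server), the event_scheduler thread, and "Binlog Dump" threads (replicas reading from this master). count_client_sessions is a hypothetical helper.

```shell
#!/usr/bin/env bash
# count_client_sessions -> reads `mysql -N -B -e "SHOW PROCESSLIST"` on stdin
# (tab-separated columns: Id, User, Host, db, Command, Time, State, Info, ...)
# and prints how many sessions are neither replication nor housekeeping threads.
count_client_sessions() {
  awk -F'\t' '
    $2 == "system user"      { next }  # replication I/O and SQL threads of this server
    $2 == "event_scheduler"  { next }  # event scheduler daemon thread
    $5 ~ /^Binlog Dump/      { next }  # threads sending the binlog to the replicas
    $8 == "SHOW PROCESSLIST" { next }  # the session running this check
    { n++ }
    END { print n + 0 }
  '
}
```

On acd-dbm1, run mysql -N -B -e "SHOW PROCESSLIST" | count_client_sessions and continue only when it prints 0. An idle application connection (Command "Sleep") is deliberately counted, because it means a service is still connected.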

Step 5 - Manual Failover to acd-store2

To execute a manual failover, the pcs cluster node acd-store1 is temporarily put into standby, which causes acd-store2 to become the primary node. Once acd-store2 is primary, acd-store1 is taken out of standby again (unstandby). acd-store2 then remains the primary node and acd-store1 becomes the secondary.

Execute the following commands on acd-store1:

Note: On older pcs versions, replace 'node' with 'cluster' in the following commands.

Example:

pcs cluster standby acd-store1

# Set acd-store1 to standby
pcs node standby acd-store1
# Check if acd-store2 was switched to primary
pcs status
# Set acd-store1 back to unstandby
pcs node unstandby acd-store1
# Check if acd-store2 is primary, and acd-store1 is secondary
pcs status
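Instead of re-running pcs status by hand, the standby/unstandby sequence can wait until the resources have actually moved. A minimal sketch follows. It assumes that it runs as root on acd-store1 and that RESOURCE names a Pacemaker resource that only runs on the primary, such as the DRBD filesystem (the name is hypothetical). It relies on crm_resource --locate printing "resource <name> is running on: <node>". wait_until, resource_runs_on and controlled_failover are hypothetical helpers.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT CMD [ARGS...] -> re-runs CMD once per second until it succeeds;
# returns 1 if it still fails after TIMEOUT seconds.
wait_until() {
  local timeout=$1 waited=0
  shift
  until "$@"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

# resource_runs_on RESOURCE NODE -> succeeds if Pacemaker reports RESOURCE on NODE
resource_runs_on() {
  crm_resource --resource "$1" --locate 2>/dev/null | grep -qw "running on: $2"
}

# controlled_failover FROM TO RESOURCE -> standby FROM, wait for RESOURCE on TO, unstandby FROM
controlled_failover() {
  local from=$1 to=$2 res=$3
  pcs node standby "$from" || return 1        # older pcs: pcs cluster standby
  if ! wait_until 120 resource_runs_on "$res" "$to"; then
    echo "$res did not move to $to; $from left in standby, check pcs status" >&2
    return 1
  fi
  pcs node unstandby "$from"                  # older pcs: pcs cluster unstandby
  pcs status
}
```

For example: controlled_failover acd-store1 acd-store2 fs_drbd, where fs_drbd stands for your resource name. If the resource does not move within 120 seconds, the node is deliberately left in standby, so that you can analyze the situation with pcs status before anything else happens.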

Step 6 - Manual Failover to acd-lb2

To execute a manual failover, the pcs cluster node acd-lb1 is temporarily put into standby, which causes acd-lb2 to become the primary node. Once acd-lb2 is primary, acd-lb1 is taken out of standby again (unstandby). acd-lb2 then remains the primary node and acd-lb1 becomes the secondary.

Execute the following commands on acd-lb1:

# Set acd-lb1 to standby
pcs node standby acd-lb1
# Check if acd-lb2 was switched to primary
pcs status
# Set acd-lb1 back to unstandby
pcs node unstandby acd-lb1
# Check if acd-lb2 is primary, and acd-lb1 is secondary
pcs status
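In a Pacemaker-managed HaProxy pair, the active node is usually the one holding the virtual IP address. If your acd-lb cluster works this way (an assumption), you can also check the result of the failover on acd-lb2 with ip. The address 10.0.0.100 below is only a placeholder, and has_ipv4 is a hypothetical helper.

```shell
#!/usr/bin/env bash
# has_ipv4 ADDRESS -> reads `ip -o -4 addr show` output on stdin and succeeds if
# ADDRESS is configured on any interface (field 4 has the form ADDRESS/PREFIX).
has_ipv4() {
  awk -v want="$1" '
    { split($4, a, "/"); if (a[1] == want) found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

On acd-lb2, run ip -o -4 addr show | has_ipv4 10.0.0.100 && echo "acd-lb2 holds the virtual IP". The comparison is exact, so 10.0.0.10 does not match 10.0.0.100.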

Step 7 - Configure HaProxy

Access the HaProxy admin page on both acd-lb1 and acd-lb2. The most important machine to configure is acd-lb2. Configuring both HaProxy instances, however, keeps their configuration identical in case an accidental failover happens before the machines on VM-Host 1 are shut down.

Set the status "MAINT" for all machines on VM-Host 1.

When acd-lb1 is booted again and its HaProxy starts, all machines reconnect to the cluster and are set to the status "READY" by default.
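As an alternative to the admin page, you can make the same state changes through the HAProxy runtime API with the commands set server <backend>/<server> state maint and set server <backend>/<server> state ready. This requires a stats socket declared with level admin. A minimal sketch follows. It assumes socat is installed. The socket path /run/haproxy/admin.sock and the backend/server names are placeholders, so take the real names from your haproxy.cfg. haproxy_state_cmds and apply_cmds are hypothetical helpers.

```shell
#!/usr/bin/env bash
# haproxy_state_cmds STATE BACKEND/SERVER... -> prints one runtime API command per server
haproxy_state_cmds() {
  local state=$1 srv
  shift
  for srv in "$@"; do
    printf 'set server %s state %s\n' "$srv" "$state"
  done
}

# apply_cmds SOCKET -> sends each command read from stdin to the HAProxy stats socket
# Assumption: SOCKET is declared with "stats socket ... level admin" in haproxy.cfg.
apply_cmds() {
  local sock=$1 cmd
  while IFS= read -r cmd; do
    echo "$cmd" | socat stdio "$sock"
  done
}
```

For example, run haproxy_state_cmds maint be_mysql/acd-dbm1 be_wildfly/acd-jb1 be_wildfly/acd-jb2 be_wildfly/acd-jb3 | apply_cmds /run/haproxy/admin.sock on acd-lb1 and on acd-lb2. echo "show servers state" | socat stdio /run/haproxy/admin.sock lists the resulting server states. Like changes made on the admin page, these runtime changes are not written to haproxy.cfg and are lost when HaProxy restarts. This matches the behavior described above for acd-lb1.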

Step 8 - Check the AcdGroupDistribute Daemon

If the AcdGroupDistribute.r5 daemon is running on acd-tel1, it must be started on acd-tel2 to ensure that calls to acd-groups are still routed by the routing algorithm.

Step 9 - Shutdown the virtual machines on VM-Host 1

The machines must still be shut down in the correct order, which is shown in the following table.

First shut down acd-tel1, acd-jb1-3, acd-chat1 and acd-api1, in any order. You do not have to wait until these machines are down before shutting down acd-dbs1/dbr1. Wait until acd-dbs1/dbr1 is down before shutting down acd-dbm1. Wait until acd-dbm1 is down before shutting down acd-lb1. Wait until acd-lb1 is down before shutting down acd-store1:

VM-Host	Order	machine
1	1	acd-tel1
1	2	acd-jb1
1	3	acd-jb2
1	4	acd-jb3
1	5	acd-chat1
1	6	acd-api1
1	7	acd-dbs1/dbr1
1	8	acd-dbm1
1	9	acd-lb1
1	10	acd-store1
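You can express the ordering rule above as stages. Machines within a stage may be shut down in any order, and the next stage only starts once every machine of the previous stage is down. A minimal sketch follows, with hypothetical helper names. The ACTION and CHECK commands depend on your hypervisor. The example assumes KVM/libvirt with domain names equal to the machine names, which is an assumption. Putting acd-dbs1 into the first stage makes acd-dbm1 wait for all of those machines, which is slightly stricter than the text requires.

```shell
#!/usr/bin/env bash
# run_stages ACTION CHECK STAGE... -> processes the stages in order. For each stage
# (a space-separated list of machines) it runs ACTION on every machine, then waits
# until CHECK succeeds for every machine of that stage before the next one starts.
run_stages() {
  local action=$1 check=$2 stage vm waited
  shift 2
  for stage in "$@"; do
    for vm in $stage; do            # unquoted on purpose: split the stage into machines
      "$action" "$vm" || return 1
    done
    for vm in $stage; do
      waited=0
      until "$check" "$vm"; do
        if [ "$waited" -ge 600 ]; then
          echo "timeout waiting for $vm" >&2
          return 1
        fi
        sleep 5
        waited=$((waited + 5))
      done
    done
  done
}

# Example for VM-Host 1, ASSUMING KVM/libvirt with domain names equal to the machine
# names (adapt vm_off/vm_is_off to your hypervisor):
vm_off()    { virsh shutdown "$1"; }
vm_is_off() { [ "$(virsh domstate "$1")" = "shut off" ]; }
# run_stages vm_off vm_is_off \
#   "acd-tel1 acd-jb1 acd-jb2 acd-jb3 acd-chat1 acd-api1 acd-dbs1" \
#   "acd-dbm1" "acd-lb1" "acd-store1"
```

Each machine gets up to 600 seconds to reach the checked state before the sketch gives up and stops processing further stages.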

Step 10 - Check the cluster status

# acd-store2 -> The DRBD will not be synchronized/disconnected | acd-store2 should be primary, and acd-store1 should be offline
pcs status
# acd-lb2 -> acd-lb2 should be primary, and acd-lb1 should be offline
pcs status
# acd-dbm2 -> The replication to acd-dbm1 should be in status "connecting"
SHOW SLAVE STATUS \G
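On acd-store2, cat /proc/drbd shows the connection state (cs), roles (ro) and disk states (ds) of each DRBD device. A small parser makes this check explicit. It assumes DRBD 8.4, whose /proc/drbd contains these fields; DRBD 9 reports them through drbdadm status instead. drbd_state is a hypothetical helper.

```shell
#!/usr/bin/env bash
# drbd_state [MINOR] -> reads /proc/drbd (DRBD 8.4 format) on stdin and prints the
# connection state (cs), roles (ro) and disk states (ds) of device MINOR (default 0).
drbd_state() {
  awk -v dev="${1:-0}:" '
    $1 == dev {
      for (i = 2; i <= NF; i++) {
        split($i, kv, ":")
        if (kv[1] == "cs" || kv[1] == "ro" || kv[1] == "ds") print kv[1] "=" kv[2]
      }
    }
  '
}
```

On acd-store2, run drbd_state 0 < /proc/drbd. With acd-store1 switched off, DRBD 8.4 typically reports cs=WFConnection, ro=Primary/Unknown and ds=UpToDate/DUnknown. Once acd-store1 is back and resynchronized, the expected values are cs=Connected, ro=Primary/Secondary and ds=UpToDate/UpToDate.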

Step 11 - Start Software on all machines on VM-Host 2

Machine(s)	Start what	Installed if you are using	How to start
acd-tel2	8-Server	ACD / IVR	In Explorer, open shell:startup and start the link to startup_launcher.cmd. For service installations, start the robot5 service.
	Platform UDP Listener	ACD / IVR	In Explorer, open shell:startup and start the link to startListener.bat. For service installations, start the jtel Platform UDP Listener service.
	Presence Aggregator	A PBX or presence connector that uses the presence aggregator: Cisco, NFON, Teams	In Explorer, open shell:startup and start the link to start-presence-aggregator.cmd. For service installations, start the jtel Presence Aggregator service.
	Telephony Connector	A PBX that uses a custom connector: Avaya JTAPI, Innovaphone, TAPI (all)	In Explorer, open shell:startup and start the link to JTELInnovaphonePBXService.exe or jtelTAPIMonitorService.exe. For service installations, start the service, for example the jtel TAPI service or the jtel Innovaphone Service.
	Exchange Connector	E-Mail with an Exchange or Office 365 server	Start the jtelEWSMailService service.
	IMAP Connector	E-Mail with an IMAP(S) server	Start the jtelIMAPMailService service.
acd-jb4-6	Wildfly	Anything	sudo systemctl start wildfly (for installations not using systemctl: sudo service wildfly start)
acd-chat2	Chat Server	CHAT	sudo systemctl start jtel-clientmessenger (for installations not using systemctl: sudo service jtel-clientmessenger start)
acd-api2	REST API	REST	sudo systemctl start jtelrest (for installations not using systemctl: sudo service jtelrest start)
acd-dbm2	Platform UDP Listener	SOAP	sudo systemctl start jtel-listener (for installations not using systemctl: sudo service jtel-listener start)
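To confirm that the services from the table are actually running, evaluate the output of systemctl is-active for each unit. This matters when several units are queried at once, because systemctl is-active returns exit code 0 as soon as at least one of them is active. A minimal sketch follows. all_active is a hypothetical helper, and SSH access to the machines is assumed.

```shell
#!/usr/bin/env bash
# all_active -> reads the output of `systemctl is-active UNIT...` (one state per line)
# and succeeds only if there was at least one line and every line is "active".
all_active() {
  awk '
    { n++ }
    $0 != "active" { bad = 1 }
    END { exit (bad || n == 0) ? 1 : 0 }
  '
}
```

For example: ssh acd-jb4 systemctl is-active wildfly | all_active || echo "wildfly is not active on acd-jb4".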

Step 12 - Ensure system functionality

Information can be found here: System Health Check

Step 13 - Reactivate monitoring

Reactivate monitoring for all machines on VM-Host 2.

If all tests are successful, the system is now running only on VM Host 2 and fully operational.

Startup

The following steps assume that VM-Host 1 has been started and that all jtel VMs on it have stayed switched off. If the jtel VMs were mistakenly started automatically, problems in the cluster may have occurred as a result, and a different procedure may be required.

This part of the procedure is not designed to be used on its own. It directly follows the shutdown steps above as part of the same example.

Step 1 - Backups

Create another backup of the database. This time, acd-dbm2 is the active database.

Step 2 - Deactivate Monitoring

Schedule a downtime for the monitoring on all machines for approximately 2 hours. Extend if needed.

Step 3 - Stop all software on all machines on VM-Host 2

Machine(s)	Stop what	Installed if you are using	How to stop
acd-tel2	8-Server	ACD / IVR	Close the cmd file starter window. Shut down the 8-Server via the GUI. For service installations, stop the robot5 service.
	Platform UDP Listener	ACD / IVR	Close the cmd file starter window. For service installations, stop the jtel Platform UDP Listener service.
	Presence Aggregator	A PBX or presence connector that uses the presence aggregator: Cisco, NFON, Teams	Close the cmd file starter window. For service installations, stop the jtel Presence Aggregator service.
	Telephony Connector	A PBX that uses a custom connector: Avaya JTAPI, Innovaphone, TAPI (all)	Close the cmd file starter window. For service installations, stop the service, for example the jtel TAPI service or the jtel Innovaphone Service.
	Exchange Connector	E-Mail with an Exchange or Office 365 server	Stop the jtelEWSMailService service.
	IMAP Connector	E-Mail with an IMAP(S) server	Stop the jtelIMAPMailService service.
acd-jb4-6	Wildfly	Anything	sudo systemctl stop wildfly (for installations not using systemctl: sudo service wildfly stop)
acd-chat2	Chat Server	CHAT	sudo systemctl stop jtel-clientmessenger (for installations not using systemctl: sudo service jtel-clientmessenger stop)
acd-api2	REST API	REST	sudo systemctl stop jtelrest (for installations not using systemctl: sudo service jtelrest stop)
acd-dbm2	Platform UDP Listener	SOAP	sudo systemctl stop jtel-listener (for installations not using systemctl: sudo service jtel-listener stop)

Step 4 - Check for active sessions

Checking for active database sessions on the database master is a precaution, but a necessary one. It confirms that all services are stopped and that there is no activity anywhere on the system.

You can check either on the HaProxy admin page on acd-lb2, by looking at the current session activity on acd-dbm2, or on acd-dbm2 in the MySQL terminal, with the following command:

# The output should contain only replication threads and this SHOW PROCESSLIST session itself
SHOW PROCESSLIST \G

Only continue when no sessions are active. If steps 3 and 4 are not executed properly, this procedure might fail.

Step 5 - Start all virtual machines on VM-Host 1

The machines must be started in the correct order, which is shown in the following table.

First start acd-store1 and wait until it is up. Start acd-lb1 and wait until it is up. Start acd-dbm1 and wait until it is up. Start acd-dbs1/dbr1 and wait until it is up. Afterwards, start acd-jb1-3, acd-tel1 as well as acd-chat1 and acd-api1 in no particular order.

VM-Host	Order	machine
1	1	acd-store1
1	2	acd-lb1
1	3	acd-dbm1
1	4	acd-dbs1/dbr1
1	5	acd-chat1
1	6	acd-api1
1	7	acd-jb1
1	8	acd-jb2
1	9	acd-jb3
1	10	acd-tel1
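You can make "wait until it is up" explicit by polling a TCP port that only answers once the relevant service is running. A minimal sketch follows. It assumes bash (for its /dev/tcp redirection) and the coreutils timeout command. Which port indicates readiness is also an assumption: for example 3306 for MySQL on acd-dbm1 and acd-dbs1/dbr1, and 22 (SSH) as a rough indicator for the other machines. port_open and wait_for_port are hypothetical helpers.

```shell
#!/usr/bin/env bash
# port_open HOST PORT -> succeeds if a TCP connection to HOST:PORT can be opened
# within 2 seconds (uses bash's /dev/tcp redirection and coreutils timeout).
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# wait_for_port HOST PORT LIMIT -> polls every 5 seconds until HOST:PORT accepts
# connections; returns 1 if it still does not after LIMIT seconds.
wait_for_port() {
  local host=$1 port=$2 limit=$3 waited=0
  until port_open "$host" "$port"; do
    if [ "$waited" -ge "$limit" ]; then
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

Following the table, you could run wait_for_port acd-store1 22 600 before starting acd-lb1, wait_for_port acd-lb1 22 600 before starting acd-dbm1, wait_for_port acd-dbm1 3306 600 before starting acd-dbs1/dbr1, and wait_for_port acd-dbs1 3306 600 before starting the remaining machines. An open port only shows that a service accepts connections, not that the cluster is healthy, so the cluster checks that follow are still required.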

Step 6 - Stop all software on the virtual machines on VM-Host 1

Machine(s)	Stop what	Installed if you are using	How to stop
acd-tel1	8-Server	ACD / IVR	Close the cmd file starter window. Shut down the 8-Server via the GUI. For service installations, stop the robot5 service.
	Platform UDP Listener	ACD / IVR	Close the cmd file starter window. For service installations, stop the jtel Platform UDP Listener service.
	Presence Aggregator	A PBX or presence connector that uses the presence aggregator: Cisco, NFON, Teams	Close the cmd file starter window. For service installations, stop the jtel Presence Aggregator service.
	Telephony Connector	A PBX that uses a custom connector: Avaya JTAPI, Innovaphone, TAPI (all)	Close the cmd file starter window. For service installations, stop the service, for example the jtel TAPI service or the jtel Innovaphone Service.
	Exchange Connector	E-Mail with an Exchange or Office 365 server	Stop the jtelEWSMailService service.
	IMAP Connector	E-Mail with an IMAP(S) server	Stop the jtelIMAPMailService service.
acd-jb1-3	Wildfly	Anything	sudo systemctl stop wildfly (for installations not using systemctl: sudo service wildfly stop)
acd-chat1	Chat Server	CHAT	sudo systemctl stop jtel-clientmessenger (for installations not using systemctl: sudo service jtel-clientmessenger stop)
acd-api1	REST API	REST	sudo systemctl stop jtelrest (for installations not using systemctl: sudo service jtelrest stop)
acd-dbm1	Platform UDP Listener	SOAP	sudo systemctl stop jtel-listener (for installations not using systemctl: sudo service jtel-listener stop)

Step 7 - Check for active sessions

Checking for active database sessions on the database master is a precaution, but a necessary one. It confirms that all services are stopped and that there is no activity anywhere on the system.

You can check either on the HaProxy admin page on acd-lb2, by looking at the current session activity on acd-dbm2, or on acd-dbm2 in the MySQL terminal, with the following command:

# The output should contain only replication threads and this SHOW PROCESSLIST session itself
SHOW PROCESSLIST \G

Only continue when no sessions are active. If this step is not properly executed, this procedure will fail.

Step 8 - Check cluster

To prepare for the failback, the cluster needs to be checked after booting up. Execute the following commands to check the status:

# acd-store1 -> The DRBD should be synchronized | acd-store2 should still be primary, and acd-store1 should be secondary
# acd-store2 -> The DRBD should be synchronized | acd-store2 should still be primary, and acd-store1 should be secondary
cat /proc/drbd
pcs status
# acd-lb1 -> acd-lb2 should still be primary, and acd-lb1 should be secondary
# acd-lb2 -> acd-lb2 should still be primary, and acd-lb1 should be secondary
pcs status
# acd-dbm1 -> The replication should be started and synchronized
SHOW SLAVE STATUS \G
# acd-dbm2 -> The replication should be started and synchronized 
SHOW SLAVE STATUS \G
# acd-dbs1 -> The replication should be started and synchronized 
SHOW SLAVE STATUS \G
# acd-dbs2 -> The replication should be started and synchronized 
SHOW SLAVE STATUS \G
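On each database machine, replication is healthy when both replication threads are running and there is no lag. A minimal sketch follows that evaluates the SHOW SLAVE STATUS \G output of the mysql command-line client. It also accepts the Replica_*/Seconds_Behind_Source field names used by newer MySQL versions. Requiring a lag of exactly 0 is a strict choice of this sketch, and replication_ok is a hypothetical helper.

```shell
#!/usr/bin/env bash
# replication_ok -> reads `SHOW SLAVE STATUS \G` output on stdin and succeeds only if
# both replication threads are running and the replica reports no lag.
replication_ok() {
  awk -F': ' '
    { sub(/^ +/, "", $1) }   # field names are right-aligned with leading spaces
    $1 == "Slave_IO_Running"      || $1 == "Replica_IO_Running"    { io = $2 }
    $1 == "Slave_SQL_Running"     || $1 == "Replica_SQL_Running"   { sql = $2 }
    $1 == "Seconds_Behind_Master" || $1 == "Seconds_Behind_Source" { lag = $2 }
    END {
      ok = (io == "Yes" && sql == "Yes" && lag == "0")
      exit ok ? 0 : 1
    }
  '
}
```

On acd-dbm1, acd-dbm2, acd-dbs1 and acd-dbs2, run mysql -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication OK". Empty output, for example on a server without replication configured, counts as a failure.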

Do not continue with the next steps until the status above has been reached. The pages below may help with troubleshooting.

For problems with DRBD, see the following page:

DRBD - Maintenance and Resolve Split Brain or Node Errors

For problems with database replication, see the following pages:

Restore MySQL Master-Master Replication

Restore MySQL Master Slave Replication

Other helpful pages

Normal operation

Step 9 - Manual Failback to acd-store1

To execute a manual failback, the pcs cluster node acd-store2 is temporarily put into standby, which causes acd-store1 to become the primary node. Once acd-store1 is primary, acd-store2 is taken out of standby again (unstandby). acd-store1 then remains the primary node and acd-store2 becomes the secondary.

Execute the following commands on acd-store2:

Note: On older pcs versions, replace 'node' with 'cluster' in the following commands.

Example:

pcs cluster standby acd-store2

# Set acd-store2 to standby
pcs node standby acd-store2
# Check if acd-store1 was switched to primary
pcs status
# Set acd-store2 back to unstandby
pcs node unstandby acd-store2
# Check if acd-store1 is primary, and acd-store2 is secondary
pcs status

Step 10 - Manual Failback to acd-lb1

To execute a manual failback, the pcs cluster node acd-lb2 is temporarily put into standby, which causes acd-lb1 to become the primary node. Once acd-lb1 is primary, acd-lb2 is taken out of standby again (unstandby). acd-lb1 then remains the primary node and acd-lb2 becomes the secondary.

Execute the following commands on acd-lb2:

# Set acd-lb2 to standby
pcs node standby acd-lb2
# Check if acd-lb1 was switched to primary
pcs status
# Set acd-lb2 back to unstandby
pcs node unstandby acd-lb2
# Check if acd-lb1 is primary, and acd-lb2 is secondary
pcs status

Step 11 - Check status on acd-lb1 and Configure HaProxy on acd-lb2

Access the HaProxy admin page on acd-lb1. After the failback, the status should be exactly as you found it before starting the shutdown procedure at Step 4.

If this is not the case, something went wrong. Refer to the troubleshooting pages listed above.

Access the HaProxy admin page on acd-lb2. Configuration is only required on acd-lb2, because the status of all cluster members on VM-Host 1 was reset to the default on acd-lb1 when it was booted.

On acd-lb2, remove the status "MAINT" for all machines on VM-Host 1 and set them to "READY". Make sure the status is correct before continuing.

Make sure that the cluster is fully operational before continuing with step 12. If any cluster member is not functioning properly when the software is started, problems will occur.

Step 12 - Start Software on all machines

Machine(s)	Start what	Installed if you are using	How to start
acd-tel1+2	8-Server	ACD / IVR	In Explorer, open shell:startup and start the link to startup_launcher.cmd. For service installations, start the robot5 service.
	Platform UDP Listener	ACD / IVR	In Explorer, open shell:startup and start the link to startListener.bat. For service installations, start the jtel Platform UDP Listener service.
	Presence Aggregator	A PBX or presence connector that uses the presence aggregator: Cisco, NFON, Teams	In Explorer, open shell:startup and start the link to start-presence-aggregator.cmd. For service installations, start the jtel Presence Aggregator service.
	Telephony Connector	A PBX that uses a custom connector: Avaya JTAPI, Innovaphone, TAPI (all)	In Explorer, open shell:startup and start the link to JTELInnovaphonePBXService.exe or jtelTAPIMonitorService.exe. For service installations, start the service, for example the jtel TAPI service or the jtel Innovaphone Service.
	Exchange Connector	E-Mail with an Exchange or Office 365 server	Start the jtelEWSMailService service.
	IMAP Connector	E-Mail with an IMAP(S) server	Start the jtelIMAPMailService service.
acd-jb1-6	Wildfly	Anything	sudo systemctl start wildfly (for installations not using systemctl: sudo service wildfly start)
acd-chat1+2	Chat Server	CHAT	sudo systemctl start jtel-clientmessenger (for installations not using systemctl: sudo service jtel-clientmessenger start)
acd-api1+2	REST API	REST	sudo systemctl start jtelrest (for installations not using systemctl: sudo service jtelrest start)
acd-dbm1+2	Platform UDP Listener	SOAP	sudo systemctl start jtel-listener (for installations not using systemctl: sudo service jtel-listener start)

Step 13 - Stop the AcdGroupDistribute Daemon on acd-tel2

Since acd-tel1 is active again, if the AcdGroupDistribute.r5 daemon was running on acd-tel1 before, it must be stopped on acd-tel2 to ensure that calls to acd-groups are still routed properly by the routing algorithm.

Optional: The daemon can also be kept on acd-tel2, since it will operate the same way independent of where it is running.

Step 14 - Reactivate monitoring

Reactivate monitoring for all machines.

Step 15 - Ensure system functionality

Information can be found here: System Health Check

If all tests are successful, the system is back in the expected normal operation status described at the beginning of this page and fully operational.

Step 16 - Remove backups

The backups and snapshots created for this procedure can now be deleted.
