Background
See also:
https://docs.linbit.com/doc/users-guide-84/s-resolve-split-brain/
Useful Commands
Disconnect the share (useful for planned maintenance).
...
Set the node to primary:
drbdadm connect jtelshared
drbdadm primary jtelshared
Split Brain
Symptoms
Code Block |
---|
|
cat /proc/drbd
-->
GIT-hash: a4d5de01fffd7e4cde48a080e2c686f9e8cebf4c build by mockbuild@, 2017-09-15 14:23:22
1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
ns:0 nr:119823323 dw:119823323 dr:2128 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0 |
...
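Both nodes should show this state (cs:StandAlone). According to the LINBIT guide linked under Background, the peer may instead show cs:WFConnection if it dropped the connection before detecting the split brain itself.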
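The symptom check can be scripted. The following is a minimal sketch, not part of the original procedure: the function names `drbd_cstate` and `drbd_is_standalone` are made up for illustration, and they only read the `cs:` field from text in the `/proc/drbd` format shown above. StandAlone also appears after a deliberate `drbdadm disconnect`, so treat a match as a hint, not as proof of split brain.

```shell
# Print the connection state (the cs: field) of each DRBD resource found
# in /proc/drbd-style text on stdin. (Hypothetical helper name.)
drbd_cstate() {
    sed -n 's/.*cs:\([A-Za-z]*\).*/\1/p'
}

# Succeed (exit 0) if at least one resource reports cs:StandAlone.
# (Hypothetical helper name.)
drbd_is_standalone() {
    drbd_cstate | grep -qx 'StandAlone'
}

# Typical use on a DRBD node:
#   drbd_is_standalone < /proc/drbd && echo "StandAlone: check for split brain"
```

Run it on both nodes and compare the results before deciding which node to treat as broken.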
Find out which node is active in the PCS cluster
Code Block |
---|
|
pcs status
-->
Cluster name: portal
Stack: corosync
Current DC: acd-store1 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum
Last updated: Sun Mar 18 18:05:32 2018
Last change: Fri Feb 16 00:07:51 2018 by root via cibadmin on acd-store2
2 nodes configured
3 resources configured
Node acd-store1: standby
Online: [ acd-store2 ]
Full list of resources:
Resource Group: haproxy_group
ClusterDataJTELSharedMount (ocf::heartbeat:Filesystem): Started acd-store2
ClusterIP (ocf::heartbeat:IPaddr2): Started acd-store2
samba (systemd:smb): Started acd-store2
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled |
...
In this example acd-store2 is running all resources, so it is treated as the non-broken (healthy) node. The other node is the broken node.
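To read the active node out of the status output without scanning it by eye, a small filter can help. This is a sketch under the assumption that resource lines look like the ones above, with the node name directly after the word `Started`. The function name `pcs_active_nodes` is hypothetical.

```shell
# Print the unique node names that follow the word "Started" in
# `pcs status` output supplied on stdin. (Hypothetical helper name.)
pcs_active_nodes() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "Started") print $(i + 1) }' | sort -u
}

# Typical use on a cluster node:
#   pcs status | pcs_active_nodes
```

If this prints more than one node, or none, the resources are not all on one node and the healthy node should be chosen by hand.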
Standby the broken node in the PCS cluster
This command can be run on either machine. Replace acd-lb-broken with the name of the broken node.
Code Block |
---|
|
pcs cluster standby acd-lb-broken
--> Verify this with
pcs status |
On the broken node
Code Block |
---|
|
drbdadm disconnect jtelshared
drbdadm secondary jtelshared
drbdadm connect --discard-my-data jtelshared |
On the healthy node
Code Block |
---|
title | drbd on healthy node |
---|
|
drbdadm primary jtelshared
drbdadm connect jtelshared |
Check re-sync activity
The re-sync might take a long time.
...
Code Block |
---|
|
[root@storage01 ~]# cat /proc/drbd
version: 8.4.10-1 (api:1/proto:86-101)
GIT-hash: a4d5de01fffd7e4cde48a080e2c686f9e8cebf4c build by mockbuild@, 2017-09-15 14:23:22
1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
ns:0 nr:1411538 dw:121234862 dr:2128 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:184698664
[>....................] sync'ed: 0.8% (180368/181744)M
finish: 26:12:15 speed: 1,940 (2,760) want: 2,120 K/sec |
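For monitoring from a script, the progress figure can be pulled out of the same output. A minimal sketch, assuming the `sync'ed:` line has the format shown above. The function name `drbd_sync_percent` is made up for illustration.

```shell
# Print the resync percentage (for example "0.8") from /proc/drbd-style
# text on stdin. Prints nothing when no resync is in progress.
# (Hypothetical helper name.)
drbd_sync_percent() {
    sed -n "s/.*sync'ed:[[:space:]]*\([0-9.]*\)%.*/\1/p"
}

# Typical use on the node that is SyncTarget:
#   drbd_sync_percent < /proc/drbd
# Or simply refresh the raw view every 10 seconds:
#   watch -n 10 cat /proc/drbd
```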
Tune the transfer (Second Node)
If the estimated finish time is too long, raise the resync rate on the broken node:
Code Block |
---|
title | drbd Transfer Tuning (on broken node) |
---|
|
drbdadm disk-options --c-plan-ahead=0 --resync-rate=110M jtelshared |
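To judge whether tuning is worthwhile, a rough estimate divides the out-of-sync amount (the `oos:` field, in KiB) by the resync rate. The sketch below is only arithmetic. The function names are hypothetical, and it assumes that `110M` means 110 MiB/s, which is 112640 KiB/s. Real throughput is still limited by network and disk.

```shell
# Estimate resync duration in whole seconds. (Hypothetical helper name.)
#   $1: out-of-sync amount in KiB (the oos: field of /proc/drbd)
#   $2: resync rate in KiB/s
resync_eta_seconds() {
    echo $(( $1 / $2 ))
}

# Format a number of seconds as H:MM:SS. (Hypothetical helper name.)
format_hms() {
    printf '%d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# Example with the figures from the re-sync output above:
#   format_hms "$(resync_eta_seconds 184698664 1940)"     # untuned speed
#   format_hms "$(resync_eta_seconds 184698664 112640)"   # 110 MiB/s
```

With the numbers shown earlier this gives roughly 26.5 hours at 1,940 K/sec and about 27 minutes if 110 MiB/s were actually sustained.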
Put broken node back to primary
Code Block |
---|
title | drbd - Set broken node to primary |
---|
|
drbdadm primary jtelshared
--> Verify this with
cat /proc/drbd |
Unstandby the broken node in the PCS cluster
Code Block |
---|
title | Unstandby broken node |
---|
|
pcs cluster unstandby acd-lb-broken
--> Verify this with
pcs status |
Untune the transfer (Second Node)
If the transfer was tuned, revert it on the broken node. drbdadm adjust re-applies the settings from the configuration file.
...
Code Block |
---|
title | drbd - Untune Transfer |
---|
|
drbdadm adjust jtelshared |
Check everything
Code Block |
---|
title | Check everything |
---|
|
pcs status
cat /proc/drbd
# On another Linux machine
ls /home/jtel/shared
# Windows
dir \\acd-store\shared |
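The DRBD part of the final check can also be automated. A minimal sketch, assuming that a healthy resource line contains `cs:Connected` and `ds:UpToDate/UpToDate`. The function name `drbd_all_healthy` is made up for illustration.

```shell
# Succeed (exit 0) only if stdin contains at least one resource line and
# every resource line shows cs:Connected with ds:UpToDate/UpToDate.
# (Hypothetical helper name.)
drbd_all_healthy() {
    awk '/cs:/ {
             seen = 1
             if ($0 !~ /cs:Connected / || $0 !~ /ds:UpToDate\/UpToDate/) bad = 1
         }
         END { if (!seen || bad) exit 1 }'
}

# Typical use on either node:
#   drbd_all_healthy < /proc/drbd && echo "DRBD OK" || echo "DRBD needs attention"
```

A node that is still SyncTarget or StandAlone fails this check, so it is only expected to pass once the re-sync has finished.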