Background
See also:
https://docs.linbit.com/doc/users-guide-84/s-resolve-split-brain/
Symptoms
cat /proc/drbd
GIT-hash: a4d5de01fffd7e4cde48a080e2c686f9e8cebf4c build by mockbuild@, 2017-09-15 14:23:22
 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
    ns:0 nr:119823323 dw:119823323 dr:2128 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
cs:StandAlone means the node is not connected to its peer.
This is usually visible on both sides, although one node may show cs:WFConnection instead.
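As a rough sketch, the connection state can be pulled out of /proc/drbd with a small shell function. The function name drbd_cs is hypothetical and not part of DRBD; it only assumes the 8.4 status layout shown above.

```shell
# drbd_cs (hypothetical helper): print the cs: field of every DRBD device.
# Reads /proc/drbd-style text on stdin, so it can also be fed saved output.
drbd_cs() {
    sed -n 's/^ *[0-9][0-9]*: cs:\([A-Za-z]*\).*/\1/p'
}

# Usage on a cluster node:
#   drbd_cs < /proc/drbd
```

Running this on both nodes gives a quick side-by-side view of the connection state.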
Find out which node is active in the PCS cluster
pcs status
Cluster name: portal
Stack: corosync
Current DC: acd-store1 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum
Last updated: Sun Mar 18 18:05:32 2018
Last change: Fri Feb 16 00:07:51 2018 by root via cibadmin on acd-store2
2 nodes configured
3 resources configured
Node acd-store1: standby
Online: [ acd-store2 ]
Full list of resources:
 Resource Group: haproxy_group
     ClusterDataJTELSharedMount (ocf::heartbeat:Filesystem): Started acd-store2
     ClusterIP (ocf::heartbeat:IPaddr2): Started acd-store2
     samba (systemd:smb): Started acd-store2
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
In the example above, the first node (acd-store1) is in standby. The most important thing to check is which server the resources are started on.
In this case, the resources are started on acd-store2.
acd-store2 is therefore treated as the NON BROKEN node in the steps below.
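The check above can be sketched as a filter over the pcs status output. The function name active_nodes is hypothetical; it assumes resource lines end in "Started <node>" as in the example output.

```shell
# active_nodes (hypothetical helper): list each node that has at least one
# started resource. Reads `pcs status` output on stdin.
active_nodes() {
    sed -n 's/.*[[:space:]]Started[[:space:]]\{1,\}\([^[:space:]]*\).*/\1/p' | sort -u
}

# Usage on a cluster node:
#   pcs status | active_nodes
```

If this prints exactly one node, that node is the one to treat as the NON BROKEN node.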
Standby the broken node in the PCS cluster
This command can be run on either machine.
Standby broken node
# acd-lb-broken stands for the name of the broken node
pcs cluster standby acd-lb-broken
# Verify this with pcs status
On broken node
drbd on broken node
drbdadm disconnect jtelshared
drbdadm secondary jtelshared
drbdadm connect --discard-my-data jtelshared
On the healthy node
drbd on healthy node
drbdadm primary jtelshared
drbdadm connect jtelshared
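Because the commands on the broken node discard its data, it can help to print the command list for each side and review it before running anything. The following is only a sketch: split_brain_cmds is a hypothetical helper that echoes the commands from the two steps above and does not execute them.

```shell
# split_brain_cmds (hypothetical helper): print, but do not run, the drbdadm
# commands for one side of the recovery.
#   $1  "broken"  (this node's changes are discarded) or "healthy" (data is kept)
#   $2  DRBD resource name, e.g. jtelshared
split_brain_cmds() {
    role=$1
    res=$2
    case $role in
        broken)
            printf 'drbdadm disconnect %s\n' "$res"
            printf 'drbdadm secondary %s\n' "$res"
            printf 'drbdadm connect --discard-my-data %s\n' "$res"
            ;;
        healthy)
            printf 'drbdadm primary %s\n' "$res"
            printf 'drbdadm connect %s\n' "$res"
            ;;
        *)
            echo "usage: split_brain_cmds broken|healthy RESOURCE" >&2
            return 1
            ;;
    esac
}

# Usage:
#   split_brain_cmds broken jtelshared     # review the output first
```

Keeping this as a dry run is deliberate: running the broken-node commands on the wrong machine would throw away the good copy of the data.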
Check re-sync activity
The re-sync might take a long time.
Watch the status of this using:
cat /proc/drbd
Example output:
[root@storage01 ~]# cat /proc/drbd
version: 8.4.10-1 (api:1/proto:86-101)
GIT-hash: a4d5de01fffd7e4cde48a080e2c686f9e8cebf4c build by mockbuild@, 2017-09-15 14:23:22
 1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:1411538 dw:121234862 dr:2128 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:184698664
        [>....................] sync'ed: 0.8% (180368/181744)M
        finish: 26:12:15 speed: 1,940 (2,760) want: 2,120 K/sec
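For long re-syncs, the progress figure can be extracted from the output instead of reading the whole block each time. This is a minimal sketch; drbd_sync_percent is a hypothetical name, and it assumes the sync'ed: line format shown in the example output.

```shell
# drbd_sync_percent (hypothetical helper): print the re-sync percentage.
# Reads /proc/drbd-style text on stdin; prints nothing when no re-sync is running.
drbd_sync_percent() {
    sed -n "s/.*sync'ed:[[:space:]]*\([0-9.]*\)%.*/\1/p"
}

# Usage on the node that is re-syncing:
#   drbd_sync_percent < /proc/drbd
```

An empty result means no sync'ed: line is present, which is expected once the re-sync has finished.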
Restart PCS node
Unstandby broken node
pcs cluster unstandby acd-lb-broken
# Verify this with pcs status
Put broken node back to primary
drbd on broken node
drbdadm primary jtelshared
# Verify this with cat /proc/drbd
Check everything
Final checks
pcs resource show
cat /proc/drbd
# On some other linux machines
ls /home/jtel/shared
# Windows
dir //acd-store/shared
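The /proc/drbd part of the final check can be sketched as a pass/fail test. drbd_healthy is a hypothetical helper; it assumes that a recovered DRBD 8.4 resource reports cs:Connected and ds:UpToDate/UpToDate on its device line.

```shell
# drbd_healthy (hypothetical helper): exit 0 only if at least one device line
# is present and every device line shows cs:Connected and ds:UpToDate/UpToDate.
# Reads /proc/drbd-style text on stdin.
drbd_healthy() {
    awk '/^ *[0-9]+: cs:/ {
             n++
             if ($0 !~ /cs:Connected / || $0 !~ /ds:UpToDate\/UpToDate/) bad++
         }
         END { exit (n == 0 || bad > 0) }'
}

# Usage on either node:
#   drbd_healthy < /proc/drbd && echo "DRBD OK" || echo "DRBD needs attention"
```

This only covers the DRBD state; the pcs resource show output and the share listings still need to be checked by eye.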