Background

Useful Commands

Disconnect the share (useful for planned maintenance).

...

Set the node to primary:

drbdadm connect jtelshared

drbdadm primary jtelshared

Split Brain

Background

Symptoms

Code Block

title	cat /proc/drbd

cat /proc/drbd

-->

GIT-hash: a4d5de01fffd7e4cde48a080e2c686f9e8cebf4c build by mockbuild@, 2017-09-15 14:23:22
1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
 ns:0 nr:119823323 dw:119823323 dr:2128 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

...

The StandAlone connection state (cs:StandAlone) should be visible on both nodes.
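A minimal sketch for spotting this state from a script, assuming the DRBD 8.4 /proc/drbd layout shown above. The function names drbd_conn_states and drbd_is_standalone are hypothetical helpers, not part of DRBD. They read status text on stdin, so they can be tried against a saved copy of the output. A node is also StandAlone after a manual disconnect, so treat a match as a hint only.

```shell
#!/bin/sh
# Hypothetical helper: print the connection state (the cs: value) of each
# resource line in /proc/drbd-style text read from stdin.
drbd_conn_states() {
    sed -n 's/^ *[0-9][0-9]*: cs:\([A-Za-z]*\).*/\1/p'
}

# Hypothetical helper: exit status 0 if at least one resource is StandAlone.
drbd_is_standalone() {
    drbd_conn_states | grep -qx 'StandAlone'
}
```

On a node this could be called as: drbd_is_standalone < /proc/drbd && echo "jtelshared is StandAlone"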

Find out which node is active in the PCS cluster

Code Block

title	pcs status

pcs status

-->

Cluster name: portal

Stack: corosync
Current DC: acd-store1 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum
Last updated: Sun Mar 18 18:05:32 2018
Last change: Fri Feb 16 00:07:51 2018 by root via cibadmin on acd-store2
2 nodes configured
3 resources configured
Node acd-store1: standby
Online: [ acd-store2 ]
Full list of resources:
Resource Group: haproxy_group
 ClusterDataJTELSharedMount (ocf::heartbeat:Filesystem): Started acd-store2
 ClusterIP (ocf::heartbeat:IPaddr2): Started acd-store2
 samba (systemd:smb): Started acd-store2
Daemon Status:
 corosync: active/enabled
 pacemaker: active/enabled
 pcsd: active/enabled

...

The node on which the resources are started (acd-store2 in this example) is therefore treated as the non-broken (healthy) node.
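The active node can also be picked out of the pcs status output with a short filter. This is a sketch only, assuming the resource lines end in "Started <node>" as in the output above. The function name pcs_active_nodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the nodes on which resources are reported as
# "Started" in `pcs status` output read from stdin.
pcs_active_nodes() {
    # NF >= 2 guards empty lines, where $(NF-1) would be an invalid field.
    awk 'NF >= 2 && $(NF-1) == "Started" { print $NF }' | sort -u
}
```

Usage: pcs status | pcs_active_nodes

If more than one node name is printed, the resources are spread over both nodes and the healthy node has to be chosen manually.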

Standby the broken node in the PCS cluster

This command can be run on either node.

Code Block

title	Standby broken node

pcs cluster standby acd-lb-broken
 
--> Verify this with
 
pcs status

On broken node

Code Block

title	drbd on broken node

drbdadm disconnect jtelshared
drbdadm secondary jtelshared
drbdadm connect --discard-my-data jtelshared

On the healthy node

Code Block

title	drbd on healthy node

drbdadm primary jtelshared
drbdadm connect jtelshared

Check re-sync activity

The re-sync might take a long time. The finish: field in /proc/drbd shows the estimated time remaining.

...

Code Block

title	cat /proc/drbd

[root@storage01 ~]# cat /proc/drbd
version: 8.4.10-1 (api:1/proto:86-101)
GIT-hash: a4d5de01fffd7e4cde48a080e2c686f9e8cebf4c build by mockbuild@, 2017-09-15 14:23:22
1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
 ns:0 nr:1411538 dw:121234862 dr:2128 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:184698664
 [>....................] sync'ed: 0.8% (180368/181744)M
 finish: 26:12:15 speed: 1,940 (2,760) want: 2,120 K/sec
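To follow the progress from a script, the percentage can be extracted from the sync'ed: line. A minimal sketch, assuming the DRBD 8.4 output format shown above. drbd_sync_percent is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the re-sync progress in percent from
# /proc/drbd-style text read from stdin. Prints nothing when no
# re-sync is running.
drbd_sync_percent() {
    sed -n "s/.*sync'ed: *\([0-9.]*\)%.*/\1/p"
}
```

Usage: drbd_sync_percent < /proc/drbd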

Tune the transfer (Second Node)

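If the re-sync would take too long, raise the resync rate on the broken node. Setting --c-plan-ahead=0 disables the dynamic resync controller, so the fixed --resync-rate applies: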

Code Block

title	drbd Transfer Tuning (on broken node)

drbdadm disk-options --c-plan-ahead=0 --resync-rate=110M jtelshared
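As a rough check of what the tuning can achieve, the best-case duration follows from the oos value (out-of-sync data in KiB) and the configured rate. A sketch under the assumption that the full rate is actually reached, which depends on disk and network throughput. resync_eta_seconds is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: best-case re-sync duration in whole seconds.
#   $1 = oos value from /proc/drbd (KiB still out of sync)
#   $2 = resync rate in MiB/s (110 for --resync-rate=110M)
resync_eta_seconds() {
    oos_kib=$1
    rate_mib=$2
    echo $(( oos_kib / (rate_mib * 1024) ))
}

# With the values from the output above:
resync_eta_seconds 184698664 110   # prints 1639
```

That is about 27 minutes at best, compared with the 26 hours estimated at the untuned rate.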

Put broken node back to primary

Code Block

title	Set broken node to primary

drbdadm primary jtelshared
 
--> Verify this with
 
cat /proc/drbd

Restart PCS node

Code Block

title	Unstandby broken node

pcs cluster unstandby acd-lb-broken
 
--> Verify this with
 
pcs status

Untune the transfer (Second Node)

If the resync rate was tuned, restore the configured values on the broken node. drbdadm adjust re-applies the settings from the configuration file.

...

Code Block

title	drbd - Untune Transfer

drbdadm adjust jtelshared

Check everything

Code Block

title	Check everything

pcs status
cat /proc/drbd
# On another Linux machine
ls /home/jtel/shared
# On Windows
dir \\acd-store\shared
