Caution

This is an advanced topic. Use at your own risk and ALWAYS back up your data first.

Useful Commands

View DRBD Status
Reload all parameters
Disconnect the share (useful for planned maintenance)
Down the share (useful for planned maintenance)
Up the share
Set the node to primary
Connect the share
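
As a hedged sketch of how the operations listed above typically map onto `drbdadm` in DRBD 8.4: the helper below only prints the command line for an operation so that it can be reviewed before being run as root. The function name `drbd_cmd`, its operation keywords and the resource name `r0` in the usage note are illustrative assumptions, as is the mapping of "Reload all parameters" to `drbdadm adjust all`.

```shell
# Print (do not execute) the command for one of the operations listed above.
# Usage: drbd_cmd <operation> [resource]
drbd_cmd() {
  op="$1"
  res="${2:-}"
  case "$op" in
    status) echo "cat /proc/drbd" ;;        # view DRBD status
    reload) echo "drbdadm adjust all" ;;    # re-apply the configuration (assumed mapping)
    disconnect|down|up|primary|connect)
      # These operations act on one named resource.
      [ -n "$res" ] || { echo "resource name required" >&2; return 1; }
      echo "drbdadm $op $res" ;;
    *) echo "unknown operation: $op" >&2; return 1 ;;
  esac
}
```

For example, `drbd_cmd disconnect r0` prints `drbdadm disconnect r0`, where `r0` stands in for the real resource name from the DRBD configuration.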
Split Brain

Background

See also: https://docs.linbit.com/doc/users-guide-84/s-resolve-split-brain/

Symptoms
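
To read the connection state from a script, the `cs:` field can be extracted from `/proc/drbd`. This is a minimal sketch: the function name `drbd_conn_state` is hypothetical, and it assumes the DRBD 8.4 status format in which each device line carries a `cs:<State>` field.

```shell
# Print the connection state of every DRBD device, one per line.
# Reads /proc/drbd by default; a saved copy can be passed as $1.
drbd_conn_state() {
  grep -o 'cs:[A-Za-z]*' "${1:-/proc/drbd}" | cut -d: -f2
}
```

A result of `StandAlone` on both nodes is the split brain symptom this section deals with.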
cs:StandAlone means the node is not connected. This should be visible on both sides.

Find out which node is active in the PCS cluster
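
`pcs status` lists each cluster resource together with the node it runs on. As a sketch, the filter below pulls the node names out of lines of the form `... Started <node>`. The function name `started_nodes` is hypothetical, and the layout of `pcs status` varies between versions (master or promotable resources, for instance, are listed differently), so treat the pattern as an assumption.

```shell
# Read `pcs status` text on stdin and print each node that has a started resource.
started_nodes() {
  grep -o 'Started [^ ]*' | awk '{print $2}' | sort -u
}
```

Usage: `pcs status | started_nodes`. If exactly one node is printed, that node is the one holding the resources.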
In the example above, the first node is in standby. The most important thing to check is on which server the resources are started. In this case, the resources are started on acd-store2, so acd-store2 is treated as the non-broken (healthy) node and the other node as the broken node.

Standby the broken node in the PCS cluster

This command can be run on either machine.
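
A hedged sketch of the standby command follows. Current pcs releases use `pcs node standby <node>`, while older pcs 0.9.x releases also accept `pcs cluster standby <node>`; which one applies depends on the installed version. The function name `pcs_standby_cmd` and the node name `acd-store1` in the usage note are hypothetical.

```shell
# Print (do not execute) the pcs command that puts a node into standby.
# $1: node name. $2: "node" (default) or "cluster" for the older pcs 0.9.x syntax.
pcs_standby_cmd() {
  node="${1:?node name required}"
  echo "pcs ${2:-node} standby $node"
}
```

For example, `pcs_standby_cmd acd-store1` prints `pcs node standby acd-store1`, and `pcs_standby_cmd acd-store1 cluster` prints the older form.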
On the broken node

Note: the first command will probably throw an error. Also, the share may not be mounted. This is OK.
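
Along the lines of the LINBIT guide linked above, manual split brain recovery makes the broken node (the split brain victim) secondary and reconnects it with its local changes discarded. The sketch below prints that sequence for review rather than executing it. The function name `victim_cmds`, the resource name `r0` and the initial `umount` step are assumptions; the mount point `/srv/jtel/shared` is the one named elsewhere on this page.

```shell
# Print (do not execute) the commands for the broken node.
# WARNING: --discard-my-data throws away this node's changes made since the split.
victim_cmds() {
  res="${1:?resource name required}"
  mnt="${2:-/srv/jtel/shared}"
  printf '%s\n' \
    "umount $mnt" \
    "drbdadm secondary $res" \
    "drbdadm connect --discard-my-data $res"
}
```

Usage: `victim_cmds r0`. The `umount` line is expected to fail harmlessly when the share is not mounted.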
On the healthy node
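
On the healthy node (the split brain survivor), a reconnect is only needed if that node is itself in the StandAlone state; a node that is already waiting for its peer reconnects on its own. A minimal sketch of that check, with the hypothetical function name `survivor_cmds` and resource name `r0`:

```shell
# Print the reconnect command for the healthy node, but only if it is StandAlone.
# Reads /proc/drbd by default; a saved copy can be passed as $2.
survivor_cmds() {
  res="${1:?resource name required}"
  if grep -q 'cs:StandAlone' "${2:-/proc/drbd}"; then
    echo "drbdadm connect $res"
  fi
}
```

Usage: `survivor_cmds r0` prints `drbdadm connect r0` when a reconnect is required and prints nothing otherwise.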
Check re-sync activity

The re-sync might take a long time. Watch its status using: cat /proc/drbd
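
While a re-sync is running, DRBD 8.4 adds a progress line containing `sync'ed: <percent>%` to `/proc/drbd`. The sketch below extracts that percentage; the function name `drbd_sync_percent` is hypothetical and the pattern assumes the 8.4 status format.

```shell
# Print the re-sync progress in percent, or nothing when no re-sync is running.
# Reads /proc/drbd by default; a saved copy can be passed as $1.
drbd_sync_percent() {
  sed -n "s/.*sync'ed: *\([0-9.]*\)%.*/\1/p" "${1:-/proc/drbd}"
}
```

To follow the raw status instead, `watch -n 5 cat /proc/drbd` refreshes it every five seconds.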
Tune the transfer (Second Node)

If the re-sync is going to take a very long time, tune it on the broken node:
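
One way to speed up a re-sync in DRBD 8.4 is to set a temporary fixed re-sync rate with `drbdadm disk-options`. The sketch below prints such a command for review; the function name `tune_resync_cmd` and the resource name `r0` are hypothetical, and the default rate of `110M` is only an example value that should be sized to the replication link.

```shell
# Print (do not execute) the command that sets a temporary fixed re-sync rate.
# --c-plan-ahead=0 turns off the dynamic rate controller so --resync-rate applies.
tune_resync_cmd() {
  res="${1:?resource name required}"
  rate="${2:-110M}"   # example rate; adjust to the available bandwidth
  echo "drbdadm disk-options --c-plan-ahead=0 --resync-rate=$rate $res"
}
```

Usage: `tune_resync_cmd r0 60M` prints the command with a 60M rate for resource `r0`.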
Put broken node back to primary
Restart PCS node
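
A hedged sketch of the two steps above, printed for review rather than executed. It assumes that "Restart PCS node" means taking the node out of standby; the function name `rejoin_cmds`, the resource name `r0` and the node name `acd-store1` are hypothetical.

```shell
# Print (do not execute) the commands that bring the repaired node back into service.
# $1: DRBD resource. $2: node name. $3: "node" (default) or "cluster" for older pcs 0.9.x.
rejoin_cmds() {
  res="${1:?resource name required}"
  node="${2:?node name required}"
  printf '%s\n' \
    "drbdadm primary $res" \
    "pcs ${3:-node} unstandby $node"
}
```

Usage: `rejoin_cmds r0 acd-store1`. Promote the node only once `/proc/drbd` shows its disk state as UpToDate.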
Untune the transfer (Second Node)

If the transfer was tuned, untune it (on the broken node). Note: running this command does no harm even if the transfer was not tuned.
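
`drbdadm adjust` re-applies the settings from the configuration file, which drops any temporary re-sync rate set at runtime and explains why it is safe to run in either case. A minimal sketch, with the hypothetical function name `untune_resync_cmd` and resource name `r0`:

```shell
# Print (do not execute) the command that restores the configured sync settings.
untune_resync_cmd() {
  res="${1:?resource name required}"
  echo "drbdadm adjust $res"
}
```

Usage: `untune_resync_cmd r0` prints `drbdadm adjust r0`.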
Check everything
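
As a sketch of a final check on the DRBD side, the function below succeeds only when every device line in `/proc/drbd` reports `cs:Connected` and `ds:UpToDate/UpToDate`. The function name `drbd_healthy` is hypothetical and the patterns assume the DRBD 8.4 status format; the cluster side can be checked with `pcs status`.

```shell
# Succeed only if every DRBD device is connected and up to date on both sides.
# Reads /proc/drbd by default; a saved copy can be passed as $1.
drbd_healthy() {
  status_file="${1:-/proc/drbd}"
  # Device lines start with the minor number and a colon, e.g. " 0: cs:...".
  total="$(grep -c '^ *[0-9][0-9]*: cs:' "$status_file" || true)"
  good="$(grep -c '^ *[0-9][0-9]*: cs:Connected .*ds:UpToDate/UpToDate' "$status_file" || true)"
  [ "$total" -gt 0 ] && [ "$total" -eq "$good" ]
}
```

Usage: `drbd_healthy && echo "DRBD OK"`.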
File System Corrupt

Sometimes, when DRBD fails, the file system also becomes corrupt. In this case both nodes might be primary, but neither will have the share mounted, and the command mount /srv/jtel/shared will fail. It may then be necessary to repair the file system.

Symptoms
Repairing

On one of the nodes (choose the one that is to become primary):
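
A hedged sketch of a repair sequence, printed for review rather than executed. The page does not state the file system type, so the repair tool is passed in as a parameter (for example `xfs_repair` for XFS or `fsck` for ext4); the function name `fs_repair_cmds`, the resource name `r0` and the device `/dev/drbd0` are hypothetical. The repair has to run against the DRBD device, not the backing disk, so that the changes are replicated.

```shell
# Print (do not execute) the repair sequence for the node chosen as primary.
# $1: DRBD resource. $2: DRBD block device. $3: repair tool for the file system in use.
fs_repair_cmds() {
  res="${1:?resource name required}"
  dev="${2:?DRBD device required}"
  tool="${3:?repair tool required}"
  printf '%s\n' \
    "drbdadm primary $res" \
    "$tool $dev"
}
```

Usage: `fs_repair_cmds r0 /dev/drbd0 xfs_repair`.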
This should then mount the share and start the resources on that node. Then proceed with the other node as the "broken" node, as described in the split brain section.

Stalled Resync

If the DRBD resync stalls (cat /proc/drbd shows "stalled"), it may be necessary to restart the machine. This has been observed once, and restarting resolved the situation. Nothing more is known about this state or its cause at this time.