Node Health Check
Resources
Installation & configuration
- Install Operator "Node Health Check Operator"
- Install Operator "Self Node Remediation Operator"
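After the installation, a quick check from the CLI can confirm that both operators are present. The following is a minimal sketch, not an official procedure. The function name `list_remediation_operators` is hypothetical, the default namespace `openshift-workload-availability` is an assumption (older installs may use `openshift-operators`), and the `grep` pattern assumes the ClusterServiceVersion names contain `healthcheck` and `self-node-remediation`.

```shell
# List the ClusterServiceVersions of the two operators installed above.
# $1: namespace the operators were installed into (assumed default below)
# Assumption: the CSV names contain "healthcheck" / "self-node-remediation".
list_remediation_operators() {
  namespace="${1:-openshift-workload-availability}"
  oc get csv -n "$namespace" --no-headers \
    | grep -iE 'healthcheck|self-node-remediation'
}

# Usage (the PHASE column should show "Succeeded" for both):
#   list_remediation_operators
```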
Start operator for worker nodes
- Change the durations (in seconds) to achieve the fastest VM recovery, according to the OpenShift Virtualization - Fencing and VM High Availability Guide
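As an illustration, a NodeHealthCheck for worker nodes could be rendered from the shell as sketched below. This assumes that "the seconds" refers to the `duration` fields under `unhealthyConditions`. The function name `render_worker_nhc`, the resource name `nhc-worker`, the `minHealthy` value, and the default namespace are assumptions for illustration; the duration is deliberately a parameter, so the value recommended in the guide can be passed in.

```shell
# Print a NodeHealthCheck manifest that targets worker nodes.
# $1: unhealthy duration, e.g. 60s (placeholder; take the value from the guide)
# $2: namespace of the remediation template (assumed default below)
render_worker_nhc() {
  duration="${1:?usage: render_worker_nhc <duration> [namespace]}"
  namespace="${2:-openshift-workload-availability}"
  cat <<EOF
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nhc-worker   # hypothetical name
spec:
  minHealthy: 51%
  selector:
    matchExpressions:
      - key: node-role.kubernetes.io/worker
        operator: Exists
  remediationTemplate:
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationTemplate
    name: self-node-remediation-automatic-strategy-template
    namespace: ${namespace}
  unhealthyConditions:
    - type: Ready
      status: "False"
      duration: ${duration}
    - type: Ready
      status: Unknown
      duration: ${duration}
EOF
}

# Review the output first, then apply it:
#   render_worker_nhc 60s | oc apply -f -
```

Rendering to stdout keeps the manifest reviewable before anything is changed on the cluster.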
Update self-node-remediation-automatic-strategy-template
- The default remediation strategy is "Automatic", but I want predictable behavior. See the official documentation.
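One way to change the strategy from the CLI is a merge patch on the template, sketched below. The function name `set_remediation_strategy` and the default namespace are assumptions. The strategy is passed as an argument because the value chosen here is not recorded; `ResourceDeletion` and `OutOfServiceTaint` are the explicit alternatives to `Automatic`.

```shell
# Set spec.template.spec.remediationStrategy on the default template.
# $1: strategy, e.g. ResourceDeletion or OutOfServiceTaint
# $2: namespace of the template (assumed default below)
set_remediation_strategy() {
  strategy="${1:?usage: set_remediation_strategy <strategy> [namespace]}"
  namespace="${2:-openshift-workload-availability}"
  oc patch selfnoderemediationtemplate \
    self-node-remediation-automatic-strategy-template \
    -n "$namespace" --type merge \
    -p "{\"spec\":{\"template\":{\"spec\":{\"remediationStrategy\":\"${strategy}\"}}}}"
}

# Usage:
#   set_remediation_strategy ResourceDeletion
```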
Example
- Start a RHEL VM with network access
- Provide
~/bin/l
- Run:
l ping $VM_IP
- Run:
oc get pods -o wide --watch | ts '[%Y-%m-%d %H:%M:%S]' | tee -a /tmp/app.log
- Watch the log:
tail -f /tmp/app.log
- Stop the worker node on which the VM is running (the node is itself a VM, here ocp1-worker-0):
l virtctl stop --force --grace-period=0 ocp1-worker-0
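To put a number on the recovery, two timestamps from /tmp/app.log (for example, the moment the node was stopped and the moment the pod is Running again) can be subtracted. The helper below is a small sketch with a hypothetical name, `elapsed_seconds`. It assumes GNU `date` (for the `-d` option) and the `%Y-%m-%d %H:%M:%S` format written by `ts` above.

```shell
# Print the number of seconds between two timestamps in the
# '%Y-%m-%d %H:%M:%S' format used in /tmp/app.log.
# Requires GNU date. Both values are parsed as UTC, so only their
# difference is meaningful.
elapsed_seconds() {
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $((end - start))
}

# Usage:
#   elapsed_seconds "2024-01-01 12:00:00" "2024-01-01 12:02:30"   # prints 150
```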