oc describe pod <podname>
FailedScheduling: 0/7 nodes are available: 3 nodes had taint, that the pod didn’t tolerate (node-role.kubernetes.io/master), 4 insufficient cpu
Issue: The pod(s) cannot be scheduled because no worker node has enough allocatable CPU.
Solution: Allocate more physical or virtual CPU, or use the Cluster Resource Override Operator to override the ratio between requests and limits set on containers/pods:
1. Install the Cluster Resource Override Operator
2. Add a ClusterResourceOverride custom resource, for example the one shown below
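A minimal sketch of such a resource; the percentage values mirror the example in the OpenShift documentation and are assumptions to adapt to your workload:

$ cat <<EOF | oc apply -f -
apiVersion: operator.autoscaling.openshift.io/v1
kind: ClusterResourceOverride
metadata:
  name: cluster                         # the name must be "cluster"
spec:
  podResourceOverride:
    spec:
      memoryRequestToLimitPercent: 50   # memory request forced to 50% of the limit
      cpuRequestToLimitPercent: 25      # CPU request forced to 25% of the limit
      limitCPUToMemoryPercent: 200      # CPU limit derived from the memory limit
EOF

Note that the override is only applied in namespaces carrying the label clusterresourceoverrides.admission.autoscaling.openshift.io/enabled: "true".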
Error pulling candidate abc.def.ghi/company-openshift-docker/openshift-release-dev/ocp-release@sha256:97410a5db655a9d3017b735c2c0747c849d09ff551765e49d5272b80c024a844: initializing source docker://abc.def.ghi/company-openshift-docker/openshift-release-dev/ocp-release@sha256:97410a5db655a9d3017b735c2c0747c849d09ff551765e49d5272b80c024a844: pinging container registry abc.def.ghi: Get "https://abc.def.ghi/v2/": x509: certificate signed by unknown authority
Error: initializing source docker://abc.def.ghi/company-openshift-docker/openshift-release-dev/ocp-release@sha256:97410a5db655a9d3017b735c2c0747c849d09ff551765e49d5272b80c024a844: pinging container registry abc.def.ghi: Get "https://abc.def.ghi/v2/": x509: certificate signed by unknown authority
Issue: The mirror registry's certificate isn't trusted.
Solution: Use curl and openssl to identify the certificate (or chain of certificates) required to connect securely to the mirror registry, and add it, with the correct indentation, to the additionalTrustBundle section of the install-config.yaml file, for example:
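A possible way to retrieve the chain (abc.def.ghi stands for the mirror registry host; port 443 is an assumption, use the registry's actual port):

$ openssl s_client -connect abc.def.ghi:443 -showcerts </dev/null 2>/dev/null | awk '/BEGIN CERTIFICATE/,/END CERTIFICATE/'

The resulting PEM block(s) then go into install-config.yaml, indented by two spaces:

additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----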
Ignition fails - connection refused errors during the installation process
# Checking the bootstrap node via journalctl shows the error below:
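A possible invocation on the bootstrap node (SSH access assumed; the unit names are the ones the OpenShift installer documentation suggests for bootstrap debugging):

$ journalctl -b -f -u release-image.service -u bootkube.service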
Sep 13 11:58:08 v0004369.abc.def.ghi cluster-bootstrap[46455]: [#602]
failed to fetch discovery: Get "https://localhost:6443/api?timeout=32s":
dial tcp [::1]:6443: connect: connection refused
Sep 13 11:58:08 v0004369.abc.def.ghi bootkube.sh[46444]: [#602] failed to
fetch discovery: Get "https://localhost:6443/api?timeout=32s": dial tcp
[::1]:6443: connect: connection refused
Problem determination:
Check whether the Kubernetes API (https://api.<cluster-id>.<domain>:port) is accessible. This helps to verify that the DNS resolution on the bootstrap server is set up correctly.
6443 is the (API) port used by all nodes to communicate with the control plane (master nodes). For reference see Network connectivity requirements.
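A quick reachability check from the bootstrap node might look like this (placeholders as above; -k disables TLS verification for the test only, and even an HTTP error response proves that DNS resolution and the port are working):

$ dig +short api.<cluster-id>.<domain>
$ curl -k https://api.<cluster-id>.<domain>:6443/version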
# The output of the following command hinted at a certificate issue:
[openshift@v0004314 cluster]$ openshift-install wait-for bootstrap-complete --log-level debug
DEBUG OpenShift Installer 4.11.1
DEBUG Built from commit 1d2450c520b70765b53b71da5e8544657d50d6e2
INFO Waiting up to 20m0s (until 3:28PM) for the Kubernetes API at https://api.<cluster-id>.<domain>:6443...
DEBUG Still waiting for the Kubernetes API: Get "https://api.<cluster-id>.<domain>:6443": EOF
DEBUG Still waiting for the Kubernetes API: Get "https://api.<cluster-id>.<domain>:6443": x509: certificate has expired or is not yet valid: current time 2022-09-13T15:10:07+02:00 is after 2022-09-10T08:45:15Z
DEBUG Still waiting for the Kubernetes API: Get "https://api.<cluster-id>.<domain>:6443": x509: certificate has expired or is not yet valid: current time 2022-09-13T15:10:38+02:00 is after 2022-09-10T08:45:15Z
DEBUG Still waiting for the Kubernetes API: Get "https://api.<cluster-id>.<domain>:6443": EOF
Issue: The Ignition config files that the openshift-install program generates contain certificates that expire after 24 hours. Expired certificates cause the installation to fail.
Solution:
Verify the validity of the certificate being presented by the bootstrap node.
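One way to inspect the validity dates of the certificate the bootstrap-hosted API endpoint presents, a sketch assuming the placeholders used above:

$ echo | openssl s_client -connect api.<cluster-id>.<domain>:6443 2>/dev/null | openssl x509 -noout -dates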
Check that all certificates are valid, especially the certificates from which the Ignition files are created. If the openshift-install create ignition-configs command needs to be re-run, first delete all files in the installation directory - including hidden files such as .openshift_install_state.json - except install-config.yaml and the openshift-install binary. Otherwise the certificate dates stay pinned to the first run, i.e. the certificates may already have expired.
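A sketch of that cleanup, assuming the shell is inside the installation directory (review the file list before deleting anything):

$ ls -la
$ find . -mindepth 1 -maxdepth 1 \
    ! -name 'install-config.yaml' ! -name 'openshift-install' \
    -exec rm -rf {} +
$ ./openshift-install create ignition-configs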
Note:
It is recommended that you use Ignition config files within 12 hours after they are generated, because the 24-hour certificate rotates from 16 to 22 hours after the cluster is installed.
Worker nodes are not visible when running oc get nodes
oc get nodes only shows master nodes.
Issue: The nodes' certificate requests haven't been approved.
Solution:
The new worker node(s) will still be missing or stuck in Pending state. Add them by approving the respective client and server CSRs: run oc get csr and approve each pending request, as shown below.
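For example (approve the pending client CSRs first; the server CSRs only appear once the kubelet on the new node has started):

$ oc get csr
$ oc adm certificate approve <csr-name>
# or approve all pending CSRs at once - check the list first:
$ oc get csr -o name | xargs oc adm certificate approve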
Issue: The OVA image has been started prior to cloning.
Solution:
Create a new template from the OVA image and then clone the template as needed. Starting the OVA image prior to cloning kicks off the Ignition process and, as a result, Ignition fails on the clones.
Place a nodeSelector on this deployment targeting a master node, provided that the master nodes are running and ready. To verify that masters are unschedulable, ensure that the mastersSchedulable field is set to false.
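This can be checked on the cluster-wide Scheduler resource, for example:

$ oc get scheduler cluster -o jsonpath='{.spec.mastersSchedulable}{"\n"}'
false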
$ oc debug node/<worker-or-master-node>
# chroot /host
# less /etc/containers/registries.conf
If /etc/containers/registries.conf changes, do the nodes purge their internal cache?
No. If a new container is deployed and the requested image is not present on the node, the image is pulled from the "mirror" registry referenced in /etc/containers/registries.conf. This file only tells CRI-O which location to download the image from.
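For illustration, a mirror entry in /etc/containers/registries.conf typically looks like the following fragment (the locations reuse the mirror registry from the errors above; the exact stanza depends on the ImageContentSourcePolicy applied to the cluster):

[[registry]]
  prefix = ""
  location = "quay.io/openshift-release-dev/ocp-release"
  mirror-by-digest-only = true

  [[registry.mirror]]
    location = "abc.def.ghi/company-openshift-docker/openshift-release-dev/ocp-release"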