Troubleshooting¶
The openshift-console is not coming up¶
Issue: Router pods are scheduled on the master nodes.
Solution: Find and delete these pods using the command below:
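A minimal sketch, assuming the default router pods live in the openshift-ingress namespace; the pod name is a placeholder:

```bash
# List the router pods together with the node they were scheduled on
oc -n openshift-ingress get pods -o wide

# Delete any router pod that landed on a master node; the deployment recreates it
oc -n openshift-ingress delete pod <router-pod-name>
```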
The openshift-console is not coming up¶
Issue: Because of insufficient CPU the pod(s) cannot be scheduled.
Solution: Allocate more physical or virtual CPU, or use the Cluster Resource Override Operator to override the ratio between requests and limits set on containers/pods:

1. Install the Cluster Resource Override Operator.
2. Add the custom resource definition shown below.
3. Apply the label shown below to the Namespace object for each project (overrides can be enabled per-project).
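A sketch of steps 2 and 3, based on the upstream Cluster Resource Override Operator documentation; the percentages and the project name are illustrative and should be adapted to your environment:

```yaml
# ClusterResourceOverride custom resource (step 2)
apiVersion: operator.autoscaling.openshift.io/v1
kind: ClusterResourceOverride
metadata:
  name: cluster
spec:
  podResourceOverride:
    spec:
      memoryRequestToLimitPercent: 50   # request = 50% of the memory limit
      cpuRequestToLimitPercent: 25      # request = 25% of the CPU limit
      limitCPUToMemoryPercent: 200      # CPU limit derived from the memory limit
```

To enable the override for a project (step 3), label its Namespace object, for example:

```bash
oc label namespace <project-name> clusterresourceoverrides.admission.autoscaling.openshift.io/enabled=true
```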
The bootstrap is running but the customer can't pull from the mirror registry¶
Issue: The mirror registry's certificate isn't trusted.
Solution: Use curl and openssl to identify the certificate or chain of certificates needed to connect securely to the mirror registry, and add them to the additionalTrustBundle section of the install-config.yaml file with the right indentation, for example:
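A sketch of how such a check and the resulting install-config.yaml section could look; the registry host and port are placeholders:

```bash
# Show the certificate chain presented by the mirror registry
openssl s_client -connect <mirror-registry-fqdn>:8443 -showcerts </dev/null
```

```yaml
# install-config.yaml (excerpt) - the certificate block must be indented by two spaces
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  <CA certificate(s) of the mirror registry>
  -----END CERTIFICATE-----
```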
Ignition fails - connection refused errors during the installation process¶
Problem determination:
Check whether the Kubernetes API (https://api.<cluster-id>.<domain>:port) is accessible. This helps to verify that the DNS resolution on the bootstrap server is set up correctly.
6443 is the (API) port used by all nodes to communicate with the control plane (master nodes). For reference see Network connectivity requirements.
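A quick check from the bootstrap server could look like this; the hostname is a placeholder, and -k is used because the cluster CA is not trusted yet:

```bash
# Verify DNS resolution and reachability of the Kubernetes API
dig +short api.<cluster-id>.<domain>
curl -k https://api.<cluster-id>.<domain>:6443/version
```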
Run the below debug command from the OpenShift installation directory. This command can be used to follow the installation process.
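For example, assuming the installer binary sits in the current installation directory:

```bash
# Follow the bootstrap process with debug-level logging
./openshift-install wait-for bootstrap-complete --log-level=debug
```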
Issue: The Ignition config files that the openshift-install program generates contain certificates that expire after 24 hours. Expired certificates cause the installation to fail.
Solution: Verify the validity of the certificate being presented by the bootstrap node.
Check that all certificates are valid, especially the certificates from which the Ignition files are created. If the openshift-install create ignition-configs command needs to be re-run, delete all files in the installation directory, including hidden files, except install-config.yaml and the openshift-install binary. Otherwise, the date of the certificates stays pinned to the first run, i.e. the certificates may have already expired.
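One way to inspect the validity dates of the certificates being served, assuming the usual API (6443) and Machine Config Server (22623) endpoints; hostnames are placeholders:

```bash
# Certificate presented on the Kubernetes API endpoint
echo | openssl s_client -connect api.<cluster-id>.<domain>:6443 2>/dev/null | openssl x509 -noout -dates

# Certificate presented on the Machine Config Server that serves the Ignition configs
echo | openssl s_client -connect api-int.<cluster-id>.<domain>:22623 2>/dev/null | openssl x509 -noout -dates
```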
Note: It is recommended that you use Ignition config files within 12 hours after they are generated because the 24-hour certificate rotates from 16 to 22 hours after the cluster is installed.
For reference, please see:
- Creating the Kubernetes manifest and Ignition config files
- Masters and Workers Fail to Ignite Reporting Error 'x509: certificate has expired or not yet valid'
Ignition fails - connection error "no such host" during the installation process¶
Problem determination: To break down the issue and determine the root cause, ssh into the bootstrap machine and check if the bootstrapping process is progressing. In particular, check for the following root causes:
- Firewall / proxy settings: Make sure quay.io is reachable from the bootstrap machine and Red Hat images can be pulled. In case of a vSphere installation, make sure the bootstrap and master machines can reach the vCenter API.
- Bootstrapping progress:
    - Check the bootkube.service log for abnormalities (see the commands after this list).
    - Check the podman container logs for abnormalities.
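A sketch of these checks on the bootstrap machine; the container ID and hostnames are placeholders:

```bash
# On the bootstrap machine (e.g. ssh core@<bootstrap-ip>)
curl -vI https://quay.io                 # verify the image registry is reachable through firewall/proxy
journalctl -b -f -u bootkube.service     # follow the bootkube.service log
sudo podman ps -a                        # list the containers started during bootstrapping
sudo podman logs <container-id>          # inspect a specific container
```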
Issue: The httpProxy and httpsProxy settings might be erroneous, causing the bootstrap node to fail to authenticate at the proxy server and thus be unable to reach the internet. Additionally, the firewall could be blocking the bootstrap and master nodes from reaching the proxy server.
Solution: Verify the correctness of the proxy settings in the install-config.yaml:
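An illustrative excerpt of the proxy section in install-config.yaml; all values are placeholders:

```yaml
proxy:
  httpProxy: http://<username>:<password>@<proxy-host>:<port>
  httpsProxy: http://<username>:<password>@<proxy-host>:<port>
  noProxy: .cluster.local,.svc,10.0.0.0/24
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  <CA certificate of the proxy, if it re-encrypts traffic>
  -----END CERTIFICATE-----
```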
It is highly recommended to install OCP from a bastion host located in the same network segment as the cluster being installed. This makes network issues much easier to identify in a timely manner.
After bootstrapping, openshift-apiserver and ingress keep crashlooping, while no workers can be provisioned¶
Problem determination: Determine whether the proxy and firewall settings are set up correctly for the master and worker hosts. The following criteria must be met:
- Master nodes can reach the vCenter API to provision worker nodes;
- In the installation yaml, machineNetwork must correspond to the actual IPs assigned to the nodes, otherwise the proxy settings won't get propagated correctly to the nodes.
Issue: There are two potential issues:
- Master nodes cannot reach the vSphere API to provision worker nodes due to firewall blockage;
- The apiserver and ingress pods' health checks fail because the machineNetwork does not contain the IPs of the machines. Thus the machines are not under noProxy and the health checks arrive at the proxy server.
Solution: Fill out the machineNetwork correctly in the install-config.yaml. In case of DHCP, put the entire DHCP range into machineNetwork or under noProxy to be absolutely sure, for example:
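An illustrative excerpt, assuming the nodes get their addresses from a DHCP range inside 10.0.0.0/24:

```yaml
networking:
  machineNetwork:
  - cidr: 10.0.0.0/24        # must cover the actual node IPs / the whole DHCP range
proxy:
  noProxy: 10.0.0.0/24       # alternatively, list the DHCP range here as well
```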
Check out https://docs.openshift.com/container-platform/4.12/networking/enable-cluster-wide-proxy.html for more detailed instructions.
Troubleshooting network issues¶
- Move networking resources to the control plane on vSphere
- OCP 4 Node not ready after cluster upgrade or node restart
Worker nodes are not visible when running oc get nodes¶
oc get nodes only shows master nodes.
Issue: The nodes' certificate requests haven't been approved.
Solution: The new worker node(s) will still be missing or in a pending state. Add them by signing the respective client and server CSRs. Run oc get csr and then sign each request.
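For example (the CSR name is a placeholder):

```bash
# List pending client and server certificate signing requests
oc get csr

# Approve each pending request individually
oc adm certificate approve <csr_name>
```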
There will be multiple CSRs created per worker, so run the commands above multiple times until the workers show up as ready.
Alternatively, to approve all pending CSRs, run the following command:
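A one-liner along the lines of the upstream documentation:

```bash
# Approve every CSR that has not been decided on yet
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
```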
After all client and server CSRs have been approved, the machines should have the ready status. Verify this by running the following command:
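```bash
# Workers should now be listed with STATUS Ready
oc get nodes
```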
Installation using OVA template fails¶
Issue: The OVA image has been started prior to cloning.
Solution: Create a new template for the OVA image and then clone the template as needed. Starting the OVA image prior to cloning will kick off the ignition process and, as a result, the ignition of the templates fails.
Troubleshooting ingress issues¶
To check the status of the ingress operator, use:
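For instance (the log command assumes the default ingress-operator deployment in the openshift-ingress-operator namespace):

```bash
oc get clusteroperators ingress
oc describe clusteroperators/ingress

# Operator logs, if the status alone is not conclusive
oc -n openshift-ingress-operator logs deployment/ingress-operator -c ingress-operator
```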
Place a nodeSelector for this deployment on a master node, provided that the master nodes are running and ready. To verify that the masters are unschedulable, ensure that the mastersSchedulable field is set to false.
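One way to check this field on the cluster-wide scheduler resource:

```bash
# Prints "false" when master nodes are not schedulable for regular workloads
oc get schedulers.config.openshift.io cluster -o jsonpath='{.spec.mastersSchedulable}{"\n"}'
```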
Troubleshooting node startup issues¶
To monitor machine-config-operator logs in case any node fails to start:
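A sketch, assuming the default deployment and pod names in the openshift-machine-config-operator namespace:

```bash
# Operator logs
oc -n openshift-machine-config-operator logs -f deployment/machine-config-operator

# Locate the machine-config-daemon pod running on the affected node, then follow its logs
oc -n openshift-machine-config-operator get pods -o wide
oc -n openshift-machine-config-operator logs -f <machine-config-daemon-pod> -c machine-config-daemon
```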
OpenShift Container Platform 4: How does Machine Config Pool work?
Troubleshooting ICSP related node startup issues¶
To check the content of /etc/containers/registries.conf on each node, use:
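For example, iterating over all nodes with oc debug:

```bash
for node in $(oc get nodes -o name); do
  echo "== ${node} =="
  oc debug "${node}" -- chroot /host cat /etc/containers/registries.conf
done
```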
If /etc/containers/registries.conf changes, do the nodes purge their internal cache?
NO - If a new container is deployed and the requested image is not on the node, the image will be pulled from the "mirror" registry mentioned in /etc/containers/registries.conf. This file is just used by CRI-O to download the image from the correct location.
Resizing the VM disk¶
https://unix.stackexchange.com/questions/678677/in-an-ubuntu-vm-in-vmware-i-increased-the-hard-disk-space-how-do-i-add-that-to
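Roughly following the approach from the link above; the device and partition names are examples, so check them with lsblk first and use the growth command that matches the filesystem:

```bash
lsblk                          # identify the disk/partition that received the extra space
sudo growpart /dev/sda 3       # grow the partition into the new space
sudo resize2fs /dev/sda3       # ext4; for XFS use: sudo xfs_growfs <mountpoint>
```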