Migrating OpenShift 3 Container Runtime

While approaching its end of life, OpenShift 3 remains widely used, and in some cases still more reliable than its successor, OpenShift 4.

OpenShift was historically built on top of Docker, and later introduced support for CRI-O, an alternative container runtime. CRI-O integration into OpenShift reached GA with release 3.9, mid-2018 (based on Kubernetes 1.9 and CRI-O 1.9), although it has not been without a few hiccups.

As of today, there are still a few bugs involving RPC overflows when lots of containers run on a CRI-O node, which can make operations addressing all containers fail (e.g. drains). There are also SDN corruptions that I suspect are directly related to CRI-O, and a pending RFE to implement SELinux audit logging similar to what already exists for Docker, … And the fact that OpenShift 4 drops Docker support, while ideologically commendable, is quite a bold move right now, considering the youth of CRI-O.

 

Lately, a customer of mine contacted me regarding a cluster I had helped them deploy. Mid-2019, an architect recommended going with OpenShift 3.11, CRI-O, and GlusterFS CNS storage (aka OCS, OpenShift Container Storage). We set it up, and the cluster had been running for almost a year when the customer opened a case with their support, complaining about GlusterFS containers behaving unexpectedly.

After a few weeks of troubleshooting, support got back to the customer, arguing their setup was not supported and pointing us to a KB item none of us had been aware of so far: while OpenShift 3.11 is fully supported with both CRI-O and GlusterFS CNS storage, their combination is not; only Docker may be used with GlusterFS.

Realizing this, we had to come up with a plan to migrate the container runtime from CRI-O to Docker on any OpenShift node hosting GlusterFS, so that support would keep investigating the original issue. Lacking any documentation covering such a migration, I deployed a lab reproducing my customer's cluster.

 

We will simplify it to an 11-node cluster: 3 masters, 3 GlusterFS, 3 ingress, 2 computes. The GlusterFS nodes would also host Prometheus and Hawkular. The ingress nodes would host the Docker registry and the OpenShift routers. We would also deploy a Git server and a few dummy Pods on the compute nodes, hosting some sources and generating activity on GlusterFS-backed persistent volumes.

Having reproduced the customer's setup as closely as I could, I would then repeat the following process, re-deploying all my GlusterFS nodes. First, let's pick a node and drain it:

$ oc adm cordon gluster1.demo
$ oc adm drain gluster1.demo --ignore-daemonsets --delete-local-data
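
Since GlusterFS runs as a DaemonSet, its Pod stays put after the drain (hence the --ignore-daemonsets flag); at this point we only expect the node to be flagged unschedulable. A quick check, using my lab's node name and the same namespace placeholder as further down:

$ oc get node gluster1.demo                                        # STATUS should read Ready,SchedulingDisabled
$ oc get pods -n glusterfs-namespace -o wide | grep gluster1.demo  # the DaemonSet Pod is still running, for now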

Next, we will connect to that node, stop the OpenShift services, the container runtimes and dnsmasq, and purge some packages, … It will not clean up everything, though it will be good enough for us. Note the resolv.conf copy: /etc/origin/node/resolv.conf holds the upstream resolvers dnsmasq was forwarding to, so restoring it as /etc/resolv.conf keeps name resolution working once dnsmasq is gone. We also take note of the node's bootstrap ConfigMap name, which identifies the node group it was using:

# systemctl stop atomic-openshift-node
# systemctl stop crio
# systemctl stop docker
# systemctl disable atomic-openshift-node
# systemctl disable crio
# systemctl disable docker
# grep BOOTSTRAP_CONFIG /etc/sysconfig/atomic-openshift-node
BOOTSTRAP_CONFIG_NAME=node-cm-name
# cp -f /etc/origin/node/resolv.conf /etc/
# systemctl stop dnsmasq
# systemctl disable dnsmasq
# yum -y remove criu docker atomic-openshift-excluder atomic-openshift-docker-excluder cri-tools \
    atomic-openshift-hyperkube atomic-openshift-node docker-client cri-o atomic-openshift-clients \
    dnsmasq
# rm -fr /etc/origin /etc/dnsmasq.d/* /etc/sysconfig/atomic-openshift-node.rpmsave
# reboot

Once the node has rebooted, we may connect back and confirm that DNS resolution still works, that the container runtimes are gone, …
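
A minimal sanity check could look like this (the node name is my lab's):

# getent hosts gluster1.demo                   # DNS resolution, now without dnsmasq
# rpm -q docker cri-o atomic-openshift-node    # all should report "not installed"

Then, we will delete the node from the API: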

$ oc delete node gluster1.demo

Next, we would edit our Ansible inventory, reconfiguring that node to only use Docker. In the inventory file, we would add openshift_use_crio=False to that node's variables, overriding the default defined in our group_vars/OSEv3.yaml.

We would also change the openshift_node_group_name variable, removing the CRI-O specifics from that node's kubelet configuration. Note that in some cases, this could involve editing a custom openshift_node_groups definition. For most common deployments, we may simply switch the node group name from its crio variant to its docker equivalent (e.g. from node-config-infra-crio to node-config-infra).

Finally, still editing the Ansible inventory, we would move our migrating node's definition out of the nodes group and into the new_nodes one. If you never had to scale that cluster before, be careful: that group should inherit your custom OSEv3 settings, so you may set it as a child of the OSEv3 host group, though make sure it is not a member of the nodes group. At that stage, it is also recommended to have pinned both the OpenShift and GlusterFS versions, down to their patch number; in our case, we are using OCP 3.11.161 and OCS 3.11.4.
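
As a sketch, assuming default node groups and my lab's node name, the relevant inventory bits could look like the following; only the lines that matter for the runtime switch are shown, and a custom node group setup would differ:

[OSEv3:children]
masters
etcd
nodes
new_nodes

[new_nodes]
gluster1.demo openshift_node_group_name='node-config-infra' openshift_use_crio=False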

Make sure the node groups configuration is up to date:

$ oc delete -n openshift-node configmap custom-node-group-gfs1 # not necessary if using default node groups
$ ansible-playbook -i inventory /usr/share/ansible/openshift-ansible/playbooks/openshift-master/openshift_node_group.yml
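
To double-check the rendered node groups, we may list the matching ConfigMaps, which live in the openshift-node namespace:

$ oc get configmaps -n openshift-node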

Then, we may proceed as if adding a new node to our cluster:

$ ansible-playbook -i inventory /usr/share/ansible/openshift-ansible/playbooks/openshift-node/scaleup.yml

As soon as the node joins back our cluster, the GlusterFS container we were missing should start, using the exact same local volumes and configuration, only now running on Docker.
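
To confirm the switch actually took effect, we may check which runtime the node now advertises and where the GlusterFS Pods run (gluster1.demo and glusterfs-namespace being my lab's placeholders):

$ oc describe node gluster1.demo | grep 'Container Runtime Version'   # should now report docker:// rather than cri-o://
$ oc get pods -n glusterfs-namespace -o wide | grep gluster1.demo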

Once that GlusterFS Pod is marked healthy again, rsh into any GlusterFS container and query your volumes' health:

$ oc rsh -n glusterfs-namespace ds/glusterfs-clustername
sh-4.2# gluster volume list | while read vol; do
gluster volume heal $vol info;
done
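
To spot pending heals at a glance, the same loop can be narrowed down to the entry counters; any non-zero "Number of entries" deserves a second look before moving on:

sh-4.2# gluster volume list | while read vol; do
  echo "=== $vol";
  gluster volume heal $vol info | grep 'Number of entries';
done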

Internal healing mechanisms may not fix all issues; be sure your cluster is healthy before migrating another node. Meanwhile, we would edit the Ansible inventory again, moving our node out of the new_nodes group and back into its original location.

Repeat with every node you need to migrate. Eventually, the openshift_use_crio definition could be moved into some host group settings, avoiding multiple definitions in node variables.
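
For instance, assuming your inventory already groups the GlusterFS nodes under a [glusterfs] host group, as typical CNS inventories do, a single group-level override could replace the per-host variables:

[glusterfs:vars]
# hypothetical group-level override, once every GlusterFS node runs Docker
openshift_use_crio=False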

To further confirm we were not leaving the cluster in an inconsistent state, I later upgraded that lab to OCP 3.11.200 and OCS 3.11.5, with only one outstanding note: the atomic-openshift-excluder package was missing on the nodes I had migrated. While it is installed during cluster deployment, it appears this is not the case during cluster scale-outs. It could be a bug in the openshift-ansible roles or playbooks; when in doubt, make sure to install that package manually afterwards.
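
Re-adding it by hand is quick; something along these lines, run on each migrated node, installs the package and re-enables the yum exclusions:

# yum -y install atomic-openshift-excluder
# atomic-openshift-excluder exclude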

 

Overall, everything went great. While undocumented, this process is nothing extraordinary.

Once migrated to Docker-backed GlusterFS containers, I did reproduce the issue the customer was complaining about, as well as another one, related to GlusterFS arbiter bricks running out of space.

Thank science, OCS 4 is now based on Rook and Ceph.