OpenShift 4 – Baremetal Deployment

Once again, quick post regarding OpenShift, today experimenting with the new installer, and OpenShift 4.

First, let’s remind ourselves that OKD 4 has not yet been released. I would be using my RedHat account credentials pulling images. I usually refuse to touch anything that is not strictly open source (and freely distributed), though I would make an exception here, as I’ve been waiting for OpenShift 4 for almost a year now. Back when my first OpenShift PR got refused, due to their focus being on OpenShift 4, … Now I’m visiting customers for OpenShift 4, I need my own lab to experiment with.

Prepare Hardware

Dealing with a baremetal deployment, we would need to prepare a subnet with its DHCP and PXE servers, a pair of LoadBalancers, and several instances for OpenShift itself.
The following would assume a VLAN was created, we would provide with isc-dhcp-server, tftpd-hpa, bind/nsd and haproxy configuration snippets.

OpenShift nodes would include a bootstrap node (only required during deployment, would be shut down afterwards), three master nodes, and as much worker nodes as we can allocate.
Bootstrap and master nodes should ship with 4 vCPU and 16G RAM at least, while workers could go with 2 vCPU and 8G RAM. Docs mention provisioning those node with at least 120G of disk storage, though this does not seem to be mandatory.
Those nodes would be running on top of KVM hypervisors.

Download Assets

We would start downloading a few assets out of RedHat cloud portal.

We would find links to RedHat CoreOS PXE sources – a kernel, an initramfs, and a pair of compressed filesystems that would be used installing CoreOS to our nodes. We would install those to our PXE server later.

We would also fetch a pull secret, that would allow us downloading images out of RedHat and Quay registries.

Finally, we would retrieve the latest oc client, as well as the openshift-install binaries.


Next, we would prepare DNS records for our OpenShift cluster and nodes.

Contrarily to OpenShift3, we would not be able to use customized names for the cluster API or its applications. 

We would first create a zone for cluster host names,

bootstrap A
master1 A
master2 A
master3 A
infra1 A
infra2 A
infra3 A
compute1 A
compute2 A
compute3 A
compute4 A
compute5 A
haproxy1 A
haproxy2 A

Next, we would create a zone for the cluster itself,

api A
api A
api-int A
api-int A
*.apps A
*.apps A
etcd-0 A
etcd-1 A
etcd-2 A
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380

And corresponding reverse records, in

10 PTR
11 PTR
12 PTR
13 PTR
14 PTR
15 PTR
20 PTR
21 PTR
22 PTR
23 PTR
24 PTR
150 PTR
151 PTR

Don’t forget to reload your zones before going further.


Next, we would configure our DHCP server. First, we would setup static leases for our OpenShift nodes:

host bootstrap-eth0 {
    hardware ethernet 52:54:00:e1:48:6a;
host master0-eth0 {
    hardware ethernet 52:54:00:be:c0:a4;
host master1-eth0 {
    hardware ethernet 52:54:00:79:f3:0f;
host master2-eth0 {
    hardware ethernet 52:54:00:69:74:8c;
host infra1-eth0 {
    hardware ethernet 52:54:00:d3:40:dc;
host infra2-eth0 {
    hardware ethernet 52:54:00:20:f0:af;
host infra3-eth0 {
    hardware ethernet 52:54:00:81:83:25;
host compute1-eth0 {
    hardware ethernet 52:54:00:48:77:48;
host compute2-eth0 {
    hardware ethernet 52:54:00:88:94:94;
host compute3-eth0 {
    hardware ethernet 52:54:00:ff:37:14;
host compute4-eth0 {
    hardware ethernet 52:54:00:c7:46:2d;
host compute5-eth0 {
    hardware ethernet 52:54:00:e1:60:5b;

Next, we would setup a subnet for OpenShift nodes, enabling with PXE booting options:

subnet netmask
    option routers;
    option domain-name “”;
    option domain-name-servers,;
    filename “pxelinux.0”;

Don’t forget to restart your DHCP server.


Now, we would generate some configurations to be served to PXE clients.

First, we would create a configuration file, mandatory for baremetal deployments, install-config.yaml:

apiVersion: v1
– hyperthreading: Enabled
  name: worker
replicas: 0
  hyperthreading: Enabled
  name: master
  replicas: 3
  name: intra
  – cidr:
    hostPrefix: 23
  networkType: OpenShiftSDN
  none: {}
pullSecret: <>
sshKey: ‘ssh-rsa <some-public-key-of-yours>’

If you haven’t already, extract the openshift-install binary from the archive downloaded out of RedHat cloud portal.

mkdir install-directory
cp -p install-config.yaml install-directory/
./openshift-install create manifests –dir=./install-directory
sed -i ‘s|mastersSchedulable:.*|mastersSchedulable: false|’ \
./openshift-install create ignition-configs –dir=./install-directory/
scp -p install-directory/*.ign root@pxe-server:/srv/tftpboot/ocp4/

Note that the install-directory/auth subfolder includes a kubeconfig file, that can be used with the oc and kubectl clients, querying our cluster API, as well as kubeadmin default password logging into the cluster console.


Next, we would configure our PXE server booting RedHat CoreOS nodes.

wget -o /srv/tftpboot/ocp4/kernel \
wget -o /srv/tftpboot/ocp4/initrd \
wget -o /srv/tftpboot/ocp4/metalbios.raw.gz \
cat <<EOF >/srv/tftproot/boot-screens/ocp4.cfg
menu title OCP4 RH-CoreOS Systems
  menu title OCP4 RH-CoreOS Systems
    menu label OCP4 RH-CoreOS Systems
    menu exit
  label –
    menu label 4.2.0 x86_64 – bootstrap
    kernel installers/ocp4-rhcos-4.2.0/x86_64/linux
    append initrd=installers/ocp4-rhcos-4.2.0/x86_64/initrd-raw ip=dhcp rd.neednet=1 coreos.inst=yes coreos.inst.install_dev=vda coreos.inst.image_url= coreos.inst.ignition_url=
  label –
    menu label 4.2.0 x86_64 – master
    kernel installers/ocp4-rhcos-4.2.0/x86_64/linux
    append initrd=installers/ocp4-rhcos-4.2.0/x86_64/initrd-raw ip=dhcp rd.neednet=1 coreos.inst=yes coreos.inst.install_dev=vda coreos.inst.image_url= coreos.inst.ignition_url=
  label –
    menu label 4.2.0 x86_64 – worker
    kernel installers/ocp4-rhcos-4.2.0/x86_64/linux
    append initrd=installers/ocp4-rhcos-4.2.0/x86_64/initrd-raw ip=dhcp rd.neednet=1 coreos.inst=yes coreos.inst.install_dev=vda coreos.inst.image_url= coreos.inst.ignition_url=
menu end

Note that our PXE server also includes its HTTP server, hosting ignition configs and CoreOS installation image URL. In theory, all you need here is an HTTP server, not necessarily related to your PXE server.

Load Balancers

Before we can deploy OpenShift, we would setup its LoadBalancers. Here, we would use HAProxy, with the following configuration:

  maxconn 20000
  log /dev/log local0 info
  chroot /var/lib/haproxy
  pidfile /var/run/
  user haproxy
  group haproxy
  stats socket /var/lib/haproxy/stats

  mode http
  log global
  option httplog
  option dontlognull
  option forwardfor except
  option redispatch
  retries 3
  timeout http-request 10s
  timeout queue 1m
  timeout connect 10s
  timeout client 300s
  timeout server 300s
  timeout http-keep-alive 10s
  timeout check 10s
  maxconn 20000

listen stats
  bind :9000
  mode http
  stats enable
  stats uri /

frontend k8s-api
  bind *:6443
  default_backend k8s-api
  mode tcp
  option tcplog

backend k8s-api
  balance source
  mode tcp
  server bootstrap check
  server master0 check
  server master1 check
  server master2 check

frontend machine-config-server
  bind *:22623
  default_backend machine-config-server
  mode tcp
  option tcplog

backend machine-config-server
  balance source
  mode tcp
  server bootstrap check
  server master0 check
  server master1 check
  server master2 check

frontend apps-tls
  bind *:443
  default_backend apps-tls
  mode tcp
  option tcplog

backend apps-tls
  balance source
  mode tcp
  server router0 check
server router1 check
  server router2 check

frontend apps-clear
  bind *:80
  default_backend apps-clear
  mode tcp
  option tcplog

backend apps-clear
  balance source
  mode tcp
  server router0 check
  server router1 check
  server router2 check

Don’t forget to start and enable HAProxy service.

Boot Instances

Now we should have everything we need. First boot the boostrap node using PXE, wait for it to reboot, then boot the three master nodes in PXE.

We would be able to SSH to each node, as the core user, using the SSH key passed to openshift-install earlier. Keep an eye on system logs.

Meanwhile, we could use openshift-install tracking for OpenShift API bootstrap completion:

./openshift-install –dir=./install-directory wait-for bootstrap-complete \
    log-level info

Eventually, that command would exit, and should confirm our cluster API is now reachable. At that stage, the cluster is not yet done deploying, though we’re getting close.

Next, we would boot our infra nodes in PXE. Keep an eye on certificate signing requests, as we would need to approve those new nodes while joining the cluster:

oc get csr
oc adm certificate sign csr-xxx

Eventually, we should be able to confirm the cluster operators are finishing to deploy.

The only one that would stay in a degraded state would be the image registry operator. Here, we would need to define OpenShift integrated registry storage configuration:

oc edit

To keep it simple, we would stick to an emptyDir storage (volatile), which is not usually recommended.

oc get co
authentication 4.2.0 True False False 1h36m
cloud-credential 4.2.0 True False False 2h
cluster-autoscaler 4.2.0 True False False 1h56m
console 4.2.0 True False False 1h37m
dns 4.2.0 True False False 2h
image-registry 4.2.0 True False False 49m
ingress 4.2.0 True False False 1h42m
insights 4.2.0 True False False 2h
kube-apiserver 4.2.0 True False False 1h59m
kube-controller-manager 4.2.0 True False False 1h58m
kube-scheduler 4.2.0 True False False 1h59m
machine-api 4.2.0 True False False 2h
machine-config 4.2.0 True False False 2h
marketplace 4.2.0 True False False 1h56m
monitoring 4.2.0 True False False 1h40m
network 4.2.0 True False False 2h
node-tuning 4.2.0 True False False 1h56m
openshift-apiserver 4.2.0 True False False 1h57m
openshift-controller-manager 4.2.0 True False False 1h59m
openshift-samples 4.2.0 True False False 1h55m
operator-lifecycle-manager 4.2.0 True False False 2h
operator-lifecycle-manager-catalog 4.2.0 True False False 2h
operator-lifecycle-manager-packageserver 4.2.0 True False False 1h58m
service-ca 4.2.0 True False False 2h
service-catalog-apiserver 4.2.0 True False False 1h56m
service-catalog-controller-manager 4.2.0 True False False 1h57m
storage 4.2.0 True False False 1h56m

Eventually, we may boot workers using PXE, until all nodes joined our cluster. We can also terminate the bootstrap node, that is no longer needed.

LDAP Authentication

Finally, we would setup LDAP authentication. By default, OpenShift4 ships with a single kubeadmin user, that could be used during initial cluster configuration.

oc –config ./kubeconfig create secret generic ldap-secret \
    –from-literal=bindPassword=<secret> -n openshift-config
oc –config ./kubeconfig create configmap ldap-ca \
    –from-file=ca.crt=/path/to/ldap-ca-chain.crt -n openshift-config

Having create a Secret with our OpenShift LDAP service account bind password, and a ConfigMap serving the CA chain, used to sign our OpenLDAP TLS certificate, we would then import the following OAuth configuration:

kind: OAuth
  name: cluster
  – name: LDAP
    mappingMethod: claim
    type: LDAP
        – dn
        – mail
        – sn
        – uid
      bindDN: “cn=openshift,ou=services,dc=example,dc=com”
        name: ldap-secret
        name: ldap-ca
      insecure: false
      url: “ldaps://,dc=example,dc=com?uid?sub?(&(objectClass=inetOrgPerson)(!(pwdAccountLockedTime=*)))”

Having applied that configuration, we would see Pods from the openshift-authentication namespace rebooting. We would then be able to log in using an LDAP account.

OpenShift4 Dashboard

OpenShift4 Dashboard

Infra Nodes

Last detail: after deployment, an OpenShift 4 cluster would include master and worker nodes, while OpenShift 3 used to ship with master, infra and compute nodes.

The worker nodes in OpenShift 4 are meant to replace both infra and computes, which could make sense running smaller setups, though I would argue is not much practical scaling out. Having a small set of nodes, designated to host OpenShift ingress controllers is a good thing, as we only need to configure those IPs as backends for our applications loadbalancers. Say we only rely on worker nodes, every time we add new members to our cluster, we would also need reconfiguring our loadbalancer.

Hence, we would create a group of Infra machines, starting with creating a MachineConfigPool, using the following cofiguration:

kind: MachineConfigPool
  name: infra
    matchLabels: infra
    matchLabels: “”
  paused: false

Having applied that configuration, we would then dump MachineConfig objects applying to worker nodes:

DUMP=$(oc get machineconfig | grep -v rendered | \
  awk ‘/worker/{print $1}’ | tr ‘\n’ ‘ ‘)

oc get machineconfig -o yaml $DUMP >machineconfig-infra.yaml

We would then edit machineconfig-infra.yaml content, removing “generated-by” annotations, creationTimestamps, generation, ownerReferences, resourceVersions, selfLink and uid metadata. Replace any remaning mention of “worker” by “infra”. Then apply the resulting objects:

oc apply -f machineconfig-infra.yaml
oc get mc
00-infra 2.2.0 1m
01-infra-container-runtime 2.2.0 1m
01-infra-kubelet 2.2.0 1m
99-infra-ad9f8790-f270-11e9-a34e-525400e1605b-registries 2.2.0 1m
99-infra-ssh 2.2.0 1m

At that stage, the MachineConfig Operator should be rendering a last MachineConfig object, including an exhaustive list of configurations for our infra nodes. Once oc get mc includes that rendered configuration, we would make sure the MachineConfig Operator is done with our MachineConfigPool and start re-labeling nodes accordingly:

oc get mcp
infra rendered-infra-0506920a222781a19fff88a4196deef4 True False False
master rendered-master-747943425e64364488e51d15e5281265 True False False
worker rendered-worker-5e70256103cc4d0ce0162430de7233a1 True False False
oc label node
node/ labeled
oc label node
node/ labeled

From there, our node would be set unschedulable, drained, and rebooted. Our customized MachineConfig should have changed the role label applied when our node boots, which we may confirm once it is done restarting

oc get nodes Ready worker 47m v1.14.6+c07e432da Ready worker 45m v1.14.6+c07e432da Ready worker 34m v1.14.6+c07e432da Ready worker 33m v1.14.6+c07e432da Ready worker 31m v1.14.6+c07e432da Ready infra 2h v1.14.6+c07e432da Ready worker 2h v1.14.6+c07e432da Ready worker 2h v1.14.6+c07e432da Ready master 2h v1.14.6+c07e432da Ready master 2h v1.14.6+c07e432da Ready master 2h v1.14.6+c07e432da

Once our node is back, we would proceed with the next infra node.

We would eventually reconfigure our Ingress Controller deploying OpenShift Routers back to our infra nodes:

oc edit -n openshift-ingress-operator ingresscontroller default
      matchLabels: “”
  replicas: 3

We would then keep track of routers Pods as they’re being re-deployed:

oc get pods -n openshift-ingress -o wide
router-default-86cdb97784-4d72k 1/1 Running 0 14m
router-default-86cdb97784-8f5vm 1/1 Running 0 14m
router-default-86cdb97784-bvvdc 1/1 Running 0 105s

Ceph RBD Storage

Later on, we may want to configure OpenShift interfacing with an existing Ceph cluster, setting up persisting volumes.

While OpenShift 3 used to ship with rbd binaries in the api controller image, while allowing for their installation on OpenShift nodes, this is no longer the case with OpenShift 4. Instead, we would rely on CSI (Container Storage Interface), which is meant to be a more generic interface.

Then, we would need to deploy Ceph CSI interface to OpenShift,

git clone
oc new-project ceph-csi
for sa in rbd-csi-provisioner rbd-csi-nodeplugin; do
    oc create sa $sa
    oc adm policy add-scc-to-user hostaccess system:serviceaccount:ceph-csi:$sa
    oc adm policy add-scc-to-user privileged system:serviceaccount:ceph-csi:$sa
cat ceph-csi/deploy/rbd/kubernetes/v1.14+/csi-*yaml | sed ‘s|namespace: default|namespace: ceph-csi|g’ | oc apply -n ceph-csi -f-
cat <<EOF >config.json
    “clusterID”: “my-ceph-cluster-id”,
    “monitors: [ “”,”″″ ]
oc delete cm -n ceph-csi ceph-csi-config
oc create cm -n ceph-csi ceph-csi-config –from-file=config.json=./config.json
cat << EOF >secret.yaml
apiVersion: v1
kind: Secret
  name: ceph-rbd-secret
  userID: my-ceph-user-id
  userKey: my-user-key
oc apply -n default -f secret.yaml
cat << EOF >storageclass.yaml
kind: StorageClass
  name: ceph-storage
  clusterID: my-ceph-cluster-id
  pool: kube
  imageFeatures: layering csi-rbd-secret default csi-rbd-secret default xfs
reclaimPolicy: Delete
– discard
oc apply -f storageclass.yaml

At that stage, we would have deployed a DaemonSet of csi-rbdplugin Pods, tasked with attaching and detaching volumes during Pods scheduling and terminations, as well as a Deployment of csi-rbdplugin-provisioner Pods, creating and purging volumes out of Ceph, while managing OpenShift Persistent Volumes.

At that stage, we may create a first Persistent Volume and redeploy OpenShift integrated registry on top of it:

cat <<EOF >registry-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
  name: image-registry
  namespace: openshift-image-registry
  – ReadWriteOnce
      storage: 100Gi
oc apply -f registry-pvc.yaml
oc edit
      claim: image-registry-storage
oc get pods -n openshift-image-registry -w


First thing I would regret is the disappearance of rbd binaries from controllers images. As a result, the Ceph provisioner we used to configure with OpenShift3 no longer works. Apparently, CSI provisioners would be recommended instead, though that implementation is kind of slower, and involves quite a lot of Pods.

After deployment, roughly 12G RAM and 4 CPUs are allocated to cluster operators and OpenShift internals.

Another concern may be that all those operators are privileged actors in our cluster. While we usually had to compromise a node to attack a cluster, now we have a lot of operators that might be accessed through the API, arguably expanding OpenShift attack surface.

The dashboard shows a total CPU capacity of “100%”, which is quite useless.

OpenShift 4.2 is based on Kubernetes 1.14. Among nother novelties, as compared with OpenShift 3, we could mention Istio reaching GA or Tekton pipelines.

Leave a reply

Your email address will not be published.

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>