
OpenShift 4 – Baremetal Deployment

Once again, a quick post regarding OpenShift, today experimenting with the new installer and OpenShift 4.

First, let’s remind ourselves that OKD 4 has not yet been released. I would be using my RedHat account credentials to pull images. I usually refuse to touch anything that is not strictly open source (and freely distributed), though I would make an exception here, as I’ve been waiting for OpenShift 4 for almost a year now, ever since my first OpenShift PR got refused due to their focus being on OpenShift 4. Now that I’m visiting customers for OpenShift 4, I need my own lab to experiment with.

Prepare Hardware

Dealing with a baremetal deployment, we would need to prepare a subnet with its DHCP and PXE servers, a pair of load balancers, and several instances for OpenShift itself.
The following would assume a VLAN was already created; we would provide isc-dhcp-server, tftpd-hpa, bind/nsd and haproxy configuration snippets.

OpenShift nodes would include a bootstrap node (only required during deployment, to be shut down afterwards), three master nodes, and as many worker nodes as we can allocate.
Bootstrap and master nodes should ship with 4 vCPU and 16G RAM at least, while workers could go with 2 vCPU and 8G RAM. Docs mention provisioning those nodes with at least 120G of disk storage, though this does not seem to be mandatory.
Those nodes would be running on top of KVM hypervisors.
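The sizing above can be sketched with virt-install; the bridge name, disk size and os-variant below are assumptions to adapt to your hypervisor. The helper only prints the commands, so they can be reviewed before being run:

```shell
# Print a virt-install command for a node; $1=name $2=vcpus $3=memory (MiB).
# br-ocp, disk size and os-variant are illustrative -- adjust to your setup.
gen_virt_install() {
  echo virt-install --name "$1" --vcpus "$2" --memory "$3" \
    --disk size=120 --pxe --network bridge=br-ocp \
    --os-variant generic --noautoconsole
}
for node in bootstrap master1 master2 master3; do
  gen_virt_install "$node" 4 16384
done
for node in infra1 infra2 infra3 compute1 compute2; do
  gen_virt_install "$node" 2 8192
done
```

Dropping the `echo` inside the helper would actually create the guests.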

Download Assets

We would start downloading a few assets out of RedHat cloud portal.

We would find links to RedHat CoreOS PXE sources – a kernel, an initramfs, and a pair of compressed filesystems that would be used installing CoreOS to our nodes. We would install those to our PXE server later.

We would also fetch a pull secret, that would allow us downloading images out of RedHat and Quay registries.

Finally, we would retrieve the latest oc client, as well as the openshift-install binaries.

DNS

Next, we would prepare DNS records for our OpenShift cluster and nodes.

Contrary to OpenShift 3, we would not be able to use customized names for the cluster API or its applications.

We would first create a zone for cluster host names, db.nodes.example.com:

$ORIGIN nodes.example.com.
bootstrap A 10.42.253.9
master1 A 10.42.253.10
master2 A 10.42.253.11
master3 A 10.42.253.12
infra1 A 10.42.253.13
infra2 A 10.42.253.14
infra3 A 10.42.253.15
compute1 A 10.42.253.20
compute2 A 10.42.253.21
compute3 A 10.42.253.22
compute4 A 10.42.253.23
compute5 A 10.42.253.24
haproxy1 A 10.42.253.150
haproxy2 A 10.42.253.151

Next, we would create a zone for the cluster itself, db.intra.example.com:

$ORIGIN intra.example.com.
api A 10.42.253.150
api A 10.42.253.151
api-int A 10.42.253.150
api-int A 10.42.253.151
*.apps A 10.42.253.150
*.apps A 10.42.253.151
etcd-0 A 10.42.253.10
etcd-1 A 10.42.253.11
etcd-2 A 10.42.253.12
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380 etcd-0.nodes.example.com.
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380 etcd-1.nodes.example.com.
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380 etcd-2.nodes.example.com.

And corresponding reverse records, in db.253.42.10.in-addr.arpa:

$ORIGIN 253.42.10.in-addr.arpa.
9 PTR bootstrap.nodes.example.com.
10 PTR master1.nodes.example.com.
11 PTR master2.nodes.example.com.
12 PTR master3.nodes.example.com.
13 PTR infra1.nodes.example.com.
14 PTR infra2.nodes.example.com.
15 PTR infra3.nodes.example.com.
20 PTR compute1.nodes.example.com.
21 PTR compute2.nodes.example.com.
22 PTR compute3.nodes.example.com.
23 PTR compute4.nodes.example.com.
24 PTR compute5.nodes.example.com.
150 PTR haproxy1.nodes.example.com.
151 PTR haproxy2.nodes.example.com.

Don’t forget to reload your zones before going further.
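Since the forward and reverse zones list the same hosts, a small helper can emit matching A and PTR records from a single list, so the two zones cannot drift apart. A minimal sketch, with illustrative names:

```shell
# Emit an A record (for db.nodes.example.com) and the matching PTR record
# (for db.253.42.10.in-addr.arpa) from a name and a last octet.
fwd() { printf '%s A 10.42.253.%s\n' "$1" "$2"; }
rev() { printf '%s PTR %s.nodes.example.com.\n' "$2" "$1"; }

for entry in bootstrap:9 master1:10 master2:11 master3:12; do
  name=${entry%%:*}; last=${entry##*:}
  fwd "$name" "$last"   # append to the forward zone
  rev "$name" "$last"   # append to the reverse zone
done
```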

DHCP

Next, we would configure our DHCP server. First, we would setup static leases for our OpenShift nodes:

host bootstrap-eth0 {
    hardware ethernet 52:54:00:e1:48:6a;
    fixed-address 10.42.253.9;
}
host master0-eth0 {
    hardware ethernet 52:54:00:be:c0:a4;
    fixed-address 10.42.253.10;
}
host master1-eth0 {
    hardware ethernet 52:54:00:79:f3:0f;
    fixed-address 10.42.253.11;
}
host master2-eth0 {
    hardware ethernet 52:54:00:69:74:8c;
    fixed-address 10.42.253.12;
}
host infra1-eth0 {
    hardware ethernet 52:54:00:d3:40:dc;
    fixed-address 10.42.253.13;
}
host infra2-eth0 {
    hardware ethernet 52:54:00:20:f0:af;
    fixed-address 10.42.253.14;
}
host infra3-eth0 {
    hardware ethernet 52:54:00:81:83:25;
    fixed-address 10.42.253.15;
}
host compute1-eth0 {
    hardware ethernet 52:54:00:48:77:48;
    fixed-address 10.42.253.20;
}
host compute2-eth0 {
    hardware ethernet 52:54:00:88:94:94;
    fixed-address 10.42.253.21;
}
host compute3-eth0 {
    hardware ethernet 52:54:00:ff:37:14;
    fixed-address 10.42.253.22;
}
host compute4-eth0 {
    hardware ethernet 52:54:00:c7:46:2d;
    fixed-address 10.42.253.23;
}
host compute5-eth0 {
    hardware ethernet 52:54:00:e1:60:5b;
    fixed-address 10.42.253.24;
}
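Those stanzas are error-prone to copy-paste by hand (a duplicate fixed-address is easy to miss), so one may prefer generating them. A minimal sketch:

```shell
# Print a dhcpd host stanza for a name / MAC / IP triple.
stanza() {
  printf 'host %s-eth0 {\n    hardware ethernet %s;\n    fixed-address %s;\n}\n' \
    "$1" "$2" "$3"
}
stanza bootstrap 52:54:00:e1:48:6a 10.42.253.9
stanza master0   52:54:00:be:c0:a4 10.42.253.10
```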

Next, we would setup a subnet for OpenShift nodes, enabling PXE boot options:

subnet 10.42.253.0 netmask 255.255.255.0
{
    option routers 10.42.253.1;
    option domain-name "nodes.example.com intra.example.com";
    option domain-name-servers 10.42.253.3, 10.42.253.5;
    filename "pxelinux.0";
    range 10.42.253.9 10.42.253.254;
    next-server 10.42.44.100;
}

Don’t forget to restart your DHCP server.

Ignition

Now, we would generate some configurations to be served to PXE clients.

First, we would create a configuration file, mandatory for baremetal deployments, install-config.yaml:

apiVersion: v1
baseDomain: example.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: intra
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: <pull-secret-from-cloud.openshift.com>
sshKey: 'ssh-rsa <some-public-key-of-yours> admin@example.com'

If you haven’t already, extract the openshift-install binary from the archive downloaded out of RedHat cloud portal.

mkdir install-directory
cp -p install-config.yaml install-directory/
./openshift-install create manifests --dir=./install-directory
sed -i 's|mastersSchedulable:.*|mastersSchedulable: false|' \
    ./install-directory/manifests/cluster-scheduler-02-config.yaml
./openshift-install create ignition-configs --dir=./install-directory/
scp -p install-directory/*.ign root@pxe-server:/srv/tftpboot/ocp4/

Note that the install-directory/auth subfolder includes a kubeconfig file, that can be used with the oc and kubectl clients to query our cluster API, as well as the default kubeadmin password to log into the cluster console.

PXE

Next, we would configure our PXE server to boot RedHat CoreOS nodes.

wget -O /srv/tftpboot/ocp4/kernel \
    https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.2/4.2.0/rhcos-4.2.0-x86_64-installer-kernel
wget -O /srv/tftpboot/ocp4/initrd \
    https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.2/4.2.0/rhcos-4.2.0-x86_64-installer-initramfs.img
wget -O /srv/tftpboot/ocp4/metal-bios.raw.gz \
    https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.2/4.2.0/rhcos-4.2.0-x86_64-metal-bios.raw.gz
cat <<EOF >/srv/tftpboot/boot-screens/ocp4.cfg
menu begin ocp4
  menu title OCP4 RH-CoreOS Systems
  label bootstrap
    menu label 4.2.0 x86_64 - bootstrap
    kernel ocp4/kernel
    append initrd=ocp4/initrd ip=dhcp rd.neednet=1 coreos.inst=yes coreos.inst.install_dev=vda coreos.inst.image_url=http://10.42.44.100/ocp4/metal-bios.raw.gz coreos.inst.ignition_url=http://10.42.44.100/ocp4/bootstrap.ign
  label master
    menu label 4.2.0 x86_64 - master
    kernel ocp4/kernel
    append initrd=ocp4/initrd ip=dhcp rd.neednet=1 coreos.inst=yes coreos.inst.install_dev=vda coreos.inst.image_url=http://10.42.44.100/ocp4/metal-bios.raw.gz coreos.inst.ignition_url=http://10.42.44.100/ocp4/master.ign
  label worker
    menu label 4.2.0 x86_64 - worker
    kernel ocp4/kernel
    append initrd=ocp4/initrd ip=dhcp rd.neednet=1 coreos.inst=yes coreos.inst.install_dev=vda coreos.inst.image_url=http://10.42.44.100/ocp4/metal-bios.raw.gz coreos.inst.ignition_url=http://10.42.44.100/ocp4/worker.ign
menu end
EOF

Note that our PXE server also includes its own HTTP server, hosting the ignition configs and the CoreOS installation image. In theory, all you need here is an HTTP server, not necessarily related to your PXE server.
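In a lab, python3's built-in http.server is enough for that job. A quick self-check, with illustrative paths, port and file content; in practice the served directory would hold the *.ign files and the metal image:

```shell
# Serve a scratch directory over HTTP and fetch a file back through loopback.
mkdir -p /tmp/ocp4-www/ocp4
echo 'test-ignition' > /tmp/ocp4-www/ocp4/bootstrap.ign
cd /tmp/ocp4-www
python3 -m http.server 8137 >/dev/null 2>&1 &
HTTPD_PID=$!
for i in 1 2 3 4 5; do   # give the server a moment to bind
  FETCHED=$(python3 -c 'import urllib.request; print(urllib.request.urlopen("http://127.0.0.1:8137/ocp4/bootstrap.ign").read().decode().strip())' 2>/dev/null) && break
  sleep 1
done
kill $HTTPD_PID
echo "$FETCHED"
```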

Load Balancers

Before we can deploy OpenShift, we would setup its LoadBalancers. Here, we would use HAProxy, with the following configuration:

global
  maxconn 20000
  log /dev/log local0 info
  chroot /var/lib/haproxy
  pidfile /var/run/haproxy.pid
  user haproxy
  group haproxy
  daemon
  stats socket /var/lib/haproxy/stats

defaults
  mode http
  log global
  option httplog
  option dontlognull
  option forwardfor except 127.0.0.0/8
  option redispatch
  retries 3
  timeout http-request 10s
  timeout queue 1m
  timeout connect 10s
  timeout client 300s
  timeout server 300s
  timeout http-keep-alive 10s
  timeout check 10s
  maxconn 20000

listen stats
  bind :9000
  mode http
  stats enable
  stats uri /

frontend k8s-api
  bind *:6443
  default_backend k8s-api
  mode tcp
  option tcplog

backend k8s-api
  balance source
  mode tcp
  server bootstrap 10.42.253.9:6443 check
  server master0 10.42.253.10:6443 check
  server master1 10.42.253.11:6443 check
  server master2 10.42.253.12:6443 check

frontend machine-config-server
  bind *:22623
  default_backend machine-config-server
  mode tcp
  option tcplog

backend machine-config-server
  balance source
  mode tcp
  server bootstrap 10.42.253.9:22623 check
  server master0 10.42.253.10:22623 check
  server master1 10.42.253.11:22623 check
  server master2 10.42.253.12:22623 check

frontend apps-tls
  bind *:443
  default_backend apps-tls
  mode tcp
  option tcplog

backend apps-tls
  balance source
  mode tcp
  server router0 10.42.253.13:443 check
  server router1 10.42.253.14:443 check
  server router2 10.42.253.15:443 check

frontend apps-clear
  bind *:80
  default_backend apps-clear
  mode tcp
  option tcplog

backend apps-clear
  balance source
  mode tcp
  server router0 10.42.253.13:80 check
  server router1 10.42.253.14:80 check
  server router2 10.42.253.15:80 check

Validate the configuration with haproxy -c -f /etc/haproxy/haproxy.cfg, then don’t forget to start and enable the HAProxy service.

Boot Instances

Now we should have everything we need. First boot the bootstrap node using PXE, wait for it to reboot, then boot the three master nodes in PXE.

We would be able to SSH to each node, as the core user, using the SSH key passed to openshift-install earlier. Keep an eye on system logs.

Meanwhile, we could use openshift-install to track OpenShift API bootstrap completion:

./openshift-install --dir=./install-directory wait-for bootstrap-complete \
    --log-level info

Eventually, that command would exit, and should confirm our cluster API is now reachable. At that stage, the cluster is not yet done deploying, though we’re getting close.

Next, we would boot our infra nodes in PXE. Keep an eye on certificate signing requests, as we would need to approve those new nodes as they join the cluster:

oc get csr
oc adm certificate approve csr-xxx
# or approve everything pending at once:
oc get csr -o name | xargs oc adm certificate approve

Eventually, we should be able to confirm the cluster operators are finishing their deployment.

The only one that would stay in a degraded state is the image registry operator. Here, we would need to define the integrated registry storage configuration:

oc edit configs.imageregistry.operator.openshift.io

To keep it simple, we would stick with emptyDir storage (volatile, and not usually recommended), which can also be set in a single command:

oc patch configs.imageregistry.operator.openshift.io cluster --type merge \
    --patch '{"spec":{"storage":{"emptyDir":{}}}}'

oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.2.0 True False False 1h36m
cloud-credential 4.2.0 True False False 2h
cluster-autoscaler 4.2.0 True False False 1h56m
console 4.2.0 True False False 1h37m
dns 4.2.0 True False False 2h
image-registry 4.2.0 True False False 49m
ingress 4.2.0 True False False 1h42m
insights 4.2.0 True False False 2h
kube-apiserver 4.2.0 True False False 1h59m
kube-controller-manager 4.2.0 True False False 1h58m
kube-scheduler 4.2.0 True False False 1h59m
machine-api 4.2.0 True False False 2h
machine-config 4.2.0 True False False 2h
marketplace 4.2.0 True False False 1h56m
monitoring 4.2.0 True False False 1h40m
network 4.2.0 True False False 2h
node-tuning 4.2.0 True False False 1h56m
openshift-apiserver 4.2.0 True False False 1h57m
openshift-controller-manager 4.2.0 True False False 1h59m
openshift-samples 4.2.0 True False False 1h55m
operator-lifecycle-manager 4.2.0 True False False 2h
operator-lifecycle-manager-catalog 4.2.0 True False False 2h
operator-lifecycle-manager-packageserver 4.2.0 True False False 1h58m
service-ca 4.2.0 True False False 2h
service-catalog-apiserver 4.2.0 True False False 1h56m
service-catalog-controller-manager 4.2.0 True False False 1h57m
storage 4.2.0 True False False 1h56m

Eventually, we may boot workers using PXE, until all nodes have joined our cluster. We can also terminate the bootstrap node, which is no longer needed.

LDAP Authentication

Finally, we would setup LDAP authentication. By default, OpenShift 4 ships with a single kubeadmin user, to be used during initial cluster configuration.

oc --config ./kubeconfig create secret generic ldap-secret \
    --from-literal=bindPassword=<secret> -n openshift-config
oc --config ./kubeconfig create configmap ldap-ca \
    --from-file=ca.crt=/path/to/ldap-ca-chain.crt -n openshift-config

Having created a Secret with our OpenShift LDAP service account bind password, and a ConfigMap with the CA chain used to sign our OpenLDAP TLS certificate, we would then import the following OAuth configuration:

apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: LDAP
    mappingMethod: claim
    type: LDAP
    ldap:
      attributes:
        id:
        - dn
        email:
        - mail
        name:
        - sn
        preferredUsername:
        - uid
      bindDN: "cn=openshift,ou=services,dc=example,dc=com"
      bindPassword:
        name: ldap-secret
      ca:
        name: ldap-ca
      insecure: false
      url: "ldaps://netserv.vms.example.com/ou=users,dc=example,dc=com?uid?sub?(&(objectClass=inetOrgPerson)(!(pwdAccountLockedTime=*)))"

Having applied that configuration, we would see Pods in the openshift-authentication namespace restarting. We would then be able to log in using an LDAP account.

OpenShift4 Dashboard

Infra Nodes

Last detail: after deployment, an OpenShift 4 cluster would include master and worker nodes, while OpenShift 3 used to ship with master, infra and compute nodes.

The worker nodes in OpenShift 4 are meant to replace both the infra and compute nodes, which could make sense running smaller setups, though I would argue it is not very practical when scaling out. Having a small set of nodes designated to host the OpenShift ingress controllers is a good thing, as we only need to configure those IPs as backends for our applications load balancers. If we rely on worker nodes only, then every time we add new members to our cluster, we also need to reconfigure our load balancers.

Hence, we would create a group of infra machines, starting with creating a MachineConfigPool, using the following configuration:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: infra
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
  paused: false

Having applied that configuration, we would then dump MachineConfig objects applying to worker nodes:

DUMP=$(oc get machineconfig | grep -v rendered | \
  awk '/worker/{print $1}' | tr '\n' ' ')

oc get machineconfig -o yaml $DUMP >machineconfig-infra.yaml

We would then edit machineconfig-infra.yaml content, removing "generated-by" annotations, creationTimestamp, generation, ownerReferences, resourceVersion, selfLink and uid metadata. Replace any remaining mention of "worker" with "infra". Then apply the resulting objects:
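A crude way to automate part of that cleanup, as a hedged sketch: it only drops single-line fields (the ownerReferences block spans several lines and is still best removed by hand), and YAML editing with sed deserves a manual review of the result before applying:

```shell
# Strip the volatile single-line metadata fields, then rename the role.
strip_machineconfig() {
  sed -i \
    -e '/generated-by/d' -e '/creationTimestamp:/d' -e '/generation:/d' \
    -e '/resourceVersion:/d' -e '/selfLink:/d' -e '/uid:/d' "$1"
  sed -i 's/worker/infra/g' "$1"
}
# usage: strip_machineconfig machineconfig-infra.yaml
```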

oc apply -f machineconfig-infra.yaml
oc get mc
00-infra 2.2.0 1m
01-infra-container-runtime 2.2.0 1m
01-infra-kubelet 2.2.0 1m
99-infra-ad9f8790-f270-11e9-a34e-525400e1605b-registries 2.2.0 1m
99-infra-ssh 2.2.0 1m

At that stage, the MachineConfig Operator should be rendering a final MachineConfig object, including an exhaustive list of configurations for our infra nodes. Once oc get mc includes that rendered configuration, we would make sure the MachineConfig Operator is done with our MachineConfigPool, and start re-labeling nodes accordingly:

oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED
infra rendered-infra-0506920a222781a19fff88a4196deef4 True False False
master rendered-master-747943425e64364488e51d15e5281265 True False False
worker rendered-worker-5e70256103cc4d0ce0162430de7233a1 True False False
oc label node infra1.nodes.example.com node-role.kubernetes.io/infra=
node/infra1.nodes.example.com labeled
oc label node infra1.nodes.example.com node-role.kubernetes.io/worker-
node/infra1.nodes.example.com labeled

From there, our node would be set unschedulable, drained, and rebooted. Our customized MachineConfig should have changed the role label applied when our node boots, which we may confirm once it is done restarting:

oc get nodes
compute1.nodes.example.com Ready worker 47m v1.14.6+c07e432da
compute2.nodes.example.com Ready worker 45m v1.14.6+c07e432da
compute3.nodes.example.com Ready worker 34m v1.14.6+c07e432da
compute4.nodes.example.com Ready worker 33m v1.14.6+c07e432da
compute5.nodes.example.com Ready worker 31m v1.14.6+c07e432da
infra1.nodes.example.com Ready infra 2h v1.14.6+c07e432da
infra2.nodes.example.com Ready worker 2h v1.14.6+c07e432da
infra3.nodes.example.com Ready worker 2h v1.14.6+c07e432da
master1.nodes.example.com Ready master 2h v1.14.6+c07e432da
master2.nodes.example.com Ready master 2h v1.14.6+c07e432da
master3.nodes.example.com Ready master 2h v1.14.6+c07e432da

Once our node is back, we would proceed with the next infra node.

We would eventually reconfigure our Ingress Controller to deploy the OpenShift Routers back onto our infra nodes:

oc edit -n openshift-ingress-operator ingresscontroller default
spec:
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/infra: ""
  replicas: 3

We would then keep track of the router Pods as they’re being re-deployed:

oc get pods -n openshift-ingress -o wide
NAME READY STATUS RESTARTS AGE IP NODE
router-default-86cdb97784-4d72k 1/1 Running 0 14m 10.42.253.14 infra2.nodes.example.com
router-default-86cdb97784-8f5vm 1/1 Running 0 14m 10.42.253.15 infra3.nodes.example.com
router-default-86cdb97784-bvvdc 1/1 Running 0 105s 10.42.253.13 infra1.nodes.example.com

Ceph RBD Storage

Later on, we may want to configure OpenShift to interface with an existing Ceph cluster, setting up persistent volumes.

OpenShift 3 used to ship rbd binaries in its API controller image, and allowed installing them on OpenShift nodes; this is no longer the case with OpenShift 4. Instead, we would rely on CSI (Container Storage Interface), which is meant to be a more generic interface.

Then, we would need to deploy the Ceph CSI interface to OpenShift:

git clone https://github.com/ceph/ceph-csi/
oc new-project ceph-csi
for sa in rbd-csi-provisioner rbd-csi-nodeplugin; do
    oc create sa $sa
    oc adm policy add-scc-to-user hostaccess system:serviceaccount:ceph-csi:$sa
    oc adm policy add-scc-to-user privileged system:serviceaccount:ceph-csi:$sa
done
cat ceph-csi/deploy/rbd/kubernetes/v1.14+/csi-*yaml | sed 's|namespace: default|namespace: ceph-csi|g' | oc apply -n ceph-csi -f-
cat <<EOF >config.json
[
  {
    "clusterID": "my-ceph-cluster-id",
    "monitors": [ "10.1.2.3", "10.1.2.4", "10.1.2.5" ]
  }
]
EOF
oc delete cm -n ceph-csi ceph-csi-config
oc create cm -n ceph-csi ceph-csi-config --from-file=config.json=./config.json
cat << EOF >secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: csi-rbd-secret
stringData:
  userID: my-ceph-user-id
  userKey: my-user-key
EOF
oc apply -n default -f secret.yaml
cat << EOF >storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-storage
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: my-ceph-cluster-id
  pool: kube
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default
  csi.storage.k8s.io/fstype: xfs
reclaimPolicy: Delete
mountOptions:
- discard
EOF
oc apply -f storageclass.yaml

At that stage, we would have deployed a DaemonSet of csi-rbdplugin Pods, tasked with attaching and detaching volumes as Pods get scheduled and terminated, as well as a Deployment of csi-rbdplugin-provisioner Pods, creating and purging volumes out of Ceph while managing OpenShift Persistent Volumes.

From there, we may create a first Persistent Volume and redeploy the OpenShift integrated registry on top of it:

cat <<EOF >registry-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: image-registry-storage
  namespace: openshift-image-registry
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
EOF
oc apply -f registry-pvc.yaml
oc edit configs.imageregistry.operator.openshift.io
[…]
  storage:
    pvc:
      claim: image-registry-storage
[…]
oc get pods -n openshift-image-registry -w

Conclusion

First thing I would regret is the disappearance of the rbd binaries from controller images. As a result, the Ceph provisioner we used to configure with OpenShift 3 no longer works. Apparently, CSI provisioners are recommended instead, though that implementation is somewhat slower, and involves quite a lot of Pods.

After deployment, roughly 12G RAM and 4 CPUs are allocated to cluster operators and OpenShift internals.

Another concern may be that all those operators are privileged actors in our cluster. While we usually had to compromise a node to attack a cluster, we now have a lot of operators that might be accessed through the API, arguably expanding OpenShift’s attack surface.

The dashboard shows a total CPU capacity of “100%”, which is quite useless.

OpenShift 4.2 is based on Kubernetes 1.14. Among other novelties, as compared with OpenShift 3, we could mention Istio reaching GA, or Tekton pipelines.

Docker Images Vulnerability Scan

While several solutions exist for scanning Docker images, I’ve been looking for one that I could deploy and use on OpenShift, integrated into my existing CI chain.

The most obvious answer, working with open source, would be OpenSCAP. However, I’m still largely working with Debian, while OpenSCAP only checks CentOS databases.

Another popular contender on the market is Twistlock, but I’m not interested in solutions I can’t deploy myself without requesting “a demo”, or talking to people in general.

Eventually, I ended up deploying Clair, an open source product offered by CoreOS, providing an API.
It queries popular vulnerability databases to populate its own SQL database, and can then analyze Docker image layers posted to its API.

We could deploy Clair to OpenShift, alongside its Postgres database, using that Template.

The main issue I’ve had with Clair, so far, is that its client, clairctl, relies on Docker socket access, which is not something you would grant any deployment in OpenShift.
And since I wanted to scan my images as part of Jenkins pipelines, I would have my Jenkins master creating scan agents. Allowing Jenkins to create containers with host filesystem access is, in itself, a security issue, as any user that could create a Job would be able to schedule agents with full access to my OpenShift nodes.

Introducing Klar, a Go-based project I found on GitHub, that can scan images against a Clair service without any special privileges, besides pulling the Docker image out of your registry and posting layers to Clair.

We would build a Jenkins agent re-using OpenShift base image, shipping with Klar.
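Such an agent image could be built from a Dockerfile along these lines; the base image, klar version and release URL are assumptions to adapt, and the file could then be fed to an OpenShift binary build:

```shell
# Write a hypothetical Dockerfile for the Klar-enabled Jenkins agent.
cat > Dockerfile <<'EOF'
FROM openshift/jenkins-slave-base-centos7:latest
USER root
RUN curl -fsSL -o /usr/local/bin/klar \
      https://github.com/optiopay/klar/releases/download/v2.4.0/klar-2.4.0-linux-amd64 \
 && chmod +x /usr/local/bin/klar
USER 1001
EOF
```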

Having built our Jenkins agent image, we can write another BuildConfig, defining a Parameterized Pipeline.

Jenkins CoreOS Clair Scan

OpenShift & CephFS

If you’re not yet familiar with it, OpenShift is a container orchestration solution based on Kubernetes. Among others, it integrates with several storage providers such as Ceph.

Although GlusterFS is probably the best choice in terms of OpenShift integration, we could argue Ceph is a better pick overall. And while this post doesn’t aim at offering an exhaustive comparison between the two, we could mention GlusterFS split-brains requiring manual recovery, poor block device performance, poor performance dealing with lots (100s) of volumes, the lack of a kernel-land client for file volumes, …

The most common way to integrate Ceph with OpenShift is to register a StorageClass managing Rados Block Devices, as we could find in OpenShift documentation.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
  name: ceph-storage
parameters:
  adminId: kube
  adminSecretName: ceph-secret-kube
  adminSecretNamespace: default
  monitors: 10.42.253.110:6789,10.42.253.111:6789,10.42.253.112:6789
  pool: kube
  userId: kube
  userSecretName: ceph-secret-kube
  userSecretNamespace: default
provisioner: kubernetes.io/rbd
reclaimPolicy: Retain

We would also need to create a Secret, holding our Ceph client key. First, we would create our client, granting it the proper permissions:

$> ceph auth get-or-create client.kube mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=kube' -o ceph.client.kube.keyring

Next, we would base64-encode our key:

$> awk '/^[ \t]*key/{print $3}' ceph.client.kube.keyring | base64
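A worked example with a dummy keyring (the real one comes out of the ceph auth get-or-create call above):

```shell
# Fake keyring, same layout as ceph auth get-or-create produces.
cat > ceph.client.kube.keyring <<'EOF'
[client.kube]
    key = AQBexampleexample==
EOF
# Extract the key value and base64-encode it for the Secret.
awk '/^[ \t]*key/{print $3}' ceph.client.kube.keyring | base64
```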

And register our Secret, including our encoded secret:

cat <<EOF | oc apply -n default -f-
apiVersion: v1
data:
  key: <base64-encoded-string>
kind: Secret
metadata:
  name: ceph-secret-kube
type: kubernetes.io/rbd
EOF
The previous configurations would then allow us to dynamically provision block devices when deploying new applications to OpenShift.

And while block devices are a nice thing to have dealing with stateful workloads such as databases, up until now, GlusterFS main advantage over Ceph was its ability to provide ReadWriteMany volumes, that can be mounted from several Pods at once, as opposed to ReadWriteOnce volumes, that may only be accessed by one deployment, unless mounted without write capabilities (ReadOnlyMany).

On the other hand, in addition to Rados Block Devices, Ceph offers an optional CephFS share, that is similar to NFS or GlusterFS, in that several clients can concurrently write to the same folder. And while CephFS isn’t much mentioned reading through OpenShift documentation, Kubernetes officially supports it. Today, we would try and make that work with OpenShift.
CephFS is considered to be stable since Ceph 12 (Luminous), released a couple years ago. Since then, I’ve been waiting for a practical use case. Here it is.

We would mostly rely on the configurations offered in the kubernetes-incubator external-storage GitHub repository.

First, let’s create a namespace hosting CephFS provisioner:

$> oc new-project cephfs

Then, in that namespace, we would register a Secret. Note that the CephFS provisioner offered by Kubernetes requires near-admin privileges over your Ceph cluster. For each Persistent Volume registered through the OpenShift API, the provisioner would create a dynamic user with limited privileges over the sub-directory hosting our data. Here, we would just pass it our admin key:

apiVersion: v1
kind: Secret
data:
  key: <base64-encoded-admin-key>
metadata:
  name: ceph-secret-admin

Then, we would create a ClusterRole

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cephfs-provisioner
rules:
- apiGroups: [""]
  resources: ["persistentvolumes"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create", "get", "delete"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "update", "patch"]
- apiGroups: [""]
  resources: ["services"]
  resourceNames: ["kube-dns","coredns"]
  verbs: ["list", "get"]

A Role

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cephfs-provisioner
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create", "get", "delete"]
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]

A ServiceAccount

$> oc create sa cephfs-provisioner

That we would associate with previously-defined ClusterRole and Role:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cephfs-provisioner
subjects:
- kind: ServiceAccount
  name: cephfs-provisioner
  namespace: cephfs
roleRef:
  kind: ClusterRole
  name: cephfs-provisioner
  apiGroup: rbac.authorization.k8s.io

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cephfs-provisioner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cephfs-provisioner
subjects:
- kind: ServiceAccount
  name: cephfs-provisioner

Next, we would allow our ServiceAccount using the anyuid SecurityContextConstraint:

$> oc adm policy add-scc-to-user anyuid -z cephfs-provisioner

Then, we would create an ImageStream:

$> oc create is cephfs-provisioner

A BuildConfig patching the cephfs-provisioner image, granting write privileges to the owning group, so that OpenShift dynamic users may use our shares:

apiVersion: v1
kind: BuildConfig
metadata:
  name: cephfs-provisioner
spec:
  output:
    to:
      kind: ImageStreamTag
      name: cephfs-provisioner:latest
  source:
    dockerfile: |
      FROM quay.io/external_storage/cephfs-provisioner:latest

      USER root

      RUN sed -i 's|0o755|0o775|g' /usr/lib/python2.7/site-packages/ceph_volume_client.py
    type: Dockerfile
  strategy:
    type: Docker
  triggers:
  - type: ConfigChange

Next, we would create a StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: ceph.com/cephfs
parameters:
  adminId: admin
  adminSecretName: ceph-secret-admin
  adminSecretNamespace: cephfs
  claimRoot: /kube-volumes
  monitors: 10.42.253.110:6789,10.42.253.111:6789,10.42.253.112:6789

And a DeploymentConfig, deploying the CephFS provisioner:

apiVersion: v1
kind: DeploymentConfig
metadata:
  name: cephfs-provisioner
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: cephfs-provisioner
    spec:
      containers:
      - args: [ "-id=cephfs-provisioner-1" ]
        command: [ "/usr/local/bin/cephfs-provisioner" ]
        env:
        - name: PROVISIONER_NAME
          value: ceph.com/cephfs
        - name: PROVISIONER_SECRET_NAMESPACE
          value: cephfs
        image: ' '
        name: cephfs-provisioner
      serviceAccount: cephfs-provisioner
  triggers:
  - imageChangeParams:
      automatic: true
      containerNames: [ cephfs-provisioner ]
      from:
        kind: ImageStreamTag
        name: cephfs-provisioner:latest
    type: ImageChange
  - type: ConfigChange

And we should finally be able to create PersistentVolumeClaims, requesting CephFS-backed storage.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-cephfs
spec:
  accessModes: [ ReadWriteMany ]
  resources:
    requests:
      storage: 1Gi
  storageClassName: cephfs

Having created that claim, we can confirm our volume was properly provisioned:

$> oc get pvc
NAME STATUS VOLUME CAPA ACCESS MODES STORAGECLASS AGE
test-cephfs Bound pvc-xxx 1G RWX cephfs 5h

Then, we would create a Pod mounting that volume:

apiVersion: v1
kind: Pod
metadata:
  name: pvc-test-cephfs
spec:
  containers:
  - image: docker.io/centos/mongodb-34-centos7:latest
    name: cephfs-rwx
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETUID
        - SETGID
      privileged: false
    volumeMounts:
    - mountPath: /mnt/cephfs
      name: cephfs
  securityContext:
    seLinuxOptions:
      level: s0:c23,c2
  volumes:
  - name: cephfs
    persistentVolumeClaim:
      claimName: test-cephfs

Once that Pod has started, we should be able to enter it and write to our volume:

$ mount | grep cephfs
ceph-fuse on /mnt/cephfs type fuse.ceph-fuse (rw,nodev,relatime,user_id=0,group_id=0,allow_other)
$ date >/mnt/cephfs/toto
$ cat /mnt/cephfs/toto
Wed May 15 19:06:20 UTC 2019

At that point, we should note a non-negligible drawback: the CephFS kernel client doesn’t seem to allow reading from or writing to shares from OpenShift Pods. Strangely enough, using a shell on the OpenShift node hosting that Pod, I can successfully write files and open them back. A few months ago, this was not the case: today, it would seem OpenShift itself is responsible, and the next thing to fix.

Today, as a workaround, you would have to install ceph-fuse on all OpenShift nodes. At which point, any CephFS share would be mounted using ceph-fuse, instead of the ceph kernel client.
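A minimal sketch of that workaround, assuming SSH access as root and hypothetical node names (adjust to your own inventory):

```shell
# Install the CephFS FUSE client on every node -- node names are placeholders.
for node in master1 master2 master3 compute1 compute2 compute3; do
    ssh "root@$node" yum -y install ceph-fuse
done
```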

Bearing in mind that CephFS’s main competitor, GlusterFS, also uses a FUSE-based client (while not providing any kernel implementation), we can start inferring Gluster is living its last days as the most popular solution offering file-based storage in OpenShift.

OpenShift Egress Traffic Management

Today, I’m investigating yet another OpenShift feature: Egress Routers. We would look at how to ensure a given connection leaves our cluster using a given IP address, integrating OpenShift with existing services that would be protected by some kind of IP filter.


First, let’s explain the default behavior of OpenShift.

Deployments are scheduled on OpenShift hosts, eventually leading to containers being started on those nodes.

Filtering OpenShift compute hosts IPs

Whenever contacting a service outside the OpenShift SDN, a container would exit the OpenShift network through the node it’s been started on, meaning the corresponding connection would get NAT-ed using the IP of the node our container currently runs on.

As such, a straightforward way of allowing OpenShift containers to reach a protected service could be to trust all our OpenShift hosts’ IP addresses connecting to those services.

Note that this setup implies trusting all containers that may be scheduled on those OpenShift nodes contacting our remote service. It could be acceptable in a few cases, although whenever OpenShift is shared among multiple users or tenants, it usually won’t be.

While we could address this by dedicating OpenShift nodes to users requiring access to a same set of protected resources, we would remain limited by the amount of OpenShift nodes composing our cluster.


Instead of relying on OpenShift nodes addresses, we could involve additional addresses dedicated to accessing external resources.

A first way to implement this would be to allocate an OpenShift namespace with its own Egress IPs:

$ oc patch netnamespace toto -p '{"egressIPs": ["10.42.253.45","10.42.253.54"]}'

Such configuration would also require us to associate these egress IPs to OpenShift nodes’ hostsubnet:

$ oc patch hostsubnet compute3 -p '{"egressIPs": ["10.42.253.45"]}'

$ oc patch hostsubnet compute1 -p '{"egressIPs": ["10.42.253.54"]}'

Then, from a Pod of our toto netnamespace, we could try to reach a remote service:

$ oc rsh -n toto jenkins-1-xyz
sh-4.2$ ping 8.8.8.8
[…]
64 bytes from 8.8.8.8: icmp_seq=90 ttl=119 time=4.01 ms
64 bytes from 8.8.8.8: icmp_seq=91 ttl=119 time=4.52 ms
^C
--- 8.8.8.8 ping statistics ---
91 packets transmitted, 89 received, 2% packet loss, time 90123ms
rtt min/avg/max/mdev = 3.224/4.350/12.042/1.073 ms

Notice we lost a few packets. The reason is that I rebooted the compute3 host, through which my ping was initially leaving the cluster. While the node was marked NotReady, traffic went through the second node, compute1, holding an egressIP associated with the toto netnamespace. From our gateway, we can confirm the new IP is temporarily being used:

# tcpdump -vvni vlan5 host 10.42.253.54
tcpdump: listening on vlan5, link-type EN10MB
13:11:13.066821 10.42.253.54 > 8.8.8.8: icmp: echo request (id:023e seq:3) [icmp cksum ok] (DF) (ttl 63, id 24619, len 84)
13:11:13.070596 arp who-has 10.42.253.54 tell 10.42.253.5
13:11:13.071194 arp reply 10.42.253.54 is-at 52:54:00:b1:15:b9
13:11:13.071225 8.8.8.8 > 10.42.253.54: icmp: echo reply (id:023e seq:3) [icmp cksum ok] [tos 0x4] (ttl 120, id 14757, len 84)
13:11:14.066796 10.42.253.54 > 8.8.8.8: icmp: echo request (id:023e seq:4) [icmp cksum ok] (DF) (ttl 63, id 25114, len 84)
13:11:14.069990 8.8.8.8 > 10.42.253.54: icmp: echo reply (id:023e seq:4) [icmp cksum ok] [tos 0x4] (ttl 120, id 515, len 84)

As soon as compute3 is done rebooting, tcpdump confirms 10.42.253.54 is no longer used.
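To check where egress IPs currently sit, we can inspect the corresponding objects (a quick sketch; output columns vary across OpenShift releases):

```shell
# Which nodes hold which egress IPs:
oc get hostsubnets
# Which egress IPs a given project requested:
oc get netnamespace toto -o yaml
```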

Namespace-based IP Filtering

From the router’s point of view, we can see that the hardware addresses for our Egress IPs match those of our OpenShift hosts:

# arp -na | grep 10.42.253
[…]
10.42.253.20 52:54:00:b1:15:b9 vlan5 19m49s
10.42.253.21 52:54:00:6b:99:ad vlan5 19m49s
10.42.253.23 52:54:00:23:1c:4f vlan5 19m54s
10.42.253.45 52:54:00:23:1c:4f vlan5 7m36s
10.42.253.54 52:54:00:b1:15:b9 vlan5 10m35s

As such, this configuration may be preferable whenever the network hosting OpenShift would not allow introducing virtual hardware addresses.

Note that usage of a node’s EgressIP is reserved to the netnamespaces specifically requesting it. Any other container executed on my compute1 and compute3 hosts is still NAT-ed using 10.42.253.20 and 10.42.253.23 respectively.

Also note that namespace-based IP filtering does not rely on any placement rule: containers could get started on any OpenShift node in your cluster, and their traffic would still exit the OpenShift SDN through a designated node, according to the netnamespace and hostsubnet configurations.

Bear in mind that whenever the EgressIPs of your netnamespaces are no longer assigned to a node from your cluster (be that due to a missing configuration, or an outage affecting all your Egress hosts), containers from the corresponding projects would no longer have access to resources outside the OpenShift SDN.


Now that we’re familiar with the basics of OpenShift Egress traffic management, we can focus on Egress Routers.

Several Egress Router implementations exist; we would focus on the two most common: the Redirect mode, and the HTTP proxy mode.

In both cases, we would use a dedicated project hosting a router Pod:

$ oc new-project egress-routers

We would also rely on a ServiceAccount, that may start privileged containers:

$ oc create sa egress-init

As well as a SecurityContextConstraint granting our ServiceAccount such privileges:

$ cat <<EOF >egress-scc.yml
kind: SecurityContextConstraints
apiVersion: v1
metadata: { name: egress-init }
allowPrivilegedContainer: true
runAsUser: { type: RunAsAny }
seLinuxContext: { type: RunAsAny }
fsGroup: { type: RunAsAny }
supplementalGroups: { type: RunAsAny }
users: [ "system:serviceaccount:egress-routers:egress-init" ]
EOF
$ oc create -f egress-scc.yml

Running a Redirect Egress Router, we would then create a controller ensuring a Pod deals with configuring the OpenShift SDN, NAT-ing the traffic with a specific Egress IP:

$ cat <<EOF >redirect-router.yml
apiVersion: v1
kind: ReplicationController
metadata:
  name: egress-router
spec:
  replicas: 1
  selector:
    name: egress-router
  template:
    metadata:
      name: egress-router
      labels:
        name: egress-router
      annotations:
        pod.network.openshift.io/assign-macvlan: "true"
    spec:
      initContainers:
      - name: egress-demo-init
        image: docker.io/openshift/origin-egress-router
        env:
        - name: EGRESS_SOURCE
          value: 10.42.253.46
        - name: EGRESS_GATEWAY
          value: 10.42.253.1
        - name: EGRESS_DESTINATION
          value: 8.8.8.8
        - name: EGRESS_ROUTER_MODE
          value: init
        securityContext:
          privileged: true
      containers:
      - name: egress-demo-wait
        image: docker.io/openshift/origin-pod
      nodeSelector:
        node-role.kubernetes.io/infra: "true"
      serviceAccountName: egress-init
EOF
$ oc create -f redirect-router.yml

Note we are first starting an init container setting up proper iptables rules using a few variables. EGRESS_SOURCE is an arbitrary, unallocated IP address in the OpenShift subnet, EGRESS_GATEWAY is our default gateway, and EGRESS_DESTINATION the remote address our Egress router would forward its traffic to.
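For illustration only (this is not the image’s actual script), the init container’s setup is roughly equivalent to the following:

```shell
# Run inside the router Pod's network namespace, on the macvlan interface
# created by the assign-macvlan annotation. Addresses match our example above.
ip addr add 10.42.253.46/32 dev macvlan0          # EGRESS_SOURCE
ip route add default via 10.42.253.1 dev macvlan0 # EGRESS_GATEWAY
# Send anything entering the Pod to the remote, and NAT it on the way out:
iptables -t nat -A PREROUTING -i eth0 -j DNAT --to-destination 8.8.8.8
iptables -t nat -A POSTROUTING -j SNAT --to-source 10.42.253.46
```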

Once our init container is done updating the iptables configuration, it is shut down and replaced by our main pod, which does not do anything.

At that point, we could enter that Pod and see that all its traffic exits the OpenShift subnet NAT-ed with our EGRESS_SOURCE IP, by the OpenShift host executing our container.

From the network gateway’s point of view, we could notice our EGRESS_SOURCE address is associated with a virtual hardware address:

# arp -na | grep 10.42.253
[…]
10.42.253.46 d6:76:cc:f4:e3:d9 vlan5 19m34s

Contrary to namespace-scoped Egress IPs, Egress Routers may be scheduled anywhere on the OpenShift cluster, according to an arbitrary (and optional) placement rule. They do rely on containers, though, which might take a few seconds to start depending on images being available in Docker local caches. Another drawback is that a single IP cannot be shared among several routers: we would not be able to scale them.

To provide redundancy, we could however set up several Egress Routers per protected service, using distinct EGRESS_SOURCE addresses and sharing the same EGRESS_DESTINATION.

While we’ve seen our Egress Router container exits our cluster to any remotes using our designated EGRESS_SOURCE address, let’s now look at how to use that router from other OpenShift hosted containers. First, we would create a service identifying our Egress Routers:

$ oc create service --name egress-redirect --namespace egress-routers --port=53 --selector=name=egress-router

Depending on your network plugin, we may have to allow traffic coming to that service from third-party Projects. We would then be able to query our EGRESS_DESTINATION through our service:

$ curl http://egress-redirect.egress-routers.svc:53/

From our gateway, we could see the corresponding traffic leaving OpenShift SDN, NAT-ed using our EGRESS_SOURCE:

# tcpdump -vvni vlan5 host 8.8.8.8
tcpdump: listening on vlan5, link-type EN10MB
11:11:53.357775 10.42.253.46.53084 > 8.8.8.8.53: S [tcp sum ok] 1167661839:1167661839(0) win 28200 <mss 1410,sackOK,timestamp 84645569 0,nop,wscale 7> (DF)
11:11:54.357948 10.42.253.46.53084 > 8.8.8.8.53: S [tcp sum ok] 1167661839:1167661839(0) win 28200 <mss 1410,sackOK,timestamp 84646572 0,nop,wscale 7> (DF)
11:11:56.361964 10.42.253.46.53084 > 8.8.8.8.53: S [tcp sum ok] 1167661839:1167661839(0) win 28200 <mss 1410,sackOK,timestamp 84648576 0,nop,wscale 7> (DF)

Redirect Egress Routers

Note that the EGRESS_DESTINATION definition may include more than a single address; depending on the protocol and port queried, we could route those connections to distinct remotes:

env:
- name: EGRESS_DESTINATION
  value: |
    80 tcp 203.0.113.25
    8080 tcp 203.0.113.26 80
    8443 tcp 203.0.113.26 443
    203.0.113.27

That snippet would ensure that connections to our router pod on TCP port 80 would be sent to a first remote address, while those to 8080 and 8443 are translated to ports 80 and 443 respectively of a second address, and any other traffic sent to a third remote address.
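To make those semantics explicit, here is a small illustrative parser (not the router’s actual code) turning such a specification into routing tuples:

```python
# Illustrative only: parse an EGRESS_DESTINATION spec into
# (local_port, proto, remote_ip, remote_port) tuples, plus the
# optional catch-all fallback address.
def parse_egress_destination(spec):
    rules = []
    fallback = None
    for line in spec.strip().splitlines():
        fields = line.split()
        if len(fields) == 1:
            # a bare address catches all remaining traffic
            fallback = fields[0]
        elif len(fields) == 3:
            # same port on the remote side
            local_port, proto, remote_ip = fields
            rules.append((int(local_port), proto, remote_ip, int(local_port)))
        elif len(fields) == 4:
            # explicit port translation
            local_port, proto, remote_ip, remote_port = fields
            rules.append((int(local_port), proto, remote_ip, int(remote_port)))
    return rules, fallback

spec = """
80 tcp 203.0.113.25
8080 tcp 203.0.113.26 80
8443 tcp 203.0.113.26 443
203.0.113.27
"""
rules, fallback = parse_egress_destination(spec)
print(rules)
print(fallback)
```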

We could very well set these into a ConfigMap, to eventually include from our Pods configuration, ensuring consistency among a set of routers.
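A sketch of that approach, assuming the destination list was saved in a local my-destinations.txt file:

```yaml
# Create the ConfigMap once:
#   oc create configmap egress-routes --from-file=destination=my-destinations.txt
# then have the init container read it instead of a hardcoded value:
env:
- name: EGRESS_DESTINATION
  valueFrom:
    configMapKeyRef:
      name: egress-routes
      key: destination
```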

Obviously from OpenShift containers point of view, instead of connecting to our remote service, we would have to reach our Egress Router Service, which would in turn ensure proper forwarding of our requests.


Note that Redirect Egress Routers are limited to TCP and UDP traffic, and are usually not recommended for HTTP communications. That latter case is best suited for the HTTP Proxy Egress Routers, relying on Squid.

Although very similar to Redirect Egress Routers, the HTTP Proxy would not set an EGRESS_DESTINATION environment variable on its init container, and would instead pass an EGRESS_HTTP_PROXY_DESTINATION to the main container, such as:

$ cat <<EOF >egress-http.yml
apiVersion: v1
kind: ReplicationController
metadata:
  name: egress-http
spec:
  replicas: 1
  selector:
    name: egress-http
  template:
    metadata:
      name: egress-http
      labels:
        name: egress-http
      annotations:
        pod.network.openshift.io/assign-macvlan: "true"
    spec:
      initContainers:
      - name: egress-demo-init
        image: openshift/origin-egress-router
        env:
        - name: EGRESS_SOURCE
          value: 10.42.253.43
        - name: EGRESS_GATEWAY
          value: 10.42.253.1
        - name: EGRESS_ROUTER_MODE
          value: http-proxy
        securityContext:
          privileged: true
      containers:
      - name: egress-demo-proxy
        env:
        - name: EGRESS_HTTP_PROXY_DESTINATION
          value: |
            !perdu.com
            !*.perdu.com
            !10.42.253.0/24
            *
        image: openshift/origin-egress-http-proxy
      nodeSelector:
        node-role.kubernetes.io/infra: "true"
      serviceAccountName: egress-init
EOF
$ oc create -f egress-http.yml

Note the EGRESS_HTTP_PROXY_DESTINATION definition allows us to deny access to specific resources, such as perdu.com and its subdomains or an arbitrary private subnet, while we allow any other communication with a wildcard.

By default, the Egress HTTP Proxy image listens on TCP port 8080, which allows us to create a service such as the following:

$ oc create service --name egress-http --namespace egress-routers --port=8080 --selector=name=egress-http

And eventually use that service from other OpenShift containers, by properly defining the proxy environment variables:

$ oc rsh -n toto jenkins-1-xyz
sh-4.2$ https_proxy=http://egress-http.egress-routers.svc:8080 http_proxy=http://egress-http.egress-routers.svc:8080/ curl -vfsL http://free.fr -o /dev/null
[… 200 OK …]
sh-4.2$ https_proxy=http://egress-http.egress-routers.svc:8080 http_proxy=http://egress-http.egress-routers.svc:8080/ curl -vfsL http://perdu.com -o /dev/null
[… 403 forbidden …]

As for our Redirect Egress Router, running tcpdump on our gateway would confirm traffic is properly NAT-ed:

# tcpdump -vvni vlan5 host 10.42.253.43
[…]
12:11:37.385219 212.27.48.10.443 > 10.42.253.43.55906: . [bad tcp cksum b96! -> 9b9f] 3563:3563(0) ack 446 win 30016 [tos 0x4] (ttl 63, id 1503, len 40)
12:11:37.385332 212.27.48.10.443 > 10.42.253.43.55906: F [bad tcp cksum b96! -> 9ba0] 3562:3562(0) ack 445 win 30016 [tos 0x4] (ttl 63, id 40993, len 40)
12:11:37.385608 10.42.253.43.55908 > 212.27.48.10.443: . [tcp sum ok] 472:472(0) ack 59942 win 64800 (DF) (ttl 64, id 1694, len 40)
12:11:37.385612 10.42.253.43.55906 > 212.27.48.10.443: . [tcp sum ok] 446:446(0) ack 3563 win 40320 (DF) (ttl 64, id 1695, len 40)

While our router ARP table would show records similar to Redirect Egress Router ones:

# arp -na | grep 10.42.253.43
10.42.253.43 d2:87:15:45:1c:28 vlan5 18m52s

Depending on security requirements and the kind of service we want to query, OpenShift is pretty flexible. Although the above configurations do not represent an exhaustive view of existing implementations, we did cover the most basic use cases from the OpenShift documentation, which are the most likely to remain supported.

Whenever possible, using namespace-scoped IPs seems to be easier, as it does not rely on any service other than the OpenShift SDN applying proper routing and NAT-ing. Try to provide several IPs per namespace, allowing for quick failover should a node become unavailable.

If port-based filtering is required, then Redirect Routers are more likely to satisfy, although deploying at least two Pods, using two distinct Egress IPs and node selectors, would be recommended, as well as sharing a ConfigMap defining outbound routing.

Similarly, HTTP Proxy Routers would be recommended for proxying HTTP traffic, as they would not require anything more than setting a few environment variables and ensuring our runtime observes environment-based proxy configuration.


Packages Build Pipeline with OpenShift

As another follow-up to my previous OpenShift posts, today we would look into Jenkins and Nexus integration with OpenShift, while building a dummy package shipping SSH keys, both as a Debian archive and an RPM package.

If you’re not concerned with automating Nexus configuration, then you may use sonatype/nexus3 from the Docker Hub, setting up Nexus Repository Manager on OpenShift.
As I wanted to automate a few configuration tasks, I eventually started working on my own image, forking from a repository offered by Accenture. My copy isn’t yet released publicly, so I’d just point out it creates a couple of users uploading and downloading artifacts.

Another subject to address would be preparing a couple of images building our Debian and RPM packages. Regarding RPMs, we could derive from the Jenkins base slave image:

FROM openshift/jenkins-slave-base-centos7

RUN yum -y install epel-release \
    && yum -y install @development-tools centos-packager rpmdevtools \
    && yum -y install make wget git curl

USER 1001

While for Debian we would want to build some Stretch-based equivalent:

FROM debian:stretch

ENV HOME=/home/jenkins \
    DEBIAN_FRONTEND=noninteractive

USER root

ADD config/* /usr/local/bin/

RUN apt-get -y update \
    && apt-get -y install bc gettext git subversion openjdk-8-jre-headless gnupg curl wget \
                lsof rsync tar unzip debianutils zip bzip2 make gcc g++ devscripts debhelper \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && mkdir -p /home/jenkins \
    && chown -R 1001:0 /home/jenkins \
    && chmod -R g+w /home/jenkins \
    && chmod 664 /etc/passwd \
    && chmod -R 775 /etc/alternatives /usr/lib/jvm \
    && chmod 775 /usr/bin /usr/share/man/man1

USER 1001

ENTRYPOINT ["/usr/local/bin/run-jnlp-client"]
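Both images could then be built and pushed to the integrated registry, matching the CENTOS_IMAGE and DEBIAN_IMAGE values our Template assumes (Dockerfile names here are placeholders):

```shell
docker build -t docker-registry.default.svc:5000/cicd/jenkins-agent-centos:latest -f Dockerfile.centos .
docker build -t docker-registry.default.svc:5000/cicd/jenkins-agent-debian:latest -f Dockerfile.debian .
docker push docker-registry.default.svc:5000/cicd/jenkins-agent-centos:latest
docker push docker-registry.default.svc:5000/cicd/jenkins-agent-debian:latest
```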

From there, the last item we’ll need, building our packages, is their sources.

Building RPMs, we would write a Spec file such as the following:

Summary: My Package
Name: my-package
Version: 0.0.1
Release: 1%{?dist}
License: MIT
Source: https://repo/sources/el-%{name}-%{version}.tar.gz
URL: https://my.example.com

Autoreq: no
BuildRequires: git
BuildRequires: make

%description
Does something awesome

%global __os_install_post %{nil}
%define debug_package %{nil}
%prep
%autosetup
%build
%install
make install PREFIX=%{buildroot}

%pre
%prerun
%post
%files
%defattr(-,root,root)
%dir %{_datadir}/mydir
%{_datadir}/mydir/myfile

%changelog
* Thu Aug 30 2018 It's Me <mario@example.com> 0.0.1-1
- Initial release - In another castle?

Now regarding Debian packages, we would need to create a couple subdirectories, configuration files and scripts:

$ mkdir -p debian/source
$ echo "3.0 (quilt)" >debian/source/format
$ echo 9 >debian/compat
$ for i in postinst preinst prerm postrm; do
cat <<EOF >debian/$i
#!/bin/sh
# $i script for my-package

set -e

case "\$1" in
  purge|remove|abort-install|disappear) ;;

  upgrade|failed-upgrade|abort-upgrade) ;;

  *)
    echo "$i called with unknown argument \`\$1'" >&2
    exit 1
    ;;
esac

#DEBHELPER#

exit 0
EOF
chmod +x debian/$i
done
$ for i in docs copyright missing-sources README.Debian; do
touch debian/$i
done
$ cat <<'EOF' >debian/rules
#!/usr/bin/make -f
#DH_VERBOSE = 1

DPKG_EXPORT_BUILDFLAGS = 1
include /usr/share/dpkg/default.mk

# see FEATURE AREAS in dpkg-buildflags(1)
export DEB_BUILD_MAINT_OPTIONS = hardening=+all

# main packaging script based on dh7 syntax
%:
	dh $@

override_dh_auto_install:
	$(MAKE) install PREFIX=$(CURDIR)/debian/my-package

override_dh_auto_build:
	echo nothing to do

override_dh_auto_test:
	echo nothing to do
EOF
$ chmod +x debian/rules
$ cat <<EOF >debian/changelog
my-package (0.0.1-1) unstable; urgency=low

  * Initial release - In another castle?

 -- It's Me <mario@example.com>  Thu, 30 Aug 2018 11:30:42 +0200
EOF

From there, we ensure our sources ship with a Makefile, providing the following rules:

SHARE_DIR = $(PREFIX)/usr/share

createdebsource:
	LANG=C debuild -S -sa

createdebbin:
	LANG=C dpkg-buildpackage -us -uc

createrpm:
	versionNumber=`awk '/^Version:/{print $$2;exit;}' el/my-package.spec`; \
	wdir="`pwd`/.."; \
	buildroot="$$wdir/rpmbuild"; \
	for d in SOURCES SPECS BUILD RPMS SRPMS; \
	do \
	  mkdir -p "$$buildroot/$$d"; \
	done; \
	cp -p "$$wdir/el-my-package-$$versionNumber.tar.gz" "$$buildroot/SOURCES/"; \
	cp -p "$$wdir/my-package/el/my-package.spec" "$$buildroot/SPECS/"; \
	if ! whoami >/dev/null 2>&1; then \
	  chown -R root:root "$$buildroot/SOURCES" "$$buildroot/SPECS"; \
	elif whoami 2>/dev/null | grep default >/dev/null; then \
	  chown -R :root "$$buildroot/SOURCES" "$$buildroot/SPECS"; \
	fi; \
	( \
	  cd "$$buildroot"; \
	  LANG=C rpmbuild --define "_topdir $$buildroot" -ba SPECS/my-package.spec && \
	  find *RPMS -type f | while read output; \
	    do \
	      mv "$$output" $$wdir/; \
	    done; \
	)

createinitialarchive:
	rm -fr .git .gitignore README.md
	versionNumber=`cat debian/changelog | awk '/my-package/{print $$2;exit}' | sed -e 's|[()]||g' -e 's|\(.*\)-[0-9]*$$|\1|'`; \
	( \
	  cd ..; \
	  tar -czf my-package_$$versionNumber.orig.tar.gz my-package; \
	  mv my-package my-package-$$versionNumber; \
	  tar -czf el-my-package-$$versionNumber.tar.gz my-package-$$versionNumber; \
	  mv my-package-$$versionNumber my-package; \
	)

install:
	mkdir -p $(SHARE_DIR)/mydir
	install -c -m 0644 myfile $(SHARE_DIR)/mydir/myfile

At which point, we may use the following OpenShift Template, creating a few secrets and a pair of Jenkins Pipelines, building Debian and RPM packages based on our previous images, then uploading their artifacts to Nexus:

apiVersion: v1
kind: Template
metadata:
  name: my-package-template
objects:
- apiVersion: v1
  kind: Secret
  metadata:
    annotations:
      jenkins.io/credentials-description: ${APPLICATION_NAME} Git Token credential from Kubernetes
    labels:
      jenkins.io/credentials-type: secretText
    name: git-${APPLICATION_NAME}
  stringData:
    text: ${GIT_DEPLOYMENT_TOKEN}
- apiVersion: v1
  kind: Secret
  metadata:
    annotations:
      jenkins.io/credentials-description: ${APPLICATION_NAME} Nexus Credentials from Kubernetes
    labels:
      jenkins.io/credentials-type: usernamePassword
    name: nexus-${APPLICATION_NAME}
  stringData:
    password: ${NEXUS_ARTIFACTS_PASSWORD}
    username: ${NEXUS_ARTIFACTS_USERNAME}
- apiVersion: v1
  kind: BuildConfig
  metadata:
    annotations:
      description: Builds ${APPLICATION_NAME} rpm archive
    name: ${APPLICATION_NAME}-rpm
  spec:
    strategy:
      jenkinsPipelineStrategy:
        jenkinsfile: |-
          try {
            def pkgname = "${APPLICATION_NAME}"
            def label = "${pkgname}-${UUID.randomUUID().toString()}"
            podTemplate(label: label, name: label, cloud: 'openshift',
                containers: [ containerTemplate(name: 'jnlp', image: '${DOCKER_REGISTRY}/${CENTOS_IMAGE}') ],
                inheritFrom: 'nodejs', serviceAccount: 'jenkins') {
              timeout(time: 40, unit: 'MINUTES') {
                node (label) {
                  stage("Fetch") {
                    sh "git config --global http.sslVerify false"
                    sh "mkdir ${pkgname}"
                    withCredentials([string(credentialsId: "git-${pkgname}", variable: 'GIT_TOKEN')]) {
                      sh "echo '${SOURCE_REPOSITORY_URL}' | sed 's|^\\(http[s]*://\\)\\(.*\\)|\\1${GIT_TOKEN}@\\2|' >cloneFrom 2>/dev/null"
                    }
                    def cloneAddress = readFile('cloneFrom').trim()
                    dir ("${pkgname}") {
                      git([ branch: "master", changelog: false, poll: false, url: cloneAddress ])
                    }
                  }
                  stage("Build") {
                    sh """
                    ( cd ${pkgname} ; git rev-parse --short HEAD ) >gitHash
                    ( cd ${pkgname} ; make createinitialarchive ; make createrpm )
                    awk '/^Release:/{print \$2;exit;}' ${pkgname}/el/${pkgname}.spec | cut -d% -f1 >patchNumber
                    awk '/^Version:/{print \$2;exit;}' ${pkgname}/el/${pkgname}.spec >versionNumber
                    """
                  }
                  stage("Upload") {
                    def gitHash = readFile('gitHash').trim()
                    def patch = readFile('patchNumber').trim()
                    def version = readFile('versionNumber').trim()
                    sh "echo Uploading artifacts for ${version}-${patch}-${gitHash}"
                    nexusArtifactUploader(
                      nexusVersion: '${NEXUS_VERSION}',
                      protocol: "${NEXUS_PROTO}",
                      nexusUrl: "${NEXUS_REMOTE}",
                      groupId: "${NEXUS_GROUP_ID}",
                      version: "${version}-${patch}-${gitHash}",
                      repository: "${NEXUS_RPM_REPOSITORY}",
                      credentialsId: "nexus-${pkgname}",
                      artifacts: [
                      [ artifactId: "${pkgname}-rpm",
                      classifier: '', type: 'rpm',
                      file: "${pkgname}-${version}-${patch}.el7.src.rpm" ],
                      [ artifactId: "${pkgname}-rpm",
                      classifier: '', type: 'rpm',
                      file: "${pkgname}-${version}-${patch}.el7.x86_64.rpm" ],
                      [ artifactId: "${pkgname}-rpm",
                      classifier: '', type: 'tar.gz',
                      file: "el-${pkgname}-${version}.tar.gz" ]
                      ]
                    )
                  }
                }
              }
            }
          } catch (err) {
            echo "in catch block"
            echo "Caught: ${err}"
            currentBuild.result = 'FAILURE'
            throw err
          }
      type: JenkinsPipeline
- apiVersion: v1
  kind: BuildConfig
  metadata:
    annotations:
      description: Builds ${APPLICATION_NAME} deb archive
    name: ${APPLICATION_NAME}-deb
  spec:
    strategy:
      jenkinsPipelineStrategy:
        jenkinsfile: |-
          try {
            def pkgname = "${APPLICATION_NAME}"
            def label = "${pkgname}-${UUID.randomUUID().toString()}"
            podTemplate(label: label, name: label, cloud: 'openshift',
                containers: [ containerTemplate(name: 'jnlp', image: '${DOCKER_REGISTRY}/${DEBIAN_IMAGE}') ],
                inheritFrom: 'nodejs', serviceAccount: 'jenkins') {
              timeout(time: 40, unit: 'MINUTES') {
                node (label) {
                  stage("Fetch") {
                    sh "git config --global http.sslVerify false"
                    sh "mkdir ${pkgname}"
                    withCredentials([string(credentialsId: "git-${pkgname}", variable: 'GIT_TOKEN')]) {
                      sh "echo '${SOURCE_REPOSITORY_URL}' | sed 's|^\\(http[s]*://\\)\\(.*\\)|\\1${GIT_TOKEN}@\\2|' >cloneFrom 2>/dev/null"
                    }
                    def cloneAddress = readFile('cloneFrom').trim()
                    dir ("${pkgname}") {
                      git([ branch: "master", changelog: false, poll: false, url: cloneAddress ])
                    }
                  }
                  stage("Build") {
                    sh """
                    ( cd ${pkgname} ; git rev-parse --short HEAD ) >gitHash
                    ( cd ${pkgname} ; make createinitialarchive ; make createdebbin )
                    cat ${pkgname}/debian/changelog | awk '/${pkgname}/{print \$2;exit}' | sed -e 's|[()]||g' -e 's|.*-\\([0-9]*\\)\$|\\1|' >patchNumber
                    cat ${pkgname}/debian/changelog | awk '/${pkgname}/{print \$2;exit}' | sed -e 's|[()]||g' -e 's|\\(.*\\)-[0-9]*\$|\\1|' >versionNumber
                    """
                  }
                  stage("Upload") {
                    def gitHash = readFile('gitHash').trim()
                    def patch = readFile('patchNumber').trim()
                    def version = readFile('versionNumber').trim()
                    sh "echo Uploading artifacts for ${version}-${patch}-${gitHash}"
                    nexusArtifactUploader(
                      nexusVersion: '${NEXUS_VERSION}',
                      protocol: "${NEXUS_PROTO}",
                      nexusUrl: "${NEXUS_REMOTE}",
                      groupId: "${NEXUS_GROUP_ID}",
                      version: "${version}-${patch}-${gitHash}",
                      repository: "${NEXUS_DEB_REPOSITORY}",
                      credentialsId: "nexus-${pkgname}",
                      artifacts: [
                      [ artifactId: "${pkgname}-deb",
                      classifier: '', type: 'deb',
                      file: "${pkgname}_${version}-${patch}_all.deb" ],
                      [ artifactId: "${pkgname}-deb",
                      classifier: '', type: 'txt',
                      file: "${pkgname}_${version}-${patch}_amd64.buildinfo" ],
                      [ artifactId: "${pkgname}-deb",
                      classifier: '', type: 'txt',
                      file: "${pkgname}_${version}-${patch}_amd64.changes" ],
                      [ artifactId: "${pkgname}-deb",
                      classifier: '', type: 'tar.xz',
                      file: "${pkgname}_${version}-${patch}.debian.tar.xz" ],
                      [ artifactId: "${pkgname}-deb",
                      classifier: '', type: 'tar.gz',
                      file: "${pkgname}_${version}.orig.tar.gz" ],
                      [ artifactId: "${pkgname}-deb",
                      classifier: '', type: 'txt',
                      file: "${pkgname}_${version}-${patch}.dsc" ]
                      ]
                    )
                  }
                }
              }
            }
          } catch (err) {
            echo "in catch block"
            echo "Caught: ${err}"
            currentBuild.result = 'FAILURE'
            throw err
          }
      type: JenkinsPipeline
parameters:
– name: APPLICATION_NAME
  description: Package Name – should match that expected by package we’ll build
  displayName: Package Name
  value: my-package
– name: DEBIAN_IMAGE
  description: Jenkins Debian Agent Image – relative to DOCKER_REGISTRY
  displayName: Jenkins Debian Agent Image
  required: true
  value: “cicd/jenkins-agent-debian:latest”
– name: DOCKER_REGISTRY
  description: Docker Registry
  displayName: Docker Registry
  required: true
  value: docker-registry.default.svc:5000
– name: CENTOS_IMAGE
  description: Jenkins Centos Agent Image – relative to DOCKER_REGISTRY
  displayName: Jenkins Centos Agent Image
  required: true
  value: "cicd/jenkins-agent-centos:latest"
– name: GIT_DEPLOYMENT_TOKEN
  description: Git deployment token
  displayName: Git Deployment Token
  required: true
– name: NEXUS_ARTIFACTS_PASSWORD
  description: Nexus Artifacts Upload Password
  displayName: Nexus Artifacts Upload Password
  required: true
  value: admin123
– name: NEXUS_ARTIFACTS_USERNAME
  description: Nexus Artifacts Upload Username
  displayName: Nexus Artifacts Upload Username
  required: true
  value: admin
– name: NEXUS_GROUP_ID
  description: Nexus Group ID
  displayName: Nexus Group ID
  required: true
  value: com.example
– name: NEXUS_DEB_REPOSITORY
  description: Nexus Artifact Debian Repository – remote repository name
  displayName: Nexus Artifact Debian Repository
  required: true
  value: debian
– name: NEXUS_PROTO
  description: Nexus Proto – http or https
  displayName: Nexus Proto
  required: true
  value: http
– name: NEXUS_REMOTE
  description: Nexus Remote URL – proto-less URI connecting to Nexus
  displayName: Nexus Remote URL
  value: "nexus:8081"
  required: true
– name: NEXUS_RPM_REPOSITORY
  description: Nexus Artifact EL Repository – remote repository name
  displayName: Nexus Artifact EL Repository
  required: true
  value: centos
– name: NEXUS_VERSION
  description: Nexus Repository Version
  displayName: Nexus Repository Version
  required: true
  value: nexus3
– name: SOURCE_REPOSITORY_URL
  description: The URL of the repository with your application source code
  displayName: Git Repository URL
  required: true
  value: https://git.example.com/project/my-package

Signing and Scanning Docker Images with OpenShift

You may already know Docker images can be signed. Today, let's discuss a way to automate image signing, using OpenShift.

Lately, I stumbled upon a bunch of interesting repositories:

  • https://github.com/redhat-cop/openshift-image-signing-scanning: an Ansible playbook configuring an OCP cluster, building a base image, setting up a service account and installing a few templates providing docker image scanning and signing
  • https://github.com/redhat-cop/image-scanning-signing-service: an optional OpenShift third-party service implementing support for ImageSigningRequest and ImageScanningRequest objects
  • https://github.com/redhat-cop/openshift-event-controller: sources building an event controller that would watch for new images pushed to the OpenShift docker registry

    Although these are amazing, I could not deploy them to my OpenShift Origin, due to missing subscriptions and packages.

    image signing environment overview


    In an effort to introduce CentOS support, I forked the first repository from our previous list, and started rewriting what I needed:

    https://github.com/faust64/openshift-image-signing-scanning

     

    A typical deployment would involve:

  • Generating a GPG keypair on some server (not necessarily related to OpenShift)
  • Depending on your use case, configuring docker to prevent unsigned images from being run on our main OpenShift hosts
  • Setting up labels and taints identifying the nodes we trust to sign images, as well as applying and installing a few templates and a base image
  • At which point, you could either install the event-controller Deployment, watching for all of your OpenShift internal registry's images.

    Or, you could integrate image scanning and signing yourself, using the few templates installed, as shown in a sample Jenkinsfile.
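For instance, triggering a signing request manually could boil down to processing one of those templates. A minimal sketch – the sign-image template name and IMAGE_TO_SIGN parameter are assumptions, list the actual ones with `oc get templates`:

```shell
# Image reference to sign - registry matches the internal registry service:
REGISTRY=docker-registry.default.svc:5000
IMAGE="${REGISTRY}/myproject/myapp:latest"

# Instantiate the (hypothetical) signing template against that image:
if command -v oc >/dev/null 2>&1; then
    oc process sign-image -p IMAGE_TO_SIGN="$IMAGE" | oc create -f - || true
fi
```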

    OpenShift Supervision

    Today I am looking back on a few topics I had a hard time deploying properly with OpenShift 3.7, missing proper dynamic provisioning despite a poorly-configured GlusterFS cluster.
    Since then, I deployed a 3-node Ceph cluster, using Sebastien Han's ceph-ansible playbooks, allowing me to further experiment with persistent volumes.
    OpenShift Origin 3.9 also came out, shipping with various fixes and new features, such as Gluster Block volume support, which might address some of GlusterFS's performance issues.

     

    OpenShift Ansible playbooks include a set of roles focused on collecting and making sense of your cluster metrics, starting with Hawkular.

    We could set up a few Pods running Hawkular, Heapster collecting data from our OpenShift nodes, and a Cassandra database storing them, by defining the following variables and applying the playbooks/openshift-metrics/config.yml playbook:

    Hawkular

    Hawkular integration with OpenShift

    openshift_metrics_cassandra_limit_cpu: 3000m
    openshift_metrics_cassandra_limit_memory: 3Gi
    openshift_metrics_cassandra_node_selector: {"region":"infra"}
    openshift_metrics_cassandra_pvc_prefix: hawkular-metrics
    openshift_metrics_cassandra_pvc_size: 40G
    openshift_metrics_cassandra_request_cpu: 2000m
    openshift_metrics_cassandra_request_memory: 2Gi
    openshift_metrics_cassandra_storage_type: pv
    openshift_metrics_cassandra_pvc_storage_class_name: ceph-storage
    openshift_metrics_cassanda_pvc_storage_class_name: ceph-storage

    openshift_metrics_image_version: v3.9
    openshift_metrics_install_metrics: True
    openshift_metrics_duration: 14
    openshift_metrics_hawkular_limits_cpu: 3000m
    openshift_metrics_hawkular_limits_memory: 3Gi
    openshift_metrics_hawkular_node_selector: {"region":"infra"}
    openshift_metrics_hawkular_requests_cpu: 2000m
    openshift_metrics_hawkular_requests_memory: 2Gi
    openshift_metrics_heapster_limits_cpu: 3000m
    openshift_metrics_heapster_limits_memory: 3Gi
    openshift_metrics_heapster_node_selector: {"region":"infra"}
    openshift_metrics_heapster_requests_cpu: 2000m
    openshift_metrics_heapster_requests_memory: 2Gi

    Note that we are defining both openshift_metrics_cassandra_pvc_storage_class_name and openshift_metrics_cassanda_pvc_storage_class_name, due to a typo that was recently fixed, though not yet shipped in the latest OpenShift Origin packages.
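With those variables set in your inventory's group vars, applying the playbook itself boils down to the following – the inventory path is an assumption, matching the rest of this post:

```shell
# Run from an openshift-ansible checkout; ./hosts is your inventory:
PLAYBOOK=playbooks/openshift-metrics/config.yml
if [ -f "$PLAYBOOK" ]; then
    ansible-playbook "$PLAYBOOK" -i ./hosts
fi
```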

    Setting up those metrics may allow you to create Nagios commands based on queries for resource allocations and consumption, using:

    $ oc adm top node --heapster-namespace=openshift-infra --heapster-scheme=https node.example.com
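A minimal Nagios-style wrapper around that command could look like this – the node name and thresholds being placeholders for your own setup:

```shell
#!/bin/sh
# Sketch of a Nagios check built on `oc adm top node`.
NODE=node.example.com
WARN=80
CRIT=90

# Extract the CPU% column from an `oc adm top node` output line,
# e.g. "node.example.com   1234m   31%   10874Mi   68%"
parse_cpu_pct() {
    echo "$1" | awk '{gsub("%", "", $3); print $3}'
}

if command -v oc >/dev/null 2>&1; then
    line=$(oc adm top node --heapster-namespace=openshift-infra \
        --heapster-scheme=https "$NODE" | tail -1)
    cpu=$(parse_cpu_pct "$line")
    if [ "$cpu" -ge "$CRIT" ]; then
        echo "CRITICAL - CPU at ${cpu}%"; exit 2
    elif [ "$cpu" -ge "$WARN" ]; then
        echo "WARNING - CPU at ${cpu}%"; exit 1
    fi
    echo "OK - CPU at ${cpu}%"; exit 0
fi
```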

     

    Another solution that integrates well with OpenShift is Prometheus, which could be deployed using the playbooks/openshift-prometheus/config.yml playbook and the following Ansible variables:

    Prometheus

    Prometheus showing OpenShift Pods CPU usages

    openshift_prometheus_alertbuffer_pvc_size: 20Gi
    openshift_prometheus_alertbuffer_storage_class: ceph-storage
    openshift_prometheus_alertbuffer_storage_type: pvc
    openshift_prometheus_alertmanager_pvc_size: 20Gi
    openshift_prometheus_alertmanager_storage_class: ceph-storage
    openshift_prometheus_alertmanager_storage_type: pvc
    openshift_prometheus_namespace: openshift-metrics
    openshift_prometheus_node_selector: {"region":"infra"}
    openshift_prometheus_pvc_size: 20Gi
    openshift_prometheus_state: present
    openshift_prometheus_storage_class: ceph-storage
    openshift_prometheus_storage_type: pvc

     

    We could also deploy Grafana, including a pre-configured dashboard rendering some Prometheus metrics – thanks to the playbooks/openshift-grafana/config.yml playbook and the following Ansible variables:

    Grafana

    OpenShift Dashboard on Grafana

    openshift_grafana_datasource_name: prometheus
    openshift_grafana_graph_granularity: 2m
    openshift_grafana_namespace: openshift-grafana
    openshift_grafana_node_exporter: True
    openshift_grafana_node_selector: {"region":"infra"}
    openshift_grafana_prometheus_namespace: openshift-metrics
    openshift_grafana_prometheus_serviceaccount: prometheus
    openshift_grafana_storage_class: ceph-storage
    openshift_grafana_storage_type: pvc
    openshift_grafana_storage_volume_size: 15Gi

     

    And finally, we could also deploy logs centralization with the playbooks/openshift-logging/config.yml playbook, setting the following:

    Kibana

    Kibana integration with EFK

    openshift_logging_install_logging: True
    openshift_logging_curator_default_days: '7'
    openshift_logging_curator_cpu_request: 100m
    openshift_logging_curator_memory_limit: 256Mi
    openshift_logging_curator_nodeselector: {"region":"infra"}
    openshift_logging_elasticsearch_storage_type: pvc
    openshift_logging_es_cluster_size: '1'
    openshift_logging_es_cpu_request: '1'
    openshift_logging_es_memory_limit: 8Gi
    openshift_logging_es_pvc_storage_class_name: ceph-storage
    openshift_logging_es_pvc_dynamic: True
    openshift_logging_es_pvc_size: 25Gi
    openshift_logging_es_recover_after_time: 10m
    openshift_logging_es_nodeselector: {"region":"infra"}
    openshift_logging_es_number_of_shards: '1'
    openshift_logging_es_number_of_replicas: '0'
    openshift_logging_fluentd_buffer_queue_limit: 1024
    openshift_logging_fluentd_buffer_size_limit: 1m
    openshift_logging_fluentd_cpu_request: 100m
    openshift_logging_fluentd_file_buffer_limit: 256Mi
    openshift_logging_fluentd_memory_limit: 512Mi
    openshift_logging_fluentd_nodeselector: {"region":"infra"}
    openshift_logging_fluentd_replica_count: 2
    openshift_logging_kibana_cpu_request: 600m
    openshift_logging_kibana_hostname: kibana.router.intra.unetresgrossebite.com
    openshift_logging_kibana_memory_limit: 736Mi
    openshift_logging_kibana_proxy_cpu_request: 200m
    openshift_logging_kibana_proxy_memory_limit: 256Mi
    openshift_logging_kibana_replica_count: 2
    openshift_logging_kibana_nodeselector: {"region":"infra"}

     

    Meanwhile, we could note that cri-o is getting better support in the latest versions of OpenShift, among a never-ending list of ongoing work and upcoming features.

    OpenShift

    As of late 2017, I got introduced to OpenShift. Even though I've only been playing with a few basic features, nesting Docker into static KVMs, I was pretty impressed by the simplicity of service deployment, as served to end users.

    After replacing 4x MicroServer, by 3x SE318m1


    I've first tried setting up my own, re-using my ProLiant MicroServers. One of my master nodes was refusing to deploy, CPU usage averaging around 100%, systemctl consistently timing out while starting some process – one that did start on my two other master nodes.
    After trying to resize my KVMs in vain, I eventually went another way: shut down a stack of ProLiant MicroServers, move them out of my rack, and plug in instead 3 servers I ordered a couple years ago that never reached prod – due to doubts regarding overall power consumption, EDF being able to deliver enough Amperes, my switches not being able to provide enough LACP channels, my not having enough SSDs or quad-port Ethernet cards in stock to fill these servers, …

    I eventually compromised, harvesting whatever 500G SSDs were available in my Ceph cluster, mounting one per 1U server.

    Final setup involves the following physical servers:

    • a custom tower (core i5, 32G DDR, 128G SSD disk)
    • 3x HP SE316M1 (2xE5520, 24G DDR) – 500G SSD
    • 2x HP SE1102 (2xE5420 12G DDR) – 500G SSD
    • 3x ProLiant MicroServer G5 (Turion, 4-8G DDR) – 64G SSD + 3×3-4T HDD

    And on top of these, a set of KVM instances, including:

    • 3 master nodes (2 CPU, 8G RAM)
    • 3 infra nodes (2 CPU, 6G RAM)
    • 3 compute nodes (4 CPU, 10G RAM @SE316M1)
    • 3 storage nodes (1 CPU, 3G RAM @MicroServer)

    Everything runs on CentOS 7, except for an Ansible DomU I use to deploy OpenShift, which runs Debian Stretch.

     

    OpenShift can be deployed using Ansible. And as I’ve been writing my own roles for the past couple years, I can testify these ones are amazing.

    GlusterFS @OpenShift


    A first Ansible run would be done setting the following variables, bootstrapping services on top of my existing domain name and LDAP server.

    ansible_ssh_user: root
    openshift_deployment_type: origin
    openshift_disable_check: disk_availability,docker_storage,memory_availability
    openshift_master_cluster_method: native
    openshift_master_cluster_hostname: openshift.intra.unetresgrossebite.com
    openshift_master_cluster_public_hostname: openshift.intra.unetresgrossebite.com
    openshift_master_default_subdomain: router.intra.unetresgrossebite.com
    openshift.common.dns_domain: openshift.intra.unetresgrossebite.com
    openshift_clock_enabled: True
    openshift_node_kubelet_args: {'pods-per-core': ['10'], 'max-pods': ['250'], 'image-gc-high-threshold': ['90'], 'image-gc-low-threshold': ['80']}
    openshift_master_identity_providers:
    - name: UneTresGrosseBite
      challenge: 'true'
      login: 'true'
      kind: LDAPPasswordIdentityProvider
      attributes:
        id: ['dn']
        email: ['mail']
        name: ['sn']
        preferredUsername: ['uid']
      bindDN: cn=openshift,ou=services,dc=unetresgrossebite,dc=com
      bindPassword: secret
      ca: ldap-chain.crt
      insecure: 'false'
      url: 'ldaps://netserv.vms.intra.unetresgrossebite.com/ou=users,dc=unetresgrossebite,dc=com?uid?sub?(&(objectClass=inetOrgPerson)(!(pwdAccountLockedTime=*)))'
    openshift_master_ldap_ca_file: /root/ldap-chain.crt
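Before applying the playbook, it may be worth sanity-checking the bind DN, password and user filter with ldapsearch, re-using the values from the configuration above:

```shell
# User filter, as embedded in the identity provider URL above:
FILTER='(&(objectClass=inetOrgPerson)(!(pwdAccountLockedTime=*)))'

if command -v ldapsearch >/dev/null 2>&1; then
    ldapsearch -H ldaps://netserv.vms.intra.unetresgrossebite.com \
        -D cn=openshift,ou=services,dc=unetresgrossebite,dc=com -w secret \
        -b ou=users,dc=unetresgrossebite,dc=com \
        "$FILTER" uid mail sn || echo "LDAP lookup failed"
fi
```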

    Setting up GlusterFS, note you may have difficulties setting gluster block devices as group vars; a solution could be to define these directly in your inventory file:

    [glusterfs]
    gluster1.friends.intra.unetresgrossebite.com glusterfs_ip=10.42.253.100 glusterfs_devices='[ "/dev/vdb", "/dev/vdc", "/dev/vdd" ]'
    gluster2.friends.intra.unetresgrossebite.com glusterfs_ip=10.42.253.101 glusterfs_devices='[ "/dev/vdb", "/dev/vdc", "/dev/vdd" ]'
    gluster3.friends.intra.unetresgrossebite.com glusterfs_ip=10.42.253.102 glusterfs_devices='[ "/dev/vdb", "/dev/vdc", "/dev/vdd" ]'

    Apply the main playbook with:

    ansible-playbook playbooks/byo/config.yml -i ./hosts

    Have a break: with 4 CPUs & 8G RAM on my Ansible host, applying a single variable change (pretty much everything was installed beforehand), I would still need over an hour and a half to apply the full playbook: whenever possible, stick to whatever service-specific playbook you may find, …

    Jenkins @OpenShift


    As a sidenote, be careful to properly set your domain name before deploying GlusterFS. So far, while I was able to update my domain name almost everywhere by re-running Ansible playbooks, GlusterFS's heketi route was the first I noticed not being renamed.
    Should you fuck up your setup, you can use oc project glusterfs then oc get pods to locate your running containers, use oc rsh <container> then rm -fr /var/lib/heketi to purge stuff that may prevent further deployments, …
    Then oc delete project glusterfs, to purge almost everything else.
    You may also use docker images | grep gluster and docker rmi <image>, … As well as making sure to wipe the first sectors of your gluster disks (for d in b c d; do dd if=/dev/zero of=/dev/vd$d bs=1M count=8; done). You may need to reboot your hosts (if a wipefs -a /dev/<drive> returns with an error). Finally, re-deploy a new GlusterFS cluster from scratch using Ansible.
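Those cleanup steps could be consolidated into a small script – destructive, to be run as root on each gluster host, with disk names matching the inventory sample above:

```shell
#!/bin/sh
# Destructive GlusterFS cleanup sketch - consolidates the steps above.
if command -v oc >/dev/null 2>&1; then
    oc delete project glusterfs || true
fi

# Remove leftover gluster container images, if any:
if command -v docker >/dev/null 2>&1; then
    docker images 2>/dev/null | awk '/gluster/ {print $3}' | xargs -r docker rmi
fi

# Zero the first sectors of each gluster disk, then wipe signatures:
for d in b c d; do
    if [ -b "/dev/vd$d" ]; then
        dd if=/dev/zero of="/dev/vd$d" bs=1M count=8
        wipefs -a "/dev/vd$d" || echo "wipefs failed on vd$d, reboot may be needed"
    fi
done
```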

     

    Once done with the main playbook, you should be able to log into your OpenShift dashboard. Test it by deploying Jenkins.

    hawkular @OpenShift

    Hawkular integration @OpenShift

     

     

    You could (should) also look into deploying OpenShift cluster metrics collection, based on Hawkular & Heapster.
    Sticking with volatile storage, you would need to add the following variable to all your hosts:

     

    openshift_metrics_install_metrics: True

    Note that to deploy these roles, you would have to manually install python-passlib, apache2-utils and openjdk-8-jdk-headless on your Ansible host (assuming Debian/Ubuntu). You may then deploy metrics using the playbooks/byo/openshift-cluster/openshift-metrics.yml playbook.

    Hawkular integration would allow you to track resources usage directly from OpenShift dashboard.

    Prometheus @OpenShift


    You could also set up Prometheus, defining the following:

    openshift_prometheus_namespace: openshift-metrics
    openshift_prometheus_node_selector: {"region":"infra"}

    And applying the playbooks/byo/openshift-cluster/openshift-prometheus.yml playbook.

     

    You should also be able to set up some kind of centralized logging based on ElasticSearch, Kibana & Fluentd, using the following:

    openshift_logging_install_logging: True
    openshift_logging_kibana_hostname: kibana.router.intra.unetresgrossebite.com
    openshift_logging_es_memory_limit: 4Gi
    openshift_logging_storage_kind: dynamic
    openshift_cloudprovider_kind: glusterfs

    Although so far, I wasn't able to get it running properly: ElasticSearch health is stuck on yellow, while Kibana and Fluentd somehow can't reach it – possibly due to a missing DNS record.

     

    From there, you would find plenty of solutions packaged for OpenShift, ready to deploy (a popular one seems to be Go Git Server).
    Deploying new services can still be a little painful, although there's no denying OpenShift offers a potentially amazing SaaS toolbox.

    Graphite & Riemann

    There are several ways of collecting runtime metrics out of your software. We've discussed Munin and Datadog already; we could talk about Collectd as well, although these solutions mostly aim at system monitoring, as opposed to distributed systems.

    Business Intelligence may require collecting metrics from a cluster of workers, aggregating them into comprehensive graphs, such that short-lived instances won't imply a growing collection of distinct graphs.

     

    Riemann is a JVM-based service, collecting metrics over TCP or UDP, and serving a simple web interface generating dashboards, displaying your metrics as they're received. Configuring Riemann, you would be able to apply transformations and filtering to your input, … You may find a quickstart here, or something more exhaustive over there. A good starting point could be to keep it simple:

    (logging/init {:file "/var/log/riemann/riemann.log"})
    (let [host "0.0.0.0"]
      (tcp-server {:host host})
      (ws-server {:host host}))
    (periodically-expire 5)
    (let [index (index)]
      (streams
        (default :ttl 60 index
          (expired (fn [event] (info "expired" event))))))

    Riemann Sample Dashboard


    Despite being usually suspicious of Java applications or Ruby web services, I tend to trust Riemann even under heavy workload (tens of collectd agents, forwarding hundreds of metrics per second).

    Riemann Dashboard may look unappealing at first, although you should be able to build your own monitoring screen relatively easily. Then again, this would require a little practice, and some basic sense of aesthetics.

     

     

    Graphite Composer


    Graphite is a Python web service providing a minimalist yet pretty powerful browser-based client that lets you render graphs. A basic Graphite setup would usually involve an SQLite database, storing your custom graphs and users, as well as another Python service: Carbon, storing metrics. Such a setup would usually also involve Statsd, a NodeJS service listening for metrics, although depending on what you intend to monitor, you might find your way into writing to Carbon directly.
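Writing to Carbon directly is indeed simple: its plaintext listener (TCP port 2003 by default) accepts one "metric.path value timestamp" line per datapoint, which makes a shell one-liner enough for quick tests – GRAPHITE_HOST being a placeholder for your own Carbon relay:

```shell
# Compose a datapoint in Carbon's plaintext format:
metric="myapp.accounts.active 1234 $(date +%s)"

# Ship it over TCP - set GRAPHITE_HOST to actually send:
if [ -n "$GRAPHITE_HOST" ]; then
    echo "$metric" | nc -w 2 "$GRAPHITE_HOST" 2003
fi
```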

    Setting up Graphite on Debian Stretch may be problematic, due to some Python packages being deprecated while the latest Graphite packages aren't available yet. After unsuccessfully trying to pull copies from PIP instead of APT, I eventually ended up setting up my first production instance based on Devuan Jessie. The setup process would drastically vary based on distribution, versions, your web server, Graphite database, … Should you go there: consider all options carefully before starting.

    Graphite could be used as-is: the Graphite Composer would let you generate and save graphs, aggregating any collected metric, while the Dashboard view would let you aggregate several graphs into a single page. Note you could also use Graphite as a data source in Grafana, among others.

     

    From there, note Riemann can be reconfigured forwarding everything to Graphite (or using your own filters), adding to your riemann.config:

    (def graph (graphite {:host "10.71.100.164"}))
    (streams graph)

    This should allow you to run Graphite without Statsd, having Riemann collecting metrics from your software and forwarding them into Carbon.

     

    The next step would be to configure your applications to forward data to Riemann (or Statsd, should you want to only use Graphite). Databases like Cassandra or Riak could forward some of their own internal metrics, using the right agent. Or you could collect BI metrics from your own code.

    Graphite Dashboard


    Using NodeJS, you will find a riemannjs module that does the job; or node-statsd-client, for Statsd.

    Having added some scheduled tasks to our code, querying how many accounts we have, how many are closed, and how many were active during the last day, week and month, I'm eventually able to create a dashboard based on saved graphs, aggregating metrics in some arguably-meaningful fashion.

    m242/Maildrop

    A couple days ago, a colleague of mine asked me to set up our own Maildrop service, like the one served at maildrop.cc. Some of you may also be familiar with Mailinator, which offers a similar service.

    Not to be confused with Maildrop (the popular MDA distributed in Debian packages, among others), m242/Maildrop is based on Scala, Java and a Redis queue. The project is divided into two workers connecting to Redis.

    maildrop


    An SMTP worker processes inbound messages, eventually writing them to Redis. It listens on 127.0.0.1:25000 by default; nginx may be used to proxy traffic.

    Meanwhile, the HTTP worker serves clients with any mailbox – no authentication required. Developers may write tests checking some arbitrary mailbox using an HTTP API.
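A regression test could then fetch an arbitrary mailbox over that API – the host and route below are assumptions, check the project's documentation for the actual endpoints:

```shell
# Hypothetical mailbox lookup against the HTTP worker:
MAILBOX=my-test-user
URL="http://maildrop.example.com/api/inbox/${MAILBOX}"

# Only hit the network when pointed at a live instance:
if command -v curl >/dev/null 2>&1 && [ -n "$MAILDROP_LIVE" ]; then
    curl -s "$URL"
fi
```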

    As you may guess, both workers and the database may be scaled horizontally – although given its pretty specific purpose, you probably won't need to.

    Their GitHub project isn't very active, sadly – a few issues are piling up, two of which I've been able to post pull requests for. Then again, getting it working isn't complicated, and it may prove pretty useful testing for regressions.