OpenShift Egress Traffic Management

Today, I’m investigating yet another OpenShift feature: Egress Routers. We will look at how to ensure a given connection leaves our cluster using a given IP address, integrating OpenShift with existing services protected by some kind of IP filter.

 

First, let’s explain the default behavior of OpenShift.

Deployments are scheduled on OpenShift hosts, eventually leading to containers being started on those nodes.

Filtering OpenShift compute hosts IPs

Whenever contacting a service outside the OpenShift SDN, a container exits the OpenShift network through the node it was started on, meaning the corresponding connection gets NAT-ed using the IP of the node our container currently runs on.

As such, a straightforward way of allowing OpenShift containers to reach a protected service could be to trust all of my OpenShift hosts' IP addresses when they connect to those services.

Note that this setup implies trusting every container that may be scheduled on those OpenShift nodes to contact our remote service. That could be acceptable in a few cases, although whenever OpenShift is shared among multiple users or tenants, it usually won't be.

While we could address this by dedicating OpenShift nodes to users requiring access to the same set of protected resources, we would remain limited by the number of OpenShift nodes composing our cluster.

 

Instead of relying on OpenShift nodes addresses, we could involve additional addresses dedicated to accessing external resources.

A first way to implement this would be to allocate an OpenShift namespace with its own Egress IPs:

$ oc patch netnamespace toto -p '{"egressIPs": ["10.42.253.45","10.42.253.54"]}'

Such a configuration also requires us to associate these egress IPs with OpenShift nodes' hostsubnets:

$ oc patch hostsubnet compute3 -p '{"egressIPs": ["10.42.253.45"]}'

$ oc patch hostsubnet compute1 -p '{"egressIPs": ["10.42.253.54"]}'

Then, from a Pod of our toto netnamespace, we could try to reach a remote service:

$ oc rsh -n toto jenkins-1-xyz
sh-4.2$ ping 8.8.8.8
[…]
64 bytes from 8.8.8.8: icmp_seq=90 ttl=119 time=4.01 ms
64 bytes from 8.8.8.8: icmp_seq=91 ttl=119 time=4.52 ms
^C
--- 8.8.8.8 ping statistics ---
91 packets transmitted, 89 received, 2% packet loss, time 90123ms
rtt min/avg/max/mdev = 3.224/4.350/12.042/1.073 ms

Notice we did lose a few packets. The reason is that I rebooted the compute3 host, from which my ping was initially leaving the cluster. While the node was marked NotReady, traffic went through the second node, compute1, which holds an egressIP associated with the toto netnamespace. From our gateway, we can confirm the new IP is temporarily being used:

# tcpdump -vvni vlan5 host 10.42.253.54
tcpdump: listening on vlan5, link-type EN10MB
13:11:13.066821 10.42.253.54 > 8.8.8.8: icmp: echo request (id:023e seq:3) [icmp cksum ok] (DF) (ttl 63, id 24619, len 84)
13:11:13.070596 arp who-has 10.42.253.54 tell 10.42.253.5
13:11:13.071194 arp reply 10.42.253.54 is-at 52:54:00:b1:15:b9
13:11:13.071225 8.8.8.8 > 10.42.253.54: icmp: echo reply (id:023e seq:3) [icmp cksum ok] [tos 0x4] (ttl 120, id 14757, len 84)
13:11:14.066796 10.42.253.54 > 8.8.8.8: icmp: echo request (id:023e seq:4) [icmp cksum ok] (DF) (ttl 63, id 25114, len 84)
13:11:14.069990 8.8.8.8 > 10.42.253.54: icmp: echo reply (id:023e seq:4) [icmp cksum ok] [tos 0x4] (ttl 120, id 515, len 84)

As soon as compute3 is done rebooting, tcpdump confirms 10.42.253.54 is no longer used.

Namespace-based IP Filtering

From the router point of view, we can see that the hardware addresses for our Egress IPs match those of our OpenShift hosts:

# arp -na | grep 10.42.253
[…]
10.42.253.20 52:54:00:b1:15:b9 vlan5 19m49s
10.42.253.21 52:54:00:6b:99:ad vlan5 19m49s
10.42.253.23 52:54:00:23:1c:4f vlan5 19m54s
10.42.253.45 52:54:00:23:1c:4f vlan5 7m36s
10.42.253.54 52:54:00:b1:15:b9 vlan5 10m35s

As such, this configuration may be preferable whenever the network hosting OpenShift does not allow introducing virtual hardware addresses.

Note that a node's egressIPs are reserved for the netnamespaces specifically requesting them. Any other container executed on my compute1 and compute3 hosts is still NAT-ed using 10.42.253.20 and 10.42.253.23 respectively.

Also note that namespace-based IP filtering does not rely on any placement rule: containers can get started on any OpenShift node in your cluster, and their traffic will still exit the OpenShift SDN through a designated node, according to the netnamespace and hostsubnet configurations.

Bear in mind that whenever the egressIPs from your netnamespaces are no longer assigned to any node of your cluster (be it due to a missing configuration, or an outage affecting all your Egress hosts), containers from the corresponding projects lose access to resources outside the OpenShift SDN.
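
To double-check which node currently holds a given Egress IP, and which namespaces request it, we can simply inspect the corresponding objects (a quick sketch, reusing the names from the examples above):

$ oc get hostsubnets
$ oc get netnamespace toto -o yaml
$ oc get hostsubnet compute3 -o jsonpath='{.egressIPs}'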

 

Now that we’re familiar with the basics of OpenShift Egress traffic management, we can focus on Egress Routers.

Several Egress Router implementations exist; we will focus on the two most common ones: the Redirect mode and the HTTP proxy mode.

In both cases, we would use a dedicated project hosting a router Pod:

$ oc new-project egress-routers

We would also rely on a ServiceAccount, allowed to start privileged containers:

$ oc create sa egress-init

As well as a SecurityContextConstraint granting our ServiceAccount such privileges:

$ cat <<EOF >egress-scc.yml
kind: SecurityContextConstraints
apiVersion: v1
metadata: { name: egress-init }
allowPrivilegedContainer: true
runAsUser: { type: RunAsAny }
seLinuxContext: { type: RunAsAny }
fsGroup: { type: RunAsAny }
supplementalGroups: { type: RunAsAny }
users: [ "system:serviceaccount:egress-routers:egress-init" ]
EOF
$ oc create -f egress-scc.yml

To run a Redirect Egress Router, we then create a controller ensuring a Pod takes care of configuring the OpenShift SDN, NAT-ing the traffic with a specific Egress IP:

$ cat <<EOF >redirect-router.yml
apiVersion: v1
kind: ReplicationController
metadata:
  name: egress-router
spec:
  replicas: 1
  selector:
    name: egress-router
  template:
    metadata:
      name: egress-router
      labels:
        name: egress-router
      annotations:
        pod.network.openshift.io/assign-macvlan: "true"
    spec:
      initContainers:
      - name: egress-demo-init
        image: docker.io/openshift/origin-egress-router
        env:
        - name: EGRESS_SOURCE
          value: 10.42.253.46
        - name: EGRESS_GATEWAY
          value: 10.42.253.1
        - name: EGRESS_DESTINATION
          value: 8.8.8.8
        - name: EGRESS_ROUTER_MODE
          value: init
        securityContext:
          privileged: true
      containers:
      - name: egress-demo-wait
        image: docker.io/openshift/origin-pod
      nodeSelector:
        node-role.kubernetes.io/infra: "true"
      serviceAccountName: egress-init
EOF
$ oc create -f redirect-router.yml

Note we are first starting an init container setting up the proper iptables rules using a few variables: EGRESS_SOURCE is an arbitrary, un-allocated IP address on the OpenShift nodes' subnet, EGRESS_GATEWAY is our default gateway, and EGRESS_DESTINATION is the remote address our Egress Router forwards its traffic to.

Once our init container is done updating the iptables configuration, it shuts down and is replaced by our main container, which does not do anything.

At that point, we could enter that Pod and see that all its traffic exits the OpenShift subnet NAT-ed with our EGRESS_SOURCE IP, by the OpenShift host executing our container.
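
To see the rules the init container installed, one hedged approach is to locate the node hosting the router Pod and inspect the NAT table inside that Pod's network namespace (the container ID and PID below are placeholders):

$ oc get pod -n egress-routers -o wide        # note the node hosting the router Pod
# docker ps | grep egress-demo-wait           # on that node, find the pod's running container
# docker inspect --format '{{.State.Pid}}' <container-id>
# nsenter -n -t <pid> iptables -t nat -S      # the SNAT/DNAT rules set up by the init container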

From the network gateway point of view, we can notice our EGRESS_SOURCE address is associated with a virtual hardware address:

# arp -na | grep 10.42.253
[…]
10.42.253.46 d6:76:cc:f4:e3:d9 vlan5 19m34s

Contrary to namespace-scoped Egress IPs, Egress Routers may be scheduled anywhere on the OpenShift cluster, according to an arbitrary (and optional) placement rule. On the other hand, they rely on containers, which might take a few seconds to start depending on images being available in the local Docker caches. Another drawback is that a single IP cannot be shared among several routers: we would not be able to scale them.

To provide redundancy, we could however set up several Egress Routers per protected service, each using a distinct EGRESS_SOURCE while sharing the same EGRESS_DESTINATION.

Having seen that our Egress Router container exits the cluster using our designated EGRESS_SOURCE address, let's now look at how to use that router from other OpenShift-hosted containers. First, we create a service identifying our Egress Routers:

$ oc create service --name egress-redirect --namespace egress-routers --port=53 --selector=name=egress-router

Depending on your network plugin, we may have to allow traffic coming to that service from third-party projects (see the note after the capture below). We would then be able to query our EGRESS_DESTINATION through our service:

$ curl http://egress-redirect.egress-routers.svc:53/

From our gateway, we could see the corresponding traffic leaving OpenShift SDN, NAT-ed using our EGRESS_SOURCE:

# tcpdump -vvni vlan5 host 8.8.8.8
tcpdump: listening on vlan5, link-type EN10MB
11:11:53.357775 10.42.253.46.53084 > 8.8.8.8.53: S [tcp sum ok] 1167661839:1167661839(0) win 28200 <mss 1410,sackOK,timestamp 84645569 0,nop,wscale 7> (DF)
11:11:54.357948 10.42.253.46.53084 > 8.8.8.8.53: S [tcp sum ok] 1167661839:1167661839(0) win 28200 <mss 1410,sackOK,timestamp 84646572 0,nop,wscale 7> (DF)
11:11:56.361964 10.42.253.46.53084 > 8.8.8.8.53: S [tcp sum ok] 1167661839:1167661839(0) win 28200 <mss 1410,sackOK,timestamp 84648576 0,nop,wscale 7> (DF)

Redirect Egress Routers
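
Regarding the network plugin note above: with the ovs-multitenant plugin, projects are isolated from each other, so consumer projects must be able to reach the egress-routers project; a couple of hedged options are joining their pod networks or making the egress-routers project global (with ovs-networkpolicy, the equivalent would be a NetworkPolicy allowing that ingress):

$ oc adm pod-network join-projects --to=egress-routers toto
$ oc adm pod-network make-projects-global egress-routers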

Note that the EGRESS_DESTINATION definition may include more than a single address: depending on the protocol and port queried, we could route those connections to distinct remotes:

env:
- name: EGRESS_DESTINATION
  value: |
    80 tcp 203.0.113.25
    8080 tcp 203.0.113.26 80
    8443 tcp 203.0.113.26 443
    203.0.113.27

That snippet ensures that connections to our router Pod on TCP port 80 are sent to a first remote address, those to 8080 and 8443 are translated to ports 80 and 443 respectively of a second address, and any other traffic is sent to a third remote address.

We could very well store these destinations in a ConfigMap, and include it from our Pods' configuration, ensuring consistency among a set of routers.
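
A minimal sketch of that idea (the destinations file and ConfigMap names below are my own): store the destination list once, then inject it through a configMapKeyRef:

$ cat <<EOF >destinations.txt
80 tcp 203.0.113.25
8080 tcp 203.0.113.26 80
8443 tcp 203.0.113.26 443
203.0.113.27
EOF
$ oc create configmap egress-destinations -n egress-routers --from-file=destination=destinations.txt

Each router controller would then reference it from its init container:

        env:
        - name: EGRESS_DESTINATION
          valueFrom:
            configMapKeyRef:
              name: egress-destinations
              key: destination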

Obviously, from the OpenShift containers' point of view, instead of connecting to our remote service directly, we have to reach our Egress Router Service, which in turn ensures proper forwarding of our requests.

 

Note that Redirect Egress Routers are limited to TCP and UDP traffic, and are usually not recommended for HTTP communications. That latter case is better suited to HTTP Proxy Egress Routers, relying on Squid.

Although very similar to the Redirect Egress Router, the HTTP Proxy does not set an EGRESS_DESTINATION environment variable on its init container, and instead passes an EGRESS_HTTP_PROXY_DESTINATION to the main container, such as:

$ cat <<EOF >egress-http.yml
apiVersion: v1
kind: ReplicationController
metadata:
  name: egress-http
spec:
  replicas: 1
  selector:
    name: egress-http
  template:
    metadata:
      name: egress-http
      labels:
        name: egress-http
      annotations:
        pod.network.openshift.io/assign-macvlan: "true"
    spec:
      initContainers:
      - name: egress-demo-init
        image: openshift/origin-egress-router
        env:
        - name: EGRESS_SOURCE
          value: 10.42.253.43
        - name: EGRESS_GATEWAY
          value: 10.42.253.1
        - name: EGRESS_ROUTER_MODE
          value: http-proxy
        securityContext:
          privileged: true
      containers:
      - name: egress-demo-proxy
        env:
        - name: EGRESS_HTTP_PROXY_DESTINATION
          value: |
            !perdu.com
            !*.perdu.com
            !10.42.253.0/24
            *
        image: openshift/origin-egress-http-proxy
      nodeSelector:
        node-role.kubernetes.io/infra: "true"
      serviceAccountName: egress-init
EOF
$ oc create -f egress-http.yml

Note the EGRESS_HTTP_PROXY_DESTINATION definition allows us to deny access to specific resources, such as perdu.com and its subdomains or an arbitrary private subnet, while allowing any other communication with a wildcard.

By default, the Egress HTTP Proxy image listens on TCP port 8080, which allows us to create a service such as the following:

$ oc create service --name egress-http --namespace egress-routers --port=8080 --selector=name=egress-http

And eventually use that service from other OpenShift containers, based on proper environment variable definitions:

$ oc rsh -n toto jenkins-1-xyz
sh-4.2$ https_proxy=http://egress-http.egress-routers.svc:8080 http_proxy=http://egress-http.egress-routers.svc:8080/ curl -vfsL http://free.fr -o /dev/null
[… 200 OK …]
sh-4.2$ https_proxy=http://egress-http.egress-routers.svc:8080 http_proxy=http://egress-http.egress-routers.svc:8080/ curl -vfsL http://perdu.com -o /dev/null
[… 403 forbidden …]

As for our Redirect Egress Router, running tcpdump on our gateway would confirm traffic is properly NAT-ed:

# tcpdump -vvni vlan5 host 10.42.253.43
[…]
12:11:37.385219 212.27.48.10.443 > 10.42.253.43.55906: . [bad tcp cksum b96! -> 9b9f] 3563:3563(0) ack 446 win 30016 [tos 0x4] (ttl 63, id 1503, len 40)
12:11:37.385332 212.27.48.10.443 > 10.42.253.43.55906: F [bad tcp cksum b96! -> 9ba0] 3562:3562(0) ack 445 win 30016 [tos 0x4] (ttl 63, id 40993, len 40)
12:11:37.385608 10.42.253.43.55908 > 212.27.48.10.443: . [tcp sum ok] 472:472(0) ack 59942 win 64800 (DF) (ttl 64, id 1694, len 40)
12:11:37.385612 10.42.253.43.55906 > 212.27.48.10.443: . [tcp sum ok] 446:446(0) ack 3563 win 40320 (DF) (ttl 64, id 1695, len 40)

While our router ARP table would show records similar to Redirect Egress Router ones:

# arp -na | grep 10.42.253.43
10.42.253.43 d2:87:15:45:1c:28 vlan5 18m52s

Depending on security requirements and the kind of service we want to query, OpenShift is pretty flexible. Although the above configurations do not represent an exhaustive view of existing implementations, we did cover the most basic use cases from the OpenShift documentation, which are the most likely to remain supported.

Whenever possible, using namespace-scoped IPs seems easier, as it does not rely on anything but the OpenShift SDN applying proper routing and NAT-ing. Try to provide several IPs per namespace, allowing for quick failover should a node become unavailable.

If port-based filtering is required, then Redirect Routers are more likely to fit, although deploying at least two Pods with two distinct Egress IPs and node selectors is recommended, as well as sharing a ConfigMap defining the outbound routing.

Similarly, HTTP Proxy Routers are recommended for proxying HTTP traffic, as they require nothing more than setting a few environment variables and ensuring our runtime observes environment-based proxy configuration.

 

Signing and Scanning Docker Images with OpenShift

You may already know Docker images can be signed. Today we will discuss a way to automate image signing, using OpenShift.

Lately, I stumbled upon a bunch of interesting repositories:

  • https://github.com/redhat-cop/openshift-image-signing-scanning: an Ansible playbook configuring an OCP cluster, building a base image, setting up a service account and installing a few templates providing Docker image scanning and signing
  • https://github.com/redhat-cop/image-scanning-signing-service: an optional OpenShift third-party service implementing support for ImageSigningRequest and ImageScanningRequest objects
  • https://github.com/redhat-cop/openshift-event-controller: sources building an event controller that would watch for new images pushed to OpenShift docker registry
    Although these are amazing, I could not deploy them to my OpenShift Origin, due to missing subscriptions and packages.

    image signing environment overview

    In an effort to introduce CentOS support, I forked the first repository from our previous list, and started rewriting what I needed:

    https://github.com/faust64/openshift-image-signing-scanning

     

    A typical deployment would involve:

  • Generating a GPG keypair on some server (not necessarily related to OpenShift)
  • Depending on your use case, we might then want to configure docker to prevent unsigned images from being run on our main OpenShift hosts
  • Next, we would set up labels and taints identifying the nodes we trust to sign images, as well as apply a few templates and install a base image (see the sketch after this list)
  • At which point, you could either install the event-controller Deployment to watch for all of your OpenShift internal registry's images.

    Or, you could integrate image scanning and signing yourself using the few templates installed, as shown in a sample Jenkinsfile.
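
    Coming back to the first steps of that list, here is a hedged sketch; the node name and the signer label/taint below are arbitrary examples of mine, not something mandated by the playbooks:

    # generate a signing keypair on the host that will sign images
    gpg --gen-key
    # identify the nodes trusted to sign images, and keep regular workloads away from them
    oc label node infra1.example.com signer=true
    oc adm taint nodes infra1.example.com signer=true:NoSchedule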

    OpenShift Supervision

    Today I am looking back on a few topics I had a hard time deploying properly using OpenShift 3.7, missing proper dynamic provisioning despite a (poorly-configured) GlusterFS cluster.
    Since then, I deployed a 3-node Ceph cluster, using Sebastien Han's ceph-ansible playbooks, allowing me to further experiment with persistent volumes.
    OpenShift Origin 3.9 also came out, shipping with various fixes and new features, such as Gluster Block volume support, which might address some of GlusterFS's performance issues.

     

    OpenShift Ansible playbooks include a set of roles focused on collecting and making sense out of your cluster metrics, starting with Hawkular.

    We can set up a few Pods running Hawkular, Heapster (collecting data from our OpenShift nodes) and a Cassandra database (storing them), by defining the following variables and applying the playbooks/openshift-metrics/config.yml playbook:

    Hawkular integration with OpenShift

    openshift_metrics_cassandra_limit_cpu: 3000m
    openshift_metrics_cassandra_limit_memory: 3Gi
    openshift_metrics_cassandra_node_selector: {"region":"infra"}
    openshift_metrics_cassandra_pvc_prefix: hawkular-metrics
    openshift_metrics_cassandra_pvc_size: 40G
    openshift_metrics_cassandra_request_cpu: 2000m
    openshift_metrics_cassandra_request_memory: 2Gi
    openshift_metrics_cassandra_storage_type: pv
    openshift_metrics_cassandra_pvc_storage_class_name: ceph-storage
    openshift_metrics_cassanda_pvc_storage_class_name: ceph-storage

    openshift_metrics_image_version: v3.9
    openshift_metrics_install_metrics: True
    openshift_metrics_duration: 14
    openshift_metrics_hawkular_limits_cpu: 3000m
    openshift_metrics_hawkular_limits_memory: 3Gi
    openshift_metrics_hawkular_node_selector: {"region":"infra"}
    openshift_metrics_hawkular_requests_cpu: 2000m
    openshift_metrics_hawkular_requests_memory: 2Gi
    openshift_metrics_heapster_limits_cpu: 3000m
    openshift_metrics_heapster_limits_memory: 3Gi
    openshift_metrics_heapster_node_selector: {"region":"infra"}
    openshift_metrics_heapster_requests_cpu: 2000m
    openshift_metrics_heapster_requests_memory: 2Gi

    Note that we are defining both openshift_metrics_cassandra_pvc_storage_class_name and openshift_metrics_cassanda_pvc_storage_class_name due to a typo that was recently fixed, yet not in the latest OpenShift Origin packages.
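
    With the variables above in your inventory, applying the playbook is the usual ansible-playbook invocation (paths depend on whether openshift-ansible comes from a package or a git checkout):

    $ ansible-playbook -i <your-inventory> <openshift-ansible>/playbooks/openshift-metrics/config.yml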

    Setting up those metrics may allow you to create Nagios commands based on querying for resource allocations and consumption, using:

    $ oc adm top node --heapster-namespace=openshift-infra --heapster-scheme=https node.example.com

     

    Another solution that integrates well with OpenShift is Prometheus, which can be deployed using the playbooks/openshift-prometheus/config.yml playbook and the following Ansible variables:

    Prometheus showing OpenShift Pods CPU usages

    openshift_prometheus_alertbuffer_pvc_size: 20Gi
    openshift_prometheus_alertbuffer_storage_class: ceph-storage
    openshift_prometheus_alertbuffer_storage_type: pvc
    openshift_prometheus_alertmanager_pvc_size: 20Gi
    openshift_prometheus_alertmanager_storage_class: ceph-storage
    openshift_prometheus_alertmanager_storage_type: pvc
    openshift_prometheus_namespace: openshift-metrics
    openshift_prometheus_node_selector: {"region":"infra"}
    openshift_prometheus_pvc_size: 20Gi
    openshift_prometheus_state: present
    openshift_prometheus_storage_class: ceph-storage
    openshift_prometheus_storage_type: pvc

     

    We could also deploy Grafana, which can include a pre-configured dashboard rendering some Prometheus metrics, thanks to the playbooks/openshift-grafana/config.yml playbook and the following Ansible variables:

    OpenShift Dashboard on Grafana

    openshift_grafana_datasource_name: prometheus
    openshift_grafana_graph_granularity: 2m
    openshift_grafana_namespace: openshift-grafana
    openshift_grafana_node_exporter: True
    openshift_grafana_node_selector: {"region":"infra"}
    openshift_grafana_prometheus_namespace: openshift-metrics
    openshift_grafana_prometheus_serviceaccount: prometheus
    openshift_grafana_storage_class: ceph-storage
    openshift_grafana_storage_type: pvc
    openshift_grafana_storage_volume_size: 15Gi

     

    And finally, we could also deploy logs centralization with the playbooks/openshift-logging/config.yml playbook, setting the following:

    Kibana integration with EFK

    openshift_logging_install_logging: True
    openshift_logging_curator_default_days: '7'
    openshift_logging_curator_cpu_request: 100m
    openshift_logging_curator_memory_limit: 256Mi
    openshift_logging_curator_nodeselector: {"region":"infra"}
    openshift_logging_elasticsearch_storage_type: pvc
    openshift_logging_es_cluster_size: '1'
    openshift_logging_es_cpu_request: '1'
    openshift_logging_es_memory_limit: 8Gi
    openshift_logging_es_pvc_storage_class_name: ceph-storage
    openshift_logging_es_pvc_dynamic: True
    openshift_logging_es_pvc_size: 25Gi
    openshift_logging_es_recover_after_time: 10m
    openshift_logging_es_nodeselector: {"region":"infra"}
    openshift_logging_es_number_of_shards: '1'
    openshift_logging_es_number_of_replicas: '0'
    openshift_logging_fluentd_buffer_queue_limit: 1024
    openshift_logging_fluentd_buffer_size_limit: 1m
    openshift_logging_fluentd_cpu_request: 100m
    openshift_logging_fluentd_file_buffer_limit: 256Mi
    openshift_logging_fluentd_memory_limit: 512Mi
    openshift_logging_fluentd_nodeselector: {"region":"infra"}
    openshift_logging_fluentd_replica_count: 2
    openshift_logging_kibana_cpu_request: 600m
    openshift_logging_kibana_hostname: kibana.router.intra.unetresgrossebite.com
    openshift_logging_kibana_memory_limit: 736Mi
    openshift_logging_kibana_proxy_cpu_request: 200m
    openshift_logging_kibana_proxy_memory_limit: 256Mi
    openshift_logging_kibana_replica_count: 2
    openshift_logging_kibana_nodeselector: {"region":"infra"}

     

    Meanwhile, we can note that cri-o is getting better support in the latest versions of OpenShift, among a never-ending list of ongoing work and upcoming features.

    Graphite & Riemann

    There are several ways of collecting runtime metrics out of your software. We've discussed Munin and Datadog already, and we could talk about Collectd as well, although these solutions mostly aim at system monitoring, as opposed to distributed systems.

    Business Intelligence may require collecting metrics from a cluster of workers and aggregating them into comprehensive graphs, such that short-lived instances won't imply a growing collection of distinct graphs.

     

    Riemann is a Java web service that collects metrics over TCP or UDP, and serves a simple web interface generating dashboards, displaying your metrics as they're received. Configuring Riemann, you would be able to apply transformations, filtering, … to your input. You may find a quickstart here, or something more exhaustive over there. A good starting point could be to keep it simple:

    (logging/init {:file "/var/log/riemann/riemann.log"})
    (let [host "0.0.0.0"]
    (tcp-server {:host host})
    (ws-server {:host host}))
    (periodically-expire 5)
    (let [index (index)] (streams (default :ttl 60 index (expired (fn [event] (info "expired" event))))))

    Riemann Sample Dashboard

    Despite being usually suspicious of Java applications or Ruby web services, I tend to trust Riemann even under heavy workload (tens of collectd instances, forwarding hundreds of metrics per second).

    Riemann Dashboard may look unappealing at first, although you should be able to build your own monitoring screen relatively easily. Then again, this would require a little practice, and some basic sense of aesthetics.

     

     

    Graphite Composer

    Graphite is a Python web service providing a minimalist yet pretty powerful browser-based client that allows you to render graphs. A basic Graphite setup would usually involve some SQLite database storing your custom graphs and users, as well as another Python service, Carbon, storing metrics. Such a setup would usually also involve Statsd, a NodeJS service listening for metrics, although depending on what you intend to monitor, you might find your way into writing to Carbon directly.
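
    Writing to Carbon directly is as simple as its plaintext protocol: one metric path, one value and one timestamp per line, sent to Carbon's line receiver (TCP port 2003 by default). The host and metric name below are placeholders:

    $ echo "bi.accounts.active 1234 $(date +%s)" | nc -q1 graphite.example.com 2003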

    Setting up Graphite on Debian Stretch may be problematic, due to some Python packages being deprecated while the latest Graphite packages aren't available yet. After unsuccessfully trying to pull copies from PIP instead of APT, I eventually ended up setting up my first production instance on Devuan Jessie. The setup process would drastically vary based on distribution, versions, your web server, Graphite database, … Should you go there: consider all options carefully before starting.

    Graphite could be used as is: the Graphite Composer lets you generate and save graphs, aggregating any collected metric, while the Dashboard view lets you aggregate several graphs into a single page. Also note you could use Graphite as a backend for Grafana, among others.

     

    From there, note Riemann can be reconfigured to forward everything to Graphite (or only what your own filters select), by adding to your riemann.config:

    (def graph (graphite {:host "10.71.100.164"}))
    (streams graph)

    This should allow you to run Graphite without Statsd, having Riemann collecting metrics from your software and forwarding them into Carbon.

     

    The next step would be to configure your applications to forward data to Riemann (or Statsd, should you want to only use Graphite). Databases like Cassandra or Riak could forward some of their own internal metrics, using the right agent. Or you could collect BI metrics from your own code.

    Graphite Dashboard

    Using NodeJS, you will find a riemannjs module that does the job. Or node-statsd-client, for Statsd.

    Having added some scheduled tasks to our code, querying for how many accounts we have, how many are closed, how many were active during the last day, week and month, … I’m eventually able to create a dashboard based on saved graphs, aggregating metrics in some arguably-meaningful fashion.

    Lying DNS

    As I was looking at alternative web browsers with privacy in mind, I ended up installing Iridium (in two words: Chromium fork), tempted to try it out. I first went to YouTube, and after my first video, was subjected to one of these advertisements sneaking in between videos, with that horrible "skip ad" button after a few seconds, … I was shocked enough to go back to Chromium.

    This is a perfect occasion to discuss lying DNS servers, and how having your own cache can help clean up your web browsing experience from at least part of these unsolicited intrusions.

    Lying DNS servers (or caches, actually) are mostly discussed in the context of Internet censorship, whenever an operator either refuses to serve DNS records or diverts them somewhere else.
    Yet some of the networks I manage rely on DNS resolvers that purposefully prevent some records from being resolved.

    You may query Google for lists of domain names likely to host unwanted content. These are traditionally formatted as a hosts file, and may be used as is on your computer, assuming you won't need to serve these records to other stations in your LAN.

    Otherwise, servers such as Dnsmasq, Unbound or even Bind (using RPZ, although I would recommend sticking to Unbound) may be configured to override records for a given list of domains.
    Currently, I trust a couple of sites to provide me with a relatively exhaustive list of unwanted domain names, using a script to periodically re-download these while merging them with my own blacklist (see my puppet modules set for further details on how I configure Unbound). I wouldn't recommend this to anyone: using dynamic lists is risky to begin with … Then again, I wouldn't recommend that a beginner set up his own DNS server either: editing your own hosts file could be a start, though.
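
    Just to illustrate the syntax, overriding a single (placeholder) domain would look like this with Unbound:

    server:
        local-zone: "ads.example.com" static

    or with Dnsmasq:

    address=/ads.example.com/0.0.0.0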

    Instead of yet another conf dump, I'd rather point to a couple of posts I used when setting it up: the first one from Calomel (an awesome source of sample configurations running on BSD, check it out, warmest recommendations …) and another one from a blog I didn't remember about, although I've probably found it at some point in the past, considering we pull domain names from the same sources …

    Wazuh

    As a follow-up to our previous OSSEC post, and to complete the one on Fail2ban & ELK, today we'll review Wazuh.

    netstat alerts

    As their documentation states, "Wazuh is a security detection, visibility, and compliance open source project. It was born as a fork of OSSEC HIDS, later was integrated with Elastic Stack and OpenSCAP, evolving into a more comprehensive solution". We can remark that OSSEC packages used to be distributed on some Wazuh repository, while Wazuh is still listed as OSSEC's official training, deployment and assistance services provider. You might still want to clean up some defaults, as you would soon end up receiving notifications for any connection being established or closed …

    OSSEC is still maintained: the last commit to their GitHub project was a couple of days ago as of writing this post, while other commits are being pushed to the Wazuh repository. While both products are still active, my last attempt at configuring Kibana integration with OSSEC was a failure, due to Kibana 5 not being supported. Considering Wazuh offers enterprise support, we could assume their sample configuration & ruleset are at least as relevant as those you'd find with OSSEC.

    wazuh manager status

    Wazuh documentation is pretty straightforward: a new service, wazuh-api (NodeJS), is required on your managers, and is then used by Kibana to query Wazuh status. Debian packages were renamed from ossec-hids & ossec-hids-agent to wazuh-manager & wazuh-agent respectively. Configuration is somewhat similar, although you won't be able to re-use the one you could have installed alongside OSSEC. Note the wazuh-agent package installs an empty key file: you would need to drop it prior to registering against your manager.
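
    A hedged sketch of that last step on an agent, assuming the default Wazuh paths and a manager running authd (the manager address is a placeholder):

    # rm -f /var/ossec/etc/client.keys
    # /var/ossec/bin/agent-auth -m wazuh-manager.example.com
    # systemctl restart wazuh-agent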

     

    wazuh agents

    Configuring Kibana integration, note the Wazuh documentation misses some important detail, as reported on GitHub. That's the single surprise I had reading through their documentation; the rest of their instructions work as expected: having installed and started the wazuh-api service on your manager, then installed the Kibana wazuh plugin on all your Kibana instances, you would find some Wazuh menu showing on the left. Make sure your wazuh-alerts index is registered in the Management section, then go to Wazuh.

    If uninitialized, you would be prompted to enter your Wazuh backend URL, a port, a username and the corresponding password, connecting to wazuh-api. Note that this configuration is saved into some new .wazuh index. Once configured, you would have a live view of your setup: which agents are connected, what alerts you're receiving, … and eventually set up new dashboards.

    Comparing this to the OSSEC PHP web interface, marked as deprecated for years now, … Wazuh takes the lead!

    CIS compliance

    OSSEC alerts

    Wazuh Overview

    PCI Compliance

    KeenGreeper

    Short post today, introducing KeenGreeper, a project I freshly released to GitHub, aiming to replace Greenkeeper 2.

    For those of you who may not be familiar with Greenkeeper, they provide a public service that integrates with GitHub, detects your NodeJS projects' dependencies and issues a pull request whenever one of these may be updated. In the last couple of weeks, they started advertising that their original service was scheduled for shutdown, with Greenkeeper 2 supposed to replace it. Until last week, I had only been using the Greenkeeper free plan, allowing me to process a single private repository. Migrating to Greenkeeper 2, I started registering all my NodeJS repositories and only found out 48h later that it would cost me $15/month once they had definitively closed the former service.

    Considering that they advertise on their website that the new version offers some shrinkwrap support, while in practice outdated wrapped libraries are just left to rot, … I figured writing my own collection of scripts would definitely be easier.

    keengreeper lists repositories, ignored packages, cached versions

    That first release addresses the basics: adding or removing repositories to a track list, and adding or removing modules to an ignore list, preventing them from being upgraded. It resolves the latest version for a package from the NPMJS repositories, keeping these in a local cache. It works with GitHub as well as any ssh-capable git endpoint. Shrinkwrapped libraries are actually refreshed, if present. Whenever an update is available, a new branch is pushed to your repository, named after the dependency and corresponding version.

    refresh logs

    Having registered your first repositories, you may run the update job manually and confirm everything works. Eventually, set up a cron task executing the job script.

    Note that right now, it is assumed the user running these scripts has access to an SSH key granting it write privileges to our repositories, for pushing new branches. Ideally, either you run this from your laptop and can assume using your own private key is no big deal, or I would recommend creating a third-party GitHub user with its own SSH key pair, and granting it read & write access to the repositories you intend to have it work with.

    repositories listing

    Being a 1-day DIY project, all feedback is welcome; GitHub is a good place to start, …

    PM2

    systemctl integration

    PM2 is a NodeJS process manager. Started in 2013 on GitHub, PM2 is currently the 82nd most popular JavaScript project on GitHub.

     

    PM2 is portable and works on Mac or Windows, as well as Unices. It integrates with Systemd, UpStart, Supervisord, … Once installed, PM2 works as any other service.

     

    The first advantage of introducing PM2, whenever dealing with NodeJS services, is that you would register your code as a child service of your main PM2 process. Eventually, you would be able to reboot your system, and recover your NodeJS services without adding any scripts of your own.

    pm2 processes list

    Another considerable advantage of PM2 is that it introduces clustering capabilities, most likely without requiring any change to your code. Depending on how your application is built, you would probably want at least a couple of processes for each service that serves user requests, while you can see I'm only running one copy of my background or scheduled processes:

     

    nodejs sockets all belong to a single process

    Having enabled clustering while starting my foreground, inferno or filemgr processes, PM2 starts two instances of my code. Now one could suspect that, when configuring Express to listen on a static port, starting two processes of that code would fail with some EADDRINUSE error. And that's most likely what would happen when instantiating your code twice by hand. Yet when starting such a process with PM2, the latter takes over socket operations:

    And since PM2 is processing network requests at runtime, it is able to balance traffic to your workers. When deploying new revisions of your code, you may gracefully reload your processes, one by one, without disrupting service at any point.
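
    For reference, a minimal sketch of those operations with the pm2 CLI (app.js and the process name are placeholders):

    # start two clustered instances of an Express app
    $ pm2 start app.js -i 2 --name foreground
    # list processes, then gracefully reload them one by one
    $ pm2 list
    $ pm2 reload foreground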

    pm2 process description

    PM2 also allows tracking per-process resource usage:

    Last but not least, PM2 provides a centralized dashboard. Full disclosure: I only enabled this feature on one worker, once, and wasn't much convinced, as I'm running my own supervision services, … Then again, you could be interested if running a small fleet, or if ready to invest in Keymetrics premium features. It's worth mentioning that your PM2 processes may be configured to forward metrics to Keymetrics services, so you may keep an eye on memory or CPU usage, application versions, or even configure reporting, notifications, … In all fairness, I remember their dashboard looked pretty nice.

    pm2 keymetrics dashboard, as advertised on their site

    Fail2ban & ELK

    Following up on a previous post regarding Kibana and the recent ELK 5 release, today we'll configure a map visualizing hosts as Fail2ban blocks them.

    Having installed Fail2ban and configured the few jails that are relevant for your system, look for the Fail2ban log file path (variable logtarget, defined in /etc/fail2ban/fail2ban.conf, defaults to /var/log/fail2ban.log on Debian).

    The first thing we need is to have our Rsyslog processing Fail2ban logs. Here, we would use Rsyslog file input module (imfile), and force using FQDN instead of hostnames.

    $PreserveFQDN on
    module(load="imfile" mode="polling" PollingInterval="10")
    input(type="imfile"
      File="/var/log/fail2ban.log"
      statefile="/var/log/.fail2ban.log"
      Tag="fail2ban: "
      Severity="info"
      Facility="local5")

    Next, we'll configure Rsyslog to forward messages to some remote syslog proxy (which will, in turn, forward its messages to Logstash). Here, I usually rely on the rsyslog RELP protocol (which may require installing the rsyslog-relp package), as it addresses some UDP flaws without shipping with the traditional TCP syslog limitations.

    Relaying syslog messages to our syslog proxy: load the RELP output module, then make sure your Fail2ban logs are relayed.

    $ModLoad omrelp
    local5.* :omrelp:rsyslog.example.com:6969

    Before restarting Rsyslog, if it’s not already the case, make sure the remote system would accept your logs. You would need to load Rsyslog RELP input module. Make sure rsyslog-relp is installed, then add to your rsyslog configuration

    $ModLoad imrelp
    $InputRELPServerRun 6969

    You should be able to restart Rsyslog on both your Fail2ban instance and syslog proxy.

    Relaying messages to Logstash, we would be using JSON messages instead of traditional syslog formatting. To configure Rsyslog forwarding JSON messages to Logstash, we would use the following

    template(name="jsonfmt" type="list" option.json="on") {
        constant(value="{")
        constant(value="\"@timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
        constant(value="\",\"@version\":\"1")
        constant(value="\",\"message\":\"") property(name="msg")
        constant(value="\",\"@fields.host\":\"") property(name="hostname")
        constant(value="\",\"@fields.severity\":\"") property(name="syslogseverity-text")
        constant(value="\",\"@fields.facility\":\"") property(name="syslogfacility-text")
        constant(value="\",\"@fields.programname\":\"") property(name="programname")
        constant(value="\",\"@fields.procid\":\"") property(name="procid")
        constant(value="\"}\n")
      }
    local5.* @logstash.example.com:6968;jsonfmt

    Configuring Logstash processing Fail2ban logs, we would first need to define a few patterns. Create some /etc/logstash/patterns directory. In there, create a file fail2ban.conf, with the following content:

    F2B_DATE %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[ ]%{HOUR}:%{MINUTE}:%{SECOND}
    F2B_ACTION (\w+)\.(\w+)\[%{POSINT:pid}\]:
    F2B_JAIL \[(?<jail>\w+\-?\w+?)\]
    F2B_LEVEL (?<level>\w+)\s+

    Then, configuring Logstash processing these messages, we would define an input dedicated to Fail2ban. Having tagged Fail2ban events, we will apply a Grok filter identifying blocked IPs and adding geo-location data. We’ll also include a sample output configuration, writing to ElasticSearch.

    input {
      udp {
        codec => json
        port => 6968
        tags => [ "firewall" ]
      }
    }

    filter {
      if "firewall" in [tags] {
        grok {
          patterns_dir => "/etc/logstash/patterns"
          match => {
            "message" => [
              "%{F2B_DATE:date} %{F2B_ACTION} %{F2B_LEVEL:level} %{F2B_JAIL:jail} %{WORD:action} %{IP:ip} %{GREEDYDATA:msg}?",
              "%{F2B_DATE:date} %{F2B_ACTION} %{WORD:level} %{F2B_JAIL:jail} %{WORD:action} %{IP:ip}"
            ]
          }
        }
        geoip {
          source => "ip"
          target => "geoip"
          database => "/etc/logstash/GeoLite2-City.mmdb"
          add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
          add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
        }
        mutate {
          convert => [ "[geoip][coordinates]", "float" ]
        }
        }
      }
    }

    output {
      if "_check_logstash" not in [tags] and "_grokparsefailure" not in [tags] {
        elasticsearch {
          hosts => [ "elasticsearch1.example.com", "elasticsearch2.example.com", "elasticsearch3.example.com" ]
          index => "rsyslog-%{+YYYY.MM.dd}"
        }
      }
      if "_grokparsefailure" in [tags] {
        file { path => "/var/log/logstash/failed-%{+YYYY-MM-dd}.log" }
      }
      }
    }

    Note that while I would have recommended using RELP inputs last year, running Logstash 2.3, as of Logstash 5 this plugin is no longer available. Hence I would recommend setting up some Rsyslog proxy on your Logstash instance, in charge of relaying RELP messages as UDP ones to Logstash, through your loopback.

    Moreover, should you need to forward messages over a public or un-trusted network, the Rsyslog RELP modules can be combined with Stunnel encapsulation. On Debian, RELP output with TLS certificates does not seem to work as of today.

    That being said, before restarting Logstash, if you didn't already, make sure to define a template setting the geoip type to geo_point (otherwise it shows as a string and won't be usable for defining maps). Create some index.json file with the following:

    {
      "mappings": {
        "_default_": {
          "_all": { "enabled": true, "norms": { "enabled": false } },
          "dynamic_templates": [
            { "template1": { "mapping": { "doc_values": true, "ignore_above": 1024, "index": "not_analyzed", "type": "{dynamic_type}" }, "match": "*" } }
          ],
          "properties": {
            "@timestamp": { "type": "date" },
            "message": { "type": "string", "index": "analyzed" },
            "offset": { "type": "long", "doc_values": "true" },
            "geoip": { "type" : "object", "dynamic": true, "properties" : { "location" : { "type" : "geo_point" } } }
          }
        }
      },
      "settings": { "index.refresh_interval": "5s" },
      "template": "rsyslog-*"
    }

    kibana search

    Post your template to ElasticSearch

    root@logstash# curl -XPUT http://elasticsearch.example.com:9200/_template/rsyslog?pretty -d@index.json

    You may now restart Logstash. Check logs for potential errors, …

    kibana map

    Now to configure Kibana: start by searching for Fail2ban logs. Save your search, so it can be re-used later on.

    Then, in the Visualize panel, create a new Tile Map visualization, pick the search query you just saved. You’re being asked to select bucket type: click on Geo Coordinates. Insert a Label, click the Play icon to refresh the sample map on your right. Save your Visualization once you’re satisfied: these may be re-used defining Dashboards.

    BlueMind Best Practices

    Following up on a previous post introducing BlueMind, today we'll focus on good practices (OpenDKIM, SPF, DMARC, ADSP, Spamassassin, firewalling) setting up their latest available release (3.5.2), and mail servers in general, while migrating my former setup (3.14).

    A first requirement, easing the creation of mailboxes and the manipulation of passwords during the migration process, would be for your user accounts and mailing lists to be imported from some LDAP backend.
    Assuming you do have an LDAP server, you will need to create some service account in there, so BlueMind can bind. That account should have read access to your userPassword, pwdAccountLockedTime, entryUUID, entryDN, createTimestamp, modifyTimestamp and shadowLastChange attributes (assuming these do exist in your schema).
    If you also want to configure distribution lists from LDAP groups, then you may want to load the dyngroup schema.

    Another key to migrating your mail server would be to duplicate messages from one backend to another. Granted that your mailboxes already exist on both sides, that some IMAP service is running on both sides, and that you can temporarily force your users' passwords, then a tool you should consider is ImapSync. ImapSync can run for days, duplicate billions of mails in thousands of folders, … Its simplicity is its strength.
    Ideally, we would set up our new mail server without moving our MXs yet. The first ImapSync run could take days to complete. From there, the next runs would only duplicate new messages and should go way faster: you may consider re-routing new messages to your new servers, continuing to run ImapSync until your former server stops receiving messages.

    To give you an idea, my current setup involves less than 20 users, a little under 1.500.000 messages in about 400 folders, using around 30G of disk space. The initial ImapSync job ran for about 48 hours, the next 3 passes ran for less than one hour each.
    A few years back, I did something similar involving a lot more mailboxes: in such cases, having some frontend Postfix routing messages to either your former or newer backend, depending on the mailbox being reached, could be a big help in progressively migrating users from one to the other.
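
    For reference, a minimal imapsync invocation for a single mailbox would look like the following (hosts and credentials are placeholders; real runs usually loop over a CSV of accounts):

    imapsync --host1 mail-old.example.com --user1 john@example.com --password1 'secret1' \
             --host2 mail-new.example.com --user2 john@example.com --password2 'secret2'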

     

    Now let’s dive into BlueMind configuration. We won’t cover the setup tasks as you’ll find these in BlueMind documentation.

    Assuming you managed to install BlueMind, and that your user base is stored in some LDAP, make sure to install the BlueMind LDAP import plugin:

    apt-get install bm-plugin-admin-console-ldap-import bm-plugin-core-ldap-import

    Note that if you were used to previous BlueMind releases, you will now need to grant each user access to your BlueMind services, including the webmail. By default, a newly created user account may only access its own settings.
    The management console allows you to grant such permissions. There's a lot of new stuff in this release: multiple calendars per user, external calendar support, large file handling detached from the webmail, …

     

    The first thing we will configure is some firewall.
    Note that BlueMind recommends setting their service up behind some kind of router, avoiding exposing your instance directly to the Internet, which is a good practice for hosting pretty much anything anyway. Even so, you may want to set up some firewall.
    Note that having your firewall up and running may lead to the BlueMind installation script failing to complete: make sure to keep it down until you're done with the BlueMind installer.

    On Debian, you may find the Firehol package provides an easy-to-configure firewall.
    The services you would need to open for public access are smtp (TCP:25), http (TCP:80), imap (TCP:143), https (TCP:443), smtps (TCP:465), submission (TCP:587) and imaps (TCP:993).
    Assuming Firehol, your configuration would look like this:

    mgt_ips="1.2.3.4/32 2.3.4.5/32"
    interface eth0 WAN
        protection strong
        server smtp accept
        server http accept
        server imap accept
        server https accept
        server smtps accept
        server submission accept
        server imaps accept
        server custom openssh "tcp/1234" default accept src "$mgt_ips"
        server custom munin "tcp/4949" default accept src "$mgt_ips"
        server custom nrpe "tcp/5666" default accept src "$mgt_ips"
        client all accept

    You may then restart your firewall. To be safe, you could restart BlueMind as well.

    Optionally, you may want to use something like Fail2ban, also available on Debian. You may not be able to track all abusive accesses, although you could lock out SMTP authentication brute-forces at the very least, which is still relevant.

    Note that BlueMind also provides a plugin you could install from the packages cache extracted during BlueMind installation:

    apt-get install bm-plugin-core-password-bruteforce

     

    The second thing we would do is install some valid certificate. These days, services like LetsEncrypt issue free x509 certificates.
    Still assuming Debian, a LetsEncrypt client is available in jessie-backports: certbot. This client would either need access to some directory served by your webserver, or would need to bind on your TCP port 443, so that LetsEncrypt may validate that the common name you are requesting actually routes back to the server issuing this request. In the latter case, we would do something like this:

    # certbot certonly --standalone --text --email me@example.com --agree-tos --domain mail.example.com --renew-by-default

    Having figured out how you’ll generate your certificate, we will now want to configure BlueMind services loading it in place of the self-signed one generated during installation:

    # cp -p /etc/ssl/certs/bm_cert.pem /root/bm_cert.pem.orig
    # cat /etc/letsencrypt/live/$CN/privkey.pem /etc/letsencrypt/live/$CN/fullchain.pem >/etc/ssl/certs/bm_cert.pem

    Additionally, you will want to edit postfix configuration using LetsEncrypt certificate chain. In /etc/postfix/main.cf, look for smtpd_tls_CAfile and set it to /etc/letsencrypt/live/$CN/chain.pem.

    You may now reload or restart postfix (smtps) and bm-nginx (both https & imaps).

    Note that LetsEncrypt certificates are valid for 3 months. You’ll probably want to install some cron job renewing your certificate, updating /etc/ssl/certs/bm_cert.pem then reloading postfix and bm-nginx.
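
    A hedged sketch of such a job, reusing the paths from above ($CN being your certificate common name), to be dropped into root's crontab or /etc/cron.weekly:

    #!/bin/sh
    # renew certificates close to expiry, then rebuild the bundle BlueMind expects
    certbot renew --quiet
    cat /etc/letsencrypt/live/$CN/privkey.pem /etc/letsencrypt/live/$CN/fullchain.pem >/etc/ssl/certs/bm_cert.pem
    # reload the services using that certificate
    service postfix reload
    service bm-nginx reload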

     

    The next thing we would configure is OpenDKIM signing our outbound messages and validating signatures of inbound messages.
    Debian has some opendkim package embedding everything you'll need on a mail relay. To generate keys, you will also need to install opendkim-tools.

    Create some directory layout and your keys:

    # cd /etc
    # mkdir dkim.d
    # cd dkim.d
    # mkdir keys keys/example1.com keys/example2.com keys/exampleN.com
    # for d in example1 example2 exampleN; do \
        ( cd keys/$d.com; opendkim-genkey -r -d $d.com ); done
    # chmod 0640 */default.private
    # chown root:opendkim */default.private

    In each of /etc/dkim.d/keys subdir, you will find a default.txt file that contains the DNS record you should add to the corresponding zone. Its content would look like the following:

    default._domainkey IN TXT "v=DKIM1; k=rsa; p=PUBLICKEYDATA"

    You should have these DNS records ready prior to having configured Postfix signing your messages.
    Having generated our keys, we still need to configure OpenDKIM signing messages. On Debian, the main configuration is /etc/opendkim.conf, and should contain something like this:

    Syslog yes
    UMask 002
    OversignHeaders From
    KeyTable /etc/dkim.d/KeyTable
    SigningTable /etc/dkim.d/SigningTable
    ExternalIgnoreList /etc/dkim.d/TrustedHosts
    InternalHosts /etc/dkim.d/TrustedHosts

    You would need to create the 3 files we’re referring to in there, the first one being /etc/dkim.d/KeyTable:

    default._domainkey.example1.com example1.com:default:/etc/dkim.d/keys/example1.com/default.private
    default._domainkey.example2.com example2.com:default:/etc/dkim.d/keys/example2.com/default.private
    default._domainkey.exampleN.com exampleN.com:default:/etc/dkim.d/keys/exampleN.com/default.private

    The second one is /etc/dkim.d/SigningTable and would contain something like this:

    example1.com default._domainkey.example1.com
    example2.com default._domainkey.example2.com
    exampleN.com default._domainkey.exampleN.com

    And the last one, /etc/dkim.d/TrustedHosts would contain a list of the subnets we should sign messages for, such as

    1.2.3.4/32
    2.3.4.5/32
    127.0.0.1
    localhost

    Now, let's ensure OpenDKIM starts on boot, editing /etc/default/opendkim with the following:

    DAEMON_OPTS=
    SOCKET="inet:12345@localhost"

    Having started OpenDKIM and made sure the service is properly listening on TCP:12345, you may now configure Postfix relaying its messages to OpenDKIM. Edit your /etc/postfix/main.cf adding the following:

    milter_default_action = accept
    smtpd_milters = inet:localhost:12345
    non_smtpd_milters = inet:localhost:12345

    Restart Postfix, make sure mail delivery works. Assuming you can already send outbound messages, make sure your DKIM signature appears to be valid for other mail providers (an obvious one being gmail).
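
    Before that, you can already verify keys and DNS records are consistent from the server itself, using opendkim-tools (selector and domain matching the layout above):

    # opendkim-testkey -d example1.com -s default -vvv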

     

    Next, we’ll configure SPF validation of inbound messages.
    On Debian, you would need to install postfix-policyd-spf-perl.

    Let’s edit /etc/postfix/master.cf, adding a service validating inbound messages matches sender’s domain SPF policy:

    spfcheck unix - n n - 0 spawn
        user=policyd-spf argv=/usr/sbin/postfix-policyd-spf-perl

    Next, edit /etc/postfix/main.cf, look for smtpd_recipient_restrictions. The last directive should be a reject_unauth_destination, and should precede the policy check we want to add:

    smtpd_recipient_restrictions=permit_sasl_authenticated,
        permit_mynetworks,reject_unauth_destination,
        check_policy_service unix:private/policyd-spf

    Restart Postfix, make sure you can still properly receive messages. Checked messages should now include some Received-SPF header.

     

    Finally, we’ll configure SPAM checks and Spamassassin database training.
    On Debian, you’ll need to install spamassassin.

    Let’s edit /etc/spamassassin/local.cf defining a couple trusted IPs, and configuring Spamassassin to rewrite the subject for messages detected as SPAM:

    rewrite_header Subject [ SPAM _SCORE_ ]
    trusted_networks 1.2.3.4/32 2.3.4.5/32
    score ALL_TRUSTED -5
    required_score 2.0
    use_bayes 1
    bayes_auto_learn 1
    bayes_path /root/.spamassassin/bayes
    bayes_ignore_header X-Spam-Status
    bayes_ignore_header X-Spam-Flag
    ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
    shortcircuit ALL_TRUSTED on
    shortcircuit BAYES_99 spam
    shortcircuit BAYES_00 ham
    endif

    Configure Spamassassin service defaults in /etc/default/spamassassin:

    CRON=1
    ENABLED=1
    NICE="--nicelevel 15"
    OPTIONS="--create-prefs --max-children 5 -H /var/log/spamassassin -s /var/log/spamassassin/spamd.log"
    PIDFILE=/var/run/spamd.pid

    Make sure /root/.spamassassin and /var/log/spamassassin both exist.

    Now let's configure Spamassassin to learn from the content of BlueMind SPAM folders; create or edit /etc/spamassassin/sa-learn-cyrus.conf with the following content:

    [global]
    tmp_dir = /tmp
    lock_file = /var/lock/sa-learn-cyrus.lock
    verbose = 1
    simulate = no
    log_with_tag = yes

    [mailbox]
    include_list = ''
    include_regexp = '.*'
    exclude_list = ''
    exclude_regexp = ''
    spam_folder = 'Junk'
    ham_folder = 'Inbox'
    remove_spam = yes
    remove_ham = no

    [sa]
    debug = no
    site_config_path = /etc/spamassassin
    learn_cmd = /usr/bin/sa-learn
    bayes_storage = berkely
    prefs_file = /etc/spamassassin/local.cf
    fix_db_permissions = yes
    user = mail
    group = mail
    sync_once = yes
    virtual_config_dir = ''

    [imap]
    base_dir = /var/spool/cyrus/example_com/domain/e/example.com/
    initial_letter = yes
    domains = ''
    unixhierarchysep = no
    purge_cmd = /usr/lib/cyrus/bin/ipurge
    user = cyrus

    Look out for the sa-learn-cyrus script. Note that Debian provides a package with that name, which pulls cyrus in as a dependency, which is definitely something you want on a BlueMind server.
    Run this script to train Spamassassin from the messages in your Inbox and Junk folders. Eventually, you may want to install some cron job.
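
    For instance, a nightly cron entry along these lines (the schedule is arbitrary, and the script path should be adjusted to wherever yours actually lives), in /etc/cron.d/sa-learn-cyrus:

    # train the Bayes database from the Junk and Inbox folders every night at 04:00
    0 4 * * * root sa-learn-cyrus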

    Start or restart Spamassassin service. Now, let’s configure Postfix piping its messages to Spamassassin. Edit /etc/postfix/master.cf, add the following service:

    spamassassin unix - n n - - pipe
        user=debian-spamd argv=/usr/bin/spamc -f -e
        /usr/sbin/sendmail -oi -f ${sender} ${recipient}

    Still in /etc/postfix/master.cf, locate the smtp service, and add a content_filter option pointing to our spamassassin service:

    smtp      inet  n       -       n       -       -       smtpd
        -o content_filter=spamassassin

    Restart Postfix. Sending or receiving messages, you should read about spamd in /var/log/mail.log. Moreover, an X-Spam-Checker-Version header should show in your messages.

     

    Prior to migrating your messages, make sure to mount /var/spool/cyrus on a separate device, and /var/backups/bluemind on some NFS share, LUN device, sshfs, … something remote, ideally.
    Your /tmp may be mounted from tmpfs, and could use the nosuid and nodev options, although you cannot set noexec.

    Assuming you have some monitoring system running: make sure to keep an eye on mail queues, smtp service availability or disk space among others, …

     

    Once done migrating your setup, the last touch would be to set proper SPF, ADSP and DMARC policies (DNS records).

    Your SPF record defines which IPs may issue mail on behalf of your domain. Usually, you just want to allow your MXs (mx). Maybe you’ll want to trust some additional record, … And deny everyone else (-all). Final record could look like this:

    @ IN TXT "v=spf1 mx a:non-mx-sender.example.com -all"

    Having defined your SPF policy, and assuming you properly configured DKIM signing and published the corresponding public key in your DNS, you may consider defining some DMARC policy as well.

    _dmarc IN TXT "v=DMARC1; p=quarantine; pct=100; rua=mailto:postmaster@example.com"

    And ADSP:

    _adsp._domainkey IN TXT "dkim=all;"

    … and that’s pretty much it. The rest’s up to you, and would probably be doable from BlueMind administration console.