Static Security Analysis of Container Images with CoreOS Clair

Container security is (or should be) a concern for anyone running software in Docker Containers. Gone are the days when running random Images found on the internet was commonplace. Security guides for Containers are common now: examples from Microsoft and others can be found easily online.

The two leading Container Orchestrators also offer their own security guides: Kubernetes Security Best Practices and Docker security.

Container Image Origin

One of the biggest factors in Container security is the origin of your Container Images:

  1. It is recommended to run your own private Registry to distribute Images.
  2. It is recommended to scan these Images against known vulnerabilities.

Running a private Registry is easy these days (Azure Container Registry for instance).
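As an illustration, standing up an ACR instance with the Azure CLI takes only a couple of commands; a minimal sketch, assuming hypothetical resource group and registry names:

# Placeholder names and location; pick the SKU that suits your needs
az group create --name container-demo-rg --location australiaeast
az acr create --resource-group container-demo-rg --name myprivateregistry --sku Basic
az acr login --name myprivateregistry    # authenticate the local docker client against the registry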

I will concentrate on the scanning of Images in the remainder of this post and show how to look for common vulnerabilities using CoreOS Clair. Clair is probably the most advanced non-commercial scanning solution for Containers at the moment, though it requires some elbow grease to run this way. It’s important to note that the GUI and Enterprise features are not free and are sold under the Quay brand.

As security scanning is recommended as part of the build process through your favorite CI/CD pipeline, we will see how to configure Visual Studio Team Services (VSTS) to leverage Clair.

Installing CoreOS Clair

First we are going to run Clair in minikube for the sake of experimentation. I have used Fedora 26 and Ubuntu. I will assume you have minikube, kubectl and docker installed (follow the respective links to install each piece of software) and that you have initiated a minikube cluster by issuing the “minikube start” command.

Clair is distributed as a Docker image, or you can compile it yourself by cloning the following GitHub repository: https://github.com/coreos/clair.

In any case, we will run the following commands to clone the repository, and make sure we are on the release 2.0 branch to benefit from the latest features (tested on Fedora 26):

~/Documents/github|⇒ git clone https://github.com/coreos/clair
Cloning into 'clair'...
remote: Counting objects: 8445, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 8445 (delta 0), reused 2 (delta 0), pack-reused 8440
Receiving objects: 100% (8445/8445), 20.25 MiB | 2.96 MiB/s, done.
Resolving deltas: 100% (3206/3206), done.
Checking out files: 100% (2630/2630), done.

rafb@overdrive:~/Documents/github|⇒ cd clair
rafb@overdrive:~/Documents/github/clair|master
⇒ git fetch
rafb@overdrive:~/Documents/github/clair|master
⇒ git checkout -b release-2.0 origin/release-2.0
Branch release-2.0 set up to track remote branch release-2.0 from origin.
Switched to a new branch 'release-2.0'

The Clair repository comes with a Kubernetes deployment found in the contrib/k8s subdirectory as shown below. It’s the only thing we are after in the repository as we will run the Container Image distributed by Quay:

rafb@overdrive:~/Documents/github/clair|release-2.0
⇒ ls -l contrib/k8s
total 8
-rw-r--r-- 1 rafb staff 1443 Aug 15 14:18 clair-kubernetes.yaml
-rw-r--r-- 1 rafb staff 2781 Aug 15 14:18 config.yaml

We will modify these two files slightly to run Clair version 2.0 (for some reason the GitHub master branch carries an older version of the configuration file syntax, as highlighted in this GitHub issue).

In config.yaml, we will change the PostgreSQL source from:

source: postgres://postgres:password@postgres:5432/postgres?sslmode=disable

to

source: host=postgres port=5432 user=postgres password=password sslmode=disable

In clair-kubernetes.yaml, we will change the version of the Clair image from latest to v2.0.1:

From:

image: quay.io/coreos/clair

to

image: quay.io/coreos/clair:v2.0.1

Once these changes have been made, we can deploy Clair to our minikube cluster by running those two commands back to back:

kubectl create secret generic clairsecret --from-file=./config.yaml 
kubectl create -f clair-kubernetes.yaml

By looking at the startup logs for Clair, we can see it fetches a vulnerability list at startup time:

[rbigeard@flanger ~]$ kubectl get pods
NAME                   READY     STATUS    RESTARTS   AGE
clair-postgres-l3vmn   1/1       Running   1          7d
clair-snmp2            1/1       Running   4          7d
[rbigeard@flanger ~]$ kubectl logs clair-snmp2
...
{"Event":"fetching vulnerability updates","Level":"info","Location":"updater.go:213","Time":"2017-08-14 06:37:33.069829"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"ubuntu.go:88","Time":"2017-08-14 06:37:33.069960","package":"Ubuntu"} 
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"oracle.go:119","Time":"2017-08-14 06:37:33.092898","package":"Oracle Linux"} 
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"rhel.go:92","Time":"2017-08-14 06:37:33.094731","package":"RHEL"}
{"Event":"Start fetching vulnerabilities","Level":"info","Location":"alpine.go:52","Time":"2017-08-14 06:37:33.097375","package":"Alpine"}

Scanning Images through Clair Integrations

Clair is just a backend and we therefore need a frontend to “feed” Images to it. There are a number of frontends listed on this page. They range from full Enterprise-ready GUI frontends to simple command line utilities.

I have chosen to use “klar” for this post. It is a simple command line tool that can be easily integrated into a CI/CD pipeline (more on this in the next section). To install klar, you can compile it yourself or download a release.
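To give an idea, installing a release binary looks something like the following; the exact download URL and version are assumptions, so check the releases page of the klar GitHub repository:

# Download a release binary from the klar releases page, make it executable and put it on the PATH
curl -L -o klar <release-binary-url-from-the-klar-releases-page>
chmod +x klar
sudo mv klar /usr/local/bin/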

Once installed, it’s very easy to use and parameters are passed using environment variables. In the following example, CLAIR_OUTPUT is set to “High” so that we only see the most dangerous vulnerabilities. CLAIR_ADDR is the address of Clair running on my minikube cluster.

Note that since I am pulling an image from an Azure Container Registry instance, I have specified DOCKER_USER and DOCKER_PASSWORD variables in my environment.

# CLAIR_OUTPUT=High CLAIR_ADDR=http://192.168.99.100:30060 \
klar-1.4.1-linux-amd64 romainregistry1.azurecr.io/samples/nginx

Analysing 3 layers 
Found 26 vulnerabilities 
CVE-2017-8804: [High]  
The xdr_bytes and xdr_string functions in the GNU C Library (aka glibc or libc6) 2.25 mishandle failures of buffer deserialization, which allows remote attackers to cause a denial of service (virtual memory allocation, or memory consumption if an overcommit setting is not used) via a crafted UDP packet to port 111, a related issue to CVE-2017-8779. 
https://security-tracker.debian.org/tracker/CVE-2017-8804 
----------------------------------------- 
CVE-2017-10685: [High]  
In ncurses 6.0, there is a format string vulnerability in the fmt_entry function. A crafted input will lead to a remote arbitrary code execution attack. 
https://security-tracker.debian.org/tracker/CVE-2017-10685 
----------------------------------------- 
CVE-2017-10684: [High]  
In ncurses 6.0, there is a stack-based buffer overflow in the fmt_entry function. A crafted input will lead to a remote arbitrary code execution attack. 
https://security-tracker.debian.org/tracker/CVE-2017-10684 
----------------------------------------- 
CVE-2016-2779: [High]  
runuser in util-linux allows local users to escape to the parent session via a crafted TIOCSTI ioctl call, which pushes characters to the terminal's input buffer. 
https://security-tracker.debian.org/tracker/CVE-2016-2779 
----------------------------------------- 
Unknown: 2 
Negligible: 15 
Low: 1 
Medium: 4 
High: 4

So Clair is showing us the four “High” level common vulnerabilities found in the nginx image. At the time of writing, this is consistent with the details listed on Docker Hub. It’s not necessarily a deal breaker as those vulnerabilities are only potentially exploitable by local users on the Container host, which means we need to protect the VMs that are running the Containers well!

Automating the Scanning of Images in Azure using a CI/CD pipeline

As a proof-of-concept, I created a “vulnerability scanning” Task in a build pipeline in VSTS.

Conceptually, the chain is as follows:

Container image scanning VSTS pipeline

I created an Ubuntu Linux VM and built my own VSTS agent following the published instructions, after which I installed klar.

I then built a Kubernetes cluster in Azure Container Service (ACS) (see my previous post on the subject which includes a script to automate the deployment of Kubernetes on ACS), and deployed Clair to it, as shown in the previous section.

A little gotcha here: my Linux VSTS agent and my Kubernetes cluster in ACS ran in two different VNets, so I had to enable VNet peering between them, as sketched below.
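For reference, the peering has to be created in both directions; here is a sketch with the Azure CLI, assuming placeholder resource group and VNet names (older CLI versions use --remote-vnet-id, newer ones --remote-vnet):

# Peer the VSTS agent VNet with the Kubernetes VNet, then the reverse direction
az network vnet peering create --resource-group vsts-agent-rg --vnet-name vsts-agent-vnet \
  --name agent-to-k8s --remote-vnet-id <k8s-vnet-resource-id> --allow-vnet-access
az network vnet peering create --resource-group k8s-acs-rg --vnet-name k8s-vnet \
  --name k8s-to-agent --remote-vnet-id <vsts-agent-vnet-resource-id> --allow-vnet-access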

Once those elements are in place, we can create a git repo with a shell script that calls klar and a build process in VSTS with a task that will execute the script in question:

Scanning Task in a VSTS Build

The content of scan.sh is very simple (this would obviously have to be improved for a production environment, but you get the idea):

#!/bin/bash
CLAIR_ADDR=http://X.X.X.X:30060 klar ubuntu

Once we run this task in VSTS, we get the list of vulnerabilities in the output, which can be used to “break” the build based on certain vulnerabilities being discovered, as sketched below.
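A hypothetical evolution of scan.sh that takes the image name as an argument and fails the build when klar reports any “High” vulnerabilities, based on the summary lines klar prints:

#!/bin/bash
# Hypothetical build-breaking scan script; X.X.X.X stands for the Clair service address.
IMAGE="$1"
OUTPUT=$(CLAIR_OUTPUT=High CLAIR_ADDR=http://X.X.X.X:30060 klar "$IMAGE") || true
echo "$OUTPUT"
# klar ends its report with summary lines such as "High: 4"
HIGH_COUNT=$(echo "$OUTPUT" | awk '/^High:/ {print $2}')
if [ "${HIGH_COUNT:-0}" -gt 0 ]; then
  echo "Failing the build: ${HIGH_COUNT} High severity vulnerabilities found in ${IMAGE}"
  exit 1
fi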

Build output view in VSTS

Summary

Hopefully you have picked up some ideas about how to ensure that the Container Images you run in your environments are secure, or at least know which potential issues you have to mitigate. A build task similar to the one described here could very well be part of a broader build process you use to build Container Images from scratch.

Running Containers on Azure

Running Containers in public cloud environments brings advantages beyond the realm of “fat” virtual machines: easy deployments through a registry of Images, better use of resources and orchestration are but a few examples.

Azure is embracing containers in a big way (Brendan Burns, one of the primary instigators of Kubernetes while at Google, joined Microsoft last year, which might have contributed to it!).

Running Containers nowadays is almost always synonymous with running an orchestrator which allows for automatic deployments of multi-Container workloads.

Here we will explore the use of Kubernetes and Docker Swarm, which are arguably the two most popular orchestration frameworks for Containers at the moment. Please note that Mesos is also supported on Azure, but Mesos is an entirely different beast which goes beyond the realm of just Containers.

This post is not about comparing the merits of Docker Swarm and Kubernetes, but rather a practical introduction to running both on Azure, as of August 2017.

VMs vs ACS vs Container Instances

When it comes to running containers on Azure, you have the choice of running them on Virtual Machines you create yourself or via the Azure Container Service which takes care of creating the underlying infrastructure for you.

We will explore both ways of doing things with a Kubernetes example running on ACS and a Docker Swarm example running on VMs created by Docker Machine. Note that at the time of writing, ACS does not support the relatively new Swarm mode of Docker, but things move extremely fast in the Container space… for instance, Azure Container Instances are a brand new service allowing users to run Containers directly, billed on a per-second basis.
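To give an idea, running a single Container on ACI boils down to one command; the resource names below are placeholders and the exact flag set may have evolved since:

# Placeholder resource group and container names
az container create --resource-group aci-demo-rg --name nginx-demo --image nginx --ip-address public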

In any case, both Docker Swarm and Kubernetes offer a powerful way of managing the lifecycle of Containerised application environments alongside storage and network services.

Kubernetes

Kubernetes is a one-stop solution for running a Container cluster and deploying applications to said cluster.

Although the WordPress example found in the official Kubernetes documentation is not specifically geared at Azure, it will run easily on it thanks to the level of standardisation Kubernetes (aka K8s) has achieved.

This example deploys a MySQL instance Container with a persistent volume to store data and a separate Container which combines WordPress and Apache, sporting its own persistent volume.

The PowerShell script below leverages the “az acs create” Azure CLI 2.0 command to spin up a two-node Kubernetes cluster. Yes, one command is all it takes to create a K8s cluster! If you need ten agent nodes, just change the “--agent-count” value to 10.

It is invoked with two parameters: azureSshKey points to a private SSH key (the corresponding public key must exist as well) and kubeWordPressLocation is the path to the git clone of the Kubernetes WordPress example.

Note that you need to have SSH connectivity to the Kubernetes master, i.e. TCP port 22.

Following the creation of the cluster, the script leverages kubectl (“az acs kubernetes install-cli” to install it) to deploy the aforementioned WordPress example.

“The” script:
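The original is a PowerShell script; here is a minimal shell sketch of the equivalent steps, assuming placeholder resource names, a two-node cluster and an existing SSH key pair (the optional OMS agent installation is left out):

# All names and paths below are placeholders
az group create --name k8s-wordpress-rg --location eastus
az acs create --orchestrator-type kubernetes --resource-group k8s-wordpress-rg \
  --name k8s-wordpress --dns-prefix k8swp --agent-count 2 \
  --ssh-key-value ~/.ssh/id_rsa.pub
az acs kubernetes install-cli       # installs kubectl if it is not already present
az acs kubernetes get-credentials --resource-group k8s-wordpress-rg --name k8s-wordpress
# Deploy the WordPress example (MySQL + WordPress, each with a persistent volume)
KUBE_WORDPRESS_LOCATION=~/github/kubernetes-wordpress-example
kubectl create -f "$KUBE_WORDPRESS_LOCATION/"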

Note that if you specify an OMS workspace ID and key, the script will install the OMS agent on the underlying VMs to monitor the infrastructure and the Containers (more on this later).

Once the example has been deployed, we can check the status of the cluster using the following command (be patient as it takes nearly 20 minutes for the service to be ready):
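A likely form of that command, with the service name taken from the WordPress example:

kubectl get pods,services
# or watch just the WordPress service until an EXTERNAL-IP is assigned
kubectl get svc wordpress --watch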

kubectl output showing the WordPress service and its EXTERNAL-IP

Note the value for EXTERNAL-IP (here 13.82.180.10). Entering this value in the location bar of your browser once both Containers are running gives you access to the familiar WordPress install screen:

The WordPress install screen

This simple K8s visualizer, written a while back by Brendan Burns, shows the various containers running on the cluster:

The Kubernetes visualizer showing the containers running on the cluster

Docker Swarm

Docker Swarm is the “other” Container orchestrator, delivered by Docker themselves. Since Docker version 1.12, the so-called “Swarm mode” does not require any discovery service like Consul, as discovery is handled by the Swarm itself. As mentioned in the introduction, Azure Container Service does not support Swarm mode yet, so we will run our Swarm on VMs created by Docker Machine.

It is now possible to use Docker Compose against Docker Swarm in Swarm mode in order to deploy complex applications, making the Swarm/Machine/Compose trifecta a complete solution competing with Kubernetes.

The new-ish Docker Swarm mode handles service discovery itself

As with the Kubernetes example, I have created a script which automates the steps to create a Swarm and then deploys a simple nginx container on it.

As Swarm mode is not supported by Azure ACS yet, I have leveraged Docker Machine, the Docker-sanctioned tool to create “docker-ready” VMs in all kinds of public or private clouds.

The following PowerShell script (easily translatable to a Linux flavour of shell) leverages the Azure CLI 2.0, Docker Machine and docker to create a Swarm and deploy an nginx Container with an Azure load balancer in front. Docker Machine and docker were installed on my Windows 10 desktop using chocolatey.
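A condensed shell sketch of what the script does; the VM names and subscription ID are placeholders, and the Azure load balancer configuration is left out:

# Placeholder names; the docker-machine Azure driver requires at least a subscription ID
SUB_ID="<azure-subscription-id>"
docker-machine create --driver azure --azure-subscription-id "$SUB_ID" swarm-manager
docker-machine create --driver azure --azure-subscription-id "$SUB_ID" swarm-worker

# Initialise the Swarm on the manager, then join the worker to it
eval "$(docker-machine env swarm-manager)"
docker swarm init --advertise-addr "$(docker-machine ip swarm-manager)"
TOKEN="$(docker swarm join-token -q worker)"
docker-machine ssh swarm-worker \
  "docker swarm join --token $TOKEN $(docker-machine ip swarm-manager):2377"

# Deploy a simple nginx service exposed on port 80
docker service create --name nginx --publish 80:80 nginx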

Kubernetes vs Swarm

Compared to Kubernetes on ACS, Docker Swarm takes a bit more effort and is not as compact, but it would probably be more portable as it does not leverage specific Azure capabilities, save for the load balancer. Since Brendan Burns joined Microsoft last year, it is understandable that Kubernetes is seeing additional focus.

It is a fairly recent development, but deploying multi-Container applications on a Swarm can be done via “docker stack deploy” using a version 3 Docker Compose file (compose files for older revisions need some work as seen here).
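For example, with a version 3 compose file this boils down to the following (the stack name is arbitrary):

docker stack deploy --compose-file docker-compose.yml swarm_demo
docker stack services swarm_demo    # check the services that were created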

Azure Registry Service

I will explore this in more detail in a separate post, but a post about Docker Containers would not be complete without some thoughts about where the source of Images for your deployments lives (namely the Registry).

Azure offers a Container Registry service (ACR) to store and retrieve Images, which is an easy-to-use option that is well integrated with the rest of Azure.

It might be obvious to point out, but deploying Containers based on Images downloaded from the internet can be a security hazard, so creating and vetting your own Images is a must-have in enterprise environments, hence the criticality of running your own Registry service.

Monitoring containers with OMS

My last interesting bit for Containers on Azure is around monitoring. If you deploy the Microsoft Operations Management Suite (OMS) agent on the underlying VMs running Containers, and have the Docker marketplace extension for Azure as well as an OMS account, it is very easy to get useful monitoring information. It is also possible to deploy OMS as a Container, but I found that I wasn’t getting metrics for the underlying VM that way.

Container monitoring view in OMS

This makes it possible not only to grab performance metrics from Containers and the underlying infrastructure, but also to collect logs from applications, here the web server logs for instance:

Web server logs collected in OMS

I hope you found this post interesting and I am looking forward to the next one on this topic.

Combining Ansible and AWS Cloudformation for Windows Provisioning

Imagine an agentless “robot” user that you can program to configure servers, network equipment, public cloud resources, deploy applications, etc.

Ansible is an IT automation solution which was acquired by RedHat in 2015. Already popular before the RedHat takeover, Ansible is becoming more and more common in IT organisations.

Originally targeted at Linux hosts for automated configuration management and orchestration, Ansible gained the ability to automate network devices in version 2.0.

And in version 2.3, which has just been released, Ansible’s Windows-friendly features have been seriously augmented with domain-related modules and an experimental “runas” feature.

I would like to show in this post how Ansible can be used as the glue between public cloud provisioning features such as AWS cloudformation and Virtual Machines, with Windows in mind for this particular example. This is a quick introduction which hopefully showcases some of the main features of Ansible with regards to configuration management (orchestration is a totally different kettle of fish).

Ansible Features

Ansible is agentless. It connects to Linux/UNIX hosts through SSH and to Windows hosts through WinRM. As such, it acts as a regular user. Windows authentication options include local accounts, Kerberos, CredSSP, etc.; see the Ansible Windows support page for more information.

Ansible’s syntax is YAML based and the basic element of an Ansible play is a task. A task invokes a module which usually performs a single change, for instance installing packages, restarting a service, modifying firewall rules, etc. Tasks can be grouped into roles as a form of logical grouping, i.e. every task related to setting up a database server can be put together under a “database” role.

There are plenty of modules, the list can be found here: http://docs.ansible.com/ansible/modules.html.
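As a taste of the syntax, a module can also be invoked ad hoc from the command line; here the win_ping module is run against a placeholder Windows inventory group (the group’s WinRM connection details are assumed to be defined in the inventory):

# 'windows' and inventory.ini are placeholders for your own inventory group and file
ansible windows -i inventory.ini -m win_ping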

A collection of plays is called a playbook. Ansible also includes a powerful templating system based on Jinja2, which makes it possible to programmatically create complex configuration files that can then be copied onto the target nodes.

Configuration Management and Idempotency

One of the key advantages of configuration management frameworks such as Ansible (and Puppet, Chef, Saltstack, etc.) resides in idempotency. This means that no matter how many times you run the same code against a bunch of targets, you will always end up in the same state.

For instance, if an Ansible playbook installs a package and restarts a service, it might make changes the first time it runs, but the second time you run it, it will first check whether the package is already installed and the service started before doing anything. This makes it possible to regularly run the same code against targets to ensure state, something which can be hard to do with regular scripting languages.
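A quick way to see this in action, assuming a hypothetical site.yml playbook:

ansible-playbook site.yml          # first run: applies changes
ansible-playbook site.yml          # second run: should report changed=0 in the PLAY RECAP
ansible-playbook site.yml --check  # dry run: reports what would change without changing anything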

A word of caution though: it is possible to break idempotency. A classic example involves running non-idempotent shell commands as part of an Ansible run. This is why it is always encouraged to use modules instead of ad-hoc shell commands in any configuration management effort.

Provisioning and Configuring a Windows Server

In a nutshell, the following Ansible code executes a first play comprising four tasks which do the following:

  1. Create a cloudformation stack based on the windows_template.yml cloudformation template (full code here). This simple stack spins up an Ansible-ready Windows 2012 R2 server.
  2. Retrieve the details of the newly created Windows 2012 R2 server using the ec2_remote_facts module.
  3. Retrieve the password of the Windows VM, as Windows VMs in AWS are created with a new random password every time.
  4. Update the Ansible inventory with the details of the new server.

The second play applies the role “webserver” to the newly created Windows 2012 R2 server. This role enables IIS using the win_feature Ansible module.

Ansible Code:

---
- hosts: localhost
  vars:
    - aws_region: "ap-southeast-2"
  tasks:
    - name:  launch Windows Ansible Test cloudformation stack
      cloudformation:
        stack_name: "AnsibleWindowsTest1"
        region: "{{ aws_region }}"
        state: "present"
        template: "windows_template.yml"
    - name: get details of the newly created instance
      ec2_remote_facts:
        filters:
          instance-state-name: running
          "tag:Name": WindowsAnsibleTest
        region: "{{ aws_region }}"
      register: instance_details
    - name: getting windows password
      ec2_win_password:
        instance_id: "{{ instance_details.instances[0].id }}"
        key_file: "{{ windows_aws_key_file }}"
        wait: yes
        region: "{{ aws_region }}"
      register: win_password_output
    - name: adding host to inventory
      add_host:
        name: windows_test_host
        ansible_host: "{{ instance_details.instances[0].public_dns_name }}"
        ansible_password: "{{ win_password_output.win_password }}"
        groups: ec2_windows
        ansible_winrm_server_cert_validation: ignore
        ansible_connection: 'winrm'
        ansible_user: Administrator
        ansible_port: 5986

- hosts: windows_test_host
  roles:
    - webserver

We are using WinRM over SSL here, on the standard port, 5986. A word about WinRM SSL certificates: the "ansible_winrm_server_cert_validation: ignore" setting is needed if Windows self-signed certificates are being used; this is a Python-related limitation. For production environments, creating your own certificates is a better alternative; find out more about this in the documentation. For a production environment which uses AD, I also strongly advocate the use of Kerberos for user-level authentication, which I have used successfully in customer environments.

Ansible Simple IIS Role

Here is the simple Ansible IIS role mentioned above:

---
  - name: Install IIS
    win_feature:
      name: "Web-Server"
      state: present
      restart: yes
      include_sub_features: yes
      include_management_tools: yes

Executing the Ansible Playbook

Ansible Core refers to a base installation of Ansible on a Linux/UNIX/macOS machine. It is free and open source. There is no native Windows version of Ansible, but it can run in the Windows 10 Linux subsystem, even though that is not fully supported for production workloads. See the installation documentation for the various ways to install Ansible Core.
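For reference, a typical installation on a Linux host is a one-liner via pip or the distribution's package manager:

sudo -H pip install ansible     # or use the distribution packages, e.g. dnf/apt install ansible
ansible --version               # confirm the installation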

For our example, Ansible interacts with AWS in order to create a cloudformation stack. It leverages the authentication that you might have set up for the AWS CLI. As such, it uses the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables that you have set up in order to make programmatic calls to AWS (see the AWS CLI documentation).
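In other words, exporting the usual AWS CLI credentials is enough for the AWS modules used here (the values below are placeholders):

export AWS_ACCESS_KEY_ID="AKIA..."           # placeholder values
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="ap-southeast-2"   # optional; the playbook also sets the region explicitly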

The last piece of the puzzle is pywinrm, which can be installed by issuing sudo -H pip install "pywinrm>=0.2.2" on a Linux/UNIX host. This will allow Ansible to interact over WinRM with Windows targets.

Once set up, the following command will trigger the cloudformation stack creation and will configure the newly created Windows 2012 R2 VM:

ansible-playbook -e "windows_aws_key_file=~/MyWindowsKey.pem" -i ec2.py provision.yml

ec2.py is a dynamic inventory script for AWS. It will query AWS and report back a JSON inventory of your EC2 resources. The windows_aws_key_file extra variable is the path to your AWS Windows key file. Note that the Windows key name is also found in the cloudformation template (windows_template.yml).

The Ansible run will show the output of every task, after which you will have a newly created Windows Server 2012 R2 instance with IIS enabled. It is easy to imagine how this can be extended to automatically create complex environments.

Output of the Ansible playbook run

Ansible in a DevOps landscape

Ansible Core can be part of a larger DevOps landscape for configuring VMs or deploying applications, combined with CI/CD software such as Jenkins or Bamboo.

Ansible Tower, a paid-for product, is a scheduler for Ansible playbook runs with clustering features and a record of playbook execution history. Its REST API allows it to be easily integrated with CI/CD systems to synchronise software builds with the automated deployment of virtual infrastructure and applications.

Code

The entire code for this example can be found here: https://github.com/rbigeard/ansible_clouformation_windows. Don't forget to specify the location of your AWS Windows keys in group_vars/all.yml.