Ten tips for building a vulnerability management pipeline
The Log4j vulnerability was a good reminder that securing cloud-native applications starts with using safe container images free of critical vulnerabilities. When the vulnerability was made public, many security teams struggled to get a quick and comprehensive list of the images running in their environments, along with the packages and libraries those images contained.
A good container image pipeline not only reduces the number of vulnerabilities present in running containers, but it also helps companies to quickly respond to such emergencies. How can you build that perfect pipeline to ensure safe containers? In this post we’ll go through the 10 best practices that we’ve seen mature organizations deploy as they secure their development pipeline.
#1: Start with a well-defined and known-good policy
To create your policy, you need to first decide what is acceptable in terms of vulnerabilities:
- Should all fixable vulnerabilities be fixed, or just those of medium, high, and critical severity?
- How will you manage vulnerabilities that don’t have a fix available?
- What is the process for mitigating such vulnerabilities, assessing their potential impact, and tracking them until they can be remediated?
- What is your SLA to fix vulnerabilities in active containers?
Additionally, the policy should be fully defined and known to everyone involved in building, running, and fixing images, such as developers, administrators, and security teams. This policy must then be validated at every point of the build pipeline. This approach, commonly referred to as DevSecOps, allows companies to reduce the cost of fixing security issues by addressing them earlier in the build process, and to ensure that there is no gap in coverage.
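One way to make such a policy checkable at every point is to write it down as data. The following is a minimal sketch, not a description of any particular product: the `Finding` and `Policy` types, the field names, and the default 30-day SLA are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Severity levels ordered from least to most severe.
SEVERITIES = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Finding:
    vuln_id: str        # placeholder identifier, e.g. a CVE number
    severity: str       # one of SEVERITIES
    fix_available: bool
    first_seen: date    # when the scanner first reported it


@dataclass(frozen=True)
class Policy:
    min_severity: str = "medium"  # everything at or above this level is in scope
    sla_days: int = 30            # hypothetical SLA for fixing active containers


def in_scope(policy, finding):
    """True if the finding is severe enough for the policy to care about."""
    return SEVERITIES.index(finding.severity) >= SEVERITIES.index(policy.min_severity)


def must_fix(policy, findings):
    """In-scope findings with a fix available: these block the image."""
    return [f for f in findings if in_scope(policy, f) and f.fix_available]


def needs_exception(policy, findings):
    """In-scope findings without a fix: these need a tracked mitigation."""
    return [f for f in findings if in_scope(policy, f) and not f.fix_available]


def overdue(policy, findings, today):
    """In-scope findings that have been open longer than the SLA."""
    return [
        f for f in findings
        if in_scope(policy, f) and (today - f.first_seen).days > policy.sla_days
    ]
```

Because the same three functions can run on a developer laptop, in CI, and against running images, each stage answers the policy questions in the same way.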
It is also important to go beyond just checking for vulnerabilities. Your policy should also include best practices to create reproducible images and immutable containers, such as:
- Do not update the list of packages automatically (apt-get update or yum update); instead, use an updated base image hosted in your registry
- Define USER as non-root
- Do not store secrets in environment variables
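The three practices above lend themselves to a simple automated check. The sketch below is a rough heuristic rather than a full Dockerfile parser: the function name `lint_dockerfile` and the list of secret-looking variable names are assumptions, and it ignores line continuations and any USER already set by the base image.

```python
import re

# Matches build-time package list updates such as "apt-get update".
PACKAGE_UPDATE = re.compile(r"\b(apt-get|yum)\s+update\b")
# Variable names that often hold secrets (a heuristic, not an exhaustive list).
SECRET_NAME = re.compile(r"(PASSWORD|SECRET|TOKEN|API_KEY)", re.IGNORECASE)


def lint_dockerfile(text):
    """Return a list of human-readable problems found in a Dockerfile."""
    problems = []
    last_user = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, _, rest = line.partition(" ")
        keyword = keyword.upper()
        if PACKAGE_UPDATE.search(line):
            problems.append(f"line {number}: package lists updated at build time")
        if keyword == "ENV":
            # Look only at variable names, not at their values.
            names = [token.split("=", 1)[0] for token in rest.split()]
            if any(SECRET_NAME.search(name) for name in names):
                problems.append(f"line {number}: possible secret in ENV")
        if keyword == "USER":
            last_user = rest.strip()
    # The last USER instruction decides who the container runs as.
    if last_user is None or last_user.split(":")[0] in ("root", "0"):
        problems.append("no non-root USER defined")
    return problems
```

A check like this is cheap enough to run both locally and on every commit, before any image is built.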
#2: Enforce policy at multiple points
Security and development teams should leverage tools that maintain a centralized policy that is consistently applied at every step, including:
- Development of images: let your developers test their images locally and fix most of the vulnerabilities before they check in their code.
- Code repository: automate the testing of Dockerfiles and other artifacts to ensure all code checked in is safe. Require reviews to approve unfixable vulnerabilities or exceptions.
- Build time / Continuous Integration (CI): automatically scan your images for vulnerabilities as they are built. Reject unsafe images or require an exception process.
- Publication to a registry: eventually, all of your images must be stored in a container registry. Integrate each registry with a vulnerability scanner to ensure complete coverage of all images produced.
- Deployment to Kubernetes: as the last check before a container image gets used, leverage a Kubernetes admission controller to ensure images have been scanned before they are deployed.
- Runtime: create an up-to-date inventory of all images running. Continuously look for new vulnerabilities in running images.
#3: Make vulnerability management for developers simple
Developers create a Dockerfile, or similar artifacts, that defines the final container image. If a vulnerability is found later on, it will likely come back to them to fix their Dockerfile and recreate a safer image. Rather than relying on an external team or process to validate the image, they need to have the right tools to test their image locally:
- A tool that can run the image through the company policy and tell them what vulnerabilities have to be fixed
- A tool that gives them all the information needed to quickly fix a vulnerability: where the vulnerability was introduced (base image, package installation, etc.), the version to upgrade to, etc.
- A tool that gives them all the information they need without having to log in to an external application – see the Lacework plugin for Visual Studio Code as an example.
- A tool that produces a complete report they can attach to request an exception, or to get a review by a security team, when vulnerabilities cannot be fixed
Making it easy for your developers to find and fix vulnerabilities will make the overall pipeline a lot more efficient. Finding vulnerabilities later on, and having to restart the process to generate a new image, is more expensive.
#4: Integrate code repository
In the spirit of avoiding any single point of failure, you can automatically run the same tests that developers run locally on every code check-in. Even if a developer forgets to validate an image locally, or does not follow the policy, the problem will be caught quickly, before other teams pick up a vulnerable image.
#5: Build & Continuous Integration (CI)
Many companies have a Continuous Integration (CI) system in place that continuously builds and tests new code, including container images. The policy should be validated again in that step. A policy violation should fail the build and prevent the image from being shipped to a container registry. Lacework has examples for integrating vulnerability scanning with several popular CI tools.
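As a rough illustration of failing a build on a policy violation, the sketch below turns a scanner report into a CI exit code. The JSON layout (`vulnerabilities`, `id`, `severity`) and the `gate` function are hypothetical and do not correspond to the output of any specific scanner.

```python
import json

# Severities that fail the build under this example policy (an assumption).
BLOCKING = {"medium", "high", "critical"}


def gate(report_json, approved_exceptions):
    """Return (exit_code, blocking_ids) for a scan report.

    report_json: JSON text in a hypothetical scanner format.
    approved_exceptions: set of vulnerability ids that a reviewer has accepted.
    """
    report = json.loads(report_json)
    blocking = [
        v["id"]
        for v in report["vulnerabilities"]
        if v["severity"].lower() in BLOCKING and v["id"] not in approved_exceptions
    ]
    # A non-zero exit code makes most CI systems mark the step as failed.
    return (1 if blocking else 0), blocking
```

A CI step would pass the first element of the result to `sys.exit`, so the image is never pushed to the registry when blocking findings remain.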
#6: Scan your registries
Images have to be pulled from a container registry to be used in Kubernetes or other container orchestration tools. To ensure that all of your images are scanned, even if they somehow skipped your CI pipeline, you should automatically scan new images added to your registries, using either registry notifications or auto-polling. An alert should be raised if an image was not scanned previously or if new policy violations are found.
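The alerting rule described here can be sketched as a comparison between two sets of scan results. Everything in this example is an assumption made for illustration: the `registry_alerts` name, the use of image digests as keys, and the representation of violations as sets of identifiers.

```python
def registry_alerts(previous, latest):
    """Compare the latest registry scan with earlier scan records.

    previous: dict mapping image digest -> set of violation ids from earlier scans
    latest:   dict mapping image digest -> set of violation ids from the registry scan
    Returns a list of (digest, reason) tuples that deserve an alert.
    """
    alerts = []
    for digest in sorted(latest):
        if digest not in previous:
            # The image reached the registry without an earlier scan on record.
            alerts.append((digest, "no earlier scan on record"))
            continue
        new_violations = latest[digest] - previous[digest]
        if new_violations:
            alerts.append((digest, "new violations: " + ", ".join(sorted(new_violations))))
    return alerts
```

Whether the latest results come from a registry notification or from a polling job does not change the comparison, only how quickly it runs after an image is pushed.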
#7: Do your final check in Kubernetes
You have one more chance to check the image just before it gets deployed to Kubernetes. An admission controller is a Kubernetes component that can get called when a workload is created or updated. The admission controller can check whether the image was scanned before, whether the policy was validated and whether new vulnerabilities have been found since the last scan. The admission controller can prevent an image from being deployed, or can raise an alert to be investigated later.
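To make the admission step concrete, here is a minimal sketch of the decision logic behind a validating admission webhook. It assumes the incoming object is a Pod and that a `scan_status` lookup (hypothetical) maps image references to "passed" or another state. The web server and TLS setup that a real webhook requires are left out, and this is not how any particular product implements the check.

```python
def review(admission_review, scan_status):
    """Build an AdmissionReview response for a Pod create or update request.

    admission_review: the AdmissionReview request body, already parsed from JSON
    scan_status: dict mapping image reference -> "passed" or another state
    """
    request = admission_review["request"]
    pod_spec = request["object"]["spec"]
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    images = [c["image"] for c in containers]

    # Reject any image that was never scanned or did not pass the policy.
    rejected = [image for image in images if scan_status.get(image) != "passed"]

    response = {"uid": request["uid"], "allowed": not rejected}
    if rejected:
        response["status"] = {
            "message": "images not scanned or failing policy: " + ", ".join(rejected)
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

To raise an alert instead of blocking, the same function could always set `allowed` to true and record the rejected images elsewhere for later investigation.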
#8: Recognize the importance of a good inventory & SBOM
It is important to keep track of all the scans being performed in the pipeline. If a policy validation fails late in the pipeline, it’s likely an image did not follow the normal process, or it was allowed to progress with issues. Being able to track the different scans through the pipeline is critical when investigating gaps in coverage.
It’s also important to keep a good inventory of these images, whether they are in a registry or running in Kubernetes. If a new vulnerability is discovered, it is important to be able to quickly check whether a vulnerable image is running in a container, or waiting to be deployed in the registry. The inventory should contain not only a report of all vulnerabilities found, but also the complete list of packages, libraries, and important attributes (layers, user, entry point, etc.) needed for a quick but complete reevaluation of the image without a full scan of the actual image. This list of contents is called an SBOM (Software Bill of Materials), and it is critical when investigating any kind of software supply chain attack.
Scanning of images should not stop once they are deployed. As seen with Log4j, new vulnerabilities in existing software are found every day. The SBOM should be constantly checked against the latest list of vulnerabilities. Action should be taken if new vulnerabilities are found or if the severity of an existing vulnerability has changed. Images may also sit in a registry for several days before they are deployed. It is important to be alerted as soon as possible when new vulnerabilities are found. Lacework automatically scans all active images and can rescan registries daily.
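The value of a stored SBOM is that a new advisory can be answered with a lookup instead of a rescan. The sketch below assumes a deliberately simplified inventory (image name mapped to a list of package and version pairs) and purely numeric dotted versions; real SBOM formats and version schemes are richer. The image names are hypothetical, and 2.15.0 is used as the first log4j-core release that addressed CVE-2021-44228.

```python
def parse_version(version):
    """Turn '2.14.1' into (2, 14, 1). Only handles numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def affected_images(inventory, package, fixed_in):
    """List images whose SBOM contains `package` at a version below `fixed_in`.

    inventory: dict mapping image name -> list of (package_name, version) pairs
    """
    fixed = parse_version(fixed_in)
    hits = []
    for image, sbom in inventory.items():
        for name, version in sbom:
            if name == package and parse_version(version) < fixed:
                hits.append(image)
                break  # one vulnerable copy is enough to flag the image
    return sorted(hits)


# Hypothetical inventory built from earlier scans.
inventory = {
    "registry.example.com/api:1.4": [("log4j-core", "2.14.1"), ("jackson-databind", "2.13.0")],
    "registry.example.com/web:2.0": [("log4j-core", "2.17.1")],
    "registry.example.com/cache:3.1": [("openssl", "3.0.7")],
}
print(affected_images(inventory, "log4j-core", "2.15.0"))
```

Running the same query over both registry and runtime inventories shows which vulnerable images are already deployed and which are only waiting to be.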
#9: Don’t forget OSS images
Many companies have built a secure pipeline for the images they produce. But they often forget about open-source images (Prometheus, Istio, etc.) that get pulled directly from third-party container registries (Docker Hub, Quay, etc.), thereby skipping the entire pipeline. The only checkpoint is the admission controller, as the image is getting pulled.
The best practice is to treat third-party images the same as your own:
- Pull the image into your CI tool for the initial scan.
- Ship the image, if it passes the validation tests, to your own registry.
- Restrict Kubernetes to pull images from your own registry only.
This not only helps when responding to new vulnerabilities like Log4j in OSS images, but also prevents rogue maintainers or hijacked accounts from pushing malicious images into your organization.
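Restricting pulls to your own registry comes down to inspecting the registry part of each image reference. The sketch below follows the usual Docker convention that the first path component is a registry host only if it contains a dot or a colon, or is `localhost`; otherwise the image comes from Docker Hub. The trusted registry name `registry.example.com` is a placeholder.

```python
def registry_of(image):
    """Return the registry host of an image reference.

    References without an explicit registry host resolve to Docker Hub.
    """
    first, slash, _ = image.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_allowed(image, trusted_registry="registry.example.com"):
    """True only for images served from the organization's own registry."""
    return registry_of(image) == trusted_registry
```

A check like `is_allowed` could run inside an admission controller, so that an image pulled straight from a third-party registry is rejected even if someone bypasses the pipeline.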
Another option is to mirror the OSS image repository into your own registry, and perform periodic vulnerability scans as described earlier in the “Scan your registries” section.
If neither of these approaches is feasible for your current infrastructure, consider performing ad-hoc vulnerability scans, using the Lacework inline scanner for instance.
#10: Implement a solution – like Lacework – to provide visibility and keep you on track
Security and development teams should implement tools that help them easily and consistently apply these centralized policies at every step. Tools like Lacework can help ensure that, with this pipeline, images are validated at every point of their lifecycle and cannot progress to the next step if vulnerabilities are found or if exceptions have not been approved. The number of validations performed may appear redundant, but it avoids relying entirely on one team or a single point of failure.
How Lacework enforces policies at each step of the process, from the creation of images to runtime
It’s important to build a pipeline with lots of redundancy to avoid any gaps, and to avoid relying on the goodwill of a single team. This pipeline can be built progressively over time, starting from both ends: involve developers early on to produce better images in the first place, scan registries to increase coverage, and deploy the Lacework admission controller to ensure 100% of images are scanned. This will pay off by reducing the risk of being breached, thus reducing the cost of remediating or mitigating vulnerabilities.
To find out more about what Lacework can do for you, please read our white paper on Vulnerability Management.