Identifying active vs. dormant Log4j vulnerabilities

The vulnerability landscape is changing rapidly with the adoption of cloud-native computing. Services that were previously hidden from the outside world may now be reachable, and cloud services may be exposed unintentionally through complex configurations.

The Log4j vulnerability (reported in December 2021) is a recent example of a critical vulnerability that may be present widely (e.g. in Java applications that bundle Log4j JARs), while the number of actively exploitable processes is much smaller. For example, if no process is using the library or no process has network activity, the exploit risk is lower. For this reason, it is important to know where a vulnerable library is actively used, so that the real vulnerabilities can be prioritized and fixed first.

There are two fundamental approaches to detecting vulnerabilities and threats:

  1. Static vulnerability scanning (or “side scanning”) – this typically involves scanning volumes (e.g. EBS snapshots, mounted volumes, etc.) for vulnerable packages and libraries.
  2. Runtime analysis – typically performed via an agent that looks at elements like network and process activity. This is the primary driver behind the Lacework Polygraph. Polygraph builds graphs of activity between entities (e.g. processes and network endpoints) and looks for interesting structural changes in these graphs over time.

One limitation of static vulnerability scanning is that it cannot tell whether a vulnerable library is actually being used. As a result, for a library such as Log4j, static scanning generates a large number of alerts, and it quickly becomes overwhelming to tell what is real and what is not. In addition, JAR-based vulnerabilities like Log4j are not tracked by operating system package managers, so traditional scanning of packages installed via apt, yum, etc. does not find them.

This post describes how Lacework combined both approaches by adding “vulnerable” features to nodes in the Polygraphs. This unique approach produces higher-quality alerts and helps customers distinguish between the following cases:

  • No Vulnerability – host has no vulnerable library
  • Dormant Vulnerability – host has vulnerable library, but no process using it
  • Active Vulnerability – some process using a vulnerable library, but no suspicious network activity
  • IOC (indicator of compromise) – some process using a vulnerable library and performing suspicious network activity
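
The four cases above amount to a small decision ladder. The following is a minimal sketch of that ladder, not Lacework's implementation; the names `VulnState` and `classify_host` and the three boolean inputs are assumptions introduced here for illustration.

```python
from enum import Enum


class VulnState(Enum):
    """The four cases listed above (names are illustrative)."""
    NO_VULNERABILITY = "no vulnerability"
    DORMANT = "dormant vulnerability"
    ACTIVE = "active vulnerability"
    IOC = "indicator of compromise"


def classify_host(has_vulnerable_library: bool,
                  process_uses_library: bool,
                  suspicious_network_activity: bool) -> VulnState:
    """Map three observations about a host to one of the four cases."""
    if not has_vulnerable_library:
        return VulnState.NO_VULNERABILITY
    if not process_uses_library:
        # Library is on disk, but no running process has loaded it.
        return VulnState.DORMANT
    if not suspicious_network_activity:
        # Library is loaded, but nothing suspicious has been observed.
        return VulnState.ACTIVE
    # Library is loaded and the process shows suspicious network activity.
    return VulnState.IOC
```

Static scanning alone can only answer the first question, so it cannot separate the last three cases; the runtime signals are what move a host further down the ladder.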

This is a significant improvement over just detecting based on the presence of a vulnerable library, which may be shipped as part of some package that is not used. This approach helps security teams focus their attention on the right alerts during a crisis.

Polygraph Examples

Here are some examples showing how the new Polygraphs have been augmented with vulnerability context. Nodes marked with “!” are processes using a known vulnerable library.

Example of a process using a vulnerable Log4j library receiving incoming connections from a known bad IP and making outbound connections.

Example of an SSH session launching a vulnerable Java process that communicates with a known bad IP.

Example of a system process launching a Java process that loads a vulnerable Log4j library.

Technical Details 

The Lacework agent monitors all process activity, including network and file activity. Every time it sees a process, it checks whether the process has a file open whose SHA-256 hash matches an entry in a list of known vulnerable libraries. In the case of Java processes, these files may be “fat jars”, which are archives that contain other jars, so the agent needs to recurse into the archive. The agent keeps a cache to make this check efficient, and every time it finds a process that has loaded a vulnerable library, it sends a message to the backend system.
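
As a rough illustration of the hash check and the recursion into fat jars, here is a minimal sketch using only the Python standard library. It is not the agent's code: the function names, the `!/` path notation, the depth limit, and the cache layout are all assumptions made for this example, and the set of known hashes is supplied by the caller.

```python
import hashlib
import io
import zipfile

NESTED_SUFFIXES = (".jar", ".war", ".ear")


def scan_archive(data: bytes, known_sha256: set, name: str = "<file>",
                 max_depth: int = 5) -> list:
    """Return the paths of every archive (outer or nested) whose SHA-256
    digest appears in known_sha256."""
    hits = []
    if hashlib.sha256(data).hexdigest() in known_sha256:
        hits.append(name)
    # A jar is a zip file, so nested jars can be read as zip entries.
    if max_depth > 0 and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for entry in archive.namelist():
                if entry.lower().endswith(NESTED_SUFFIXES):
                    hits.extend(scan_archive(archive.read(entry), known_sha256,
                                             f"{name}!/{entry}", max_depth - 1))
    return hits


# Hypothetical cache keyed by the digest of the outer file, so a jar that is
# opened by many processes is only unpacked once. It assumes the list of
# known hashes does not change while the cache is alive.
_scan_cache = {}


def scan_cached(data: bytes, known_sha256: set) -> list:
    digest = hashlib.sha256(data).hexdigest()
    if digest not in _scan_cache:
        _scan_cache[digest] = scan_archive(data, known_sha256)
    return _scan_cache[digest]
```

Matching on content hashes rather than file names means a renamed or repackaged copy of a vulnerable jar is still found, as long as the inner jar's bytes are unchanged.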

The backend system constantly processes incoming data about process and network activity and updates the behavioral graphs. On each node, Lacework stores a properties dictionary that describes features of the node. Depending on the node type, we determine whether the node is “bad” (e.g. whether it matches a known bad DNS name or IP address in a threat intelligence feed). For process nodes, we add a field that denotes whether the process is using a known vulnerable library, along with additional information about the vulnerability. Since the libraries loaded by a given process ID (pid) tend to stay the same for the lifetime of the process, we keep the process marked as vulnerable for its lifetime.
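
To make the idea of a per-node properties dictionary concrete, here is a small sketch. The field names (`bad`, `vulnerable`, `vulnerabilities`) and the helper functions are hypothetical and are not the actual Polygraph schema; the CVE identifier in the usage note is the one assigned to the December 2021 Log4j vulnerability.

```python
def new_process_node(pid: int, name: str) -> dict:
    """Properties dictionary for a process node (illustrative fields only)."""
    return {"type": "process", "pid": pid, "name": name,
            "vulnerable": False, "vulnerabilities": []}


def new_network_node(address: str, threat_feed: set) -> dict:
    """Properties dictionary for a network node; 'bad' means the address
    appears in the supplied threat intelligence feed."""
    return {"type": "network", "address": address,
            "bad": address in threat_feed}


def mark_vulnerable(node: dict, library: str, cve: str) -> dict:
    """Record that a process loaded a known vulnerable library.

    The flag is never cleared: the set of loaded libraries is assumed to
    stay the same for the lifetime of the pid."""
    node["vulnerable"] = True
    entry = {"library": library, "cve": cve}
    if entry not in node["vulnerabilities"]:
        node["vulnerabilities"].append(entry)
    return node
```

For example, `mark_vulnerable(new_process_node(4242, "java"), "log4j-core", "CVE-2021-44228")` yields a process node whose `vulnerable` field is `True` and whose `vulnerabilities` list holds a single entry, no matter how many times the same report arrives.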

Lacework then applies machine learning algorithms to identify interesting changes in the features of the graph, such as distinguishing processes of a given type that are using vulnerable libraries from processes of the same type that are not.
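
The kind of distinction described above can be illustrated with a deliberately simple grouping heuristic. This is a stand-in for explanation only and is not the machine learning Lacework applies; the function name and the `name`, `pid`, and `vulnerable` fields are assumptions made for this sketch.

```python
from collections import defaultdict


def mixed_vulnerability_groups(process_nodes: list) -> dict:
    """Group process nodes by name and return only the groups in which some
    instances use a vulnerable library and others do not."""
    groups = defaultdict(lambda: {"vulnerable": [], "clean": []})
    for node in process_nodes:
        bucket = "vulnerable" if node.get("vulnerable") else "clean"
        groups[node["name"]][bucket].append(node["pid"])
    # Keep only process types where the feature actually differs.
    return {name: pids for name, pids in groups.items()
            if pids["vulnerable"] and pids["clean"]}
```

A process type that appears in the result has both vulnerable and non-vulnerable instances, which is a useful hint that some deployments of the same service have already been patched while others have not.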

This approach uses many capabilities of the Polygraph Data Platform, such as machine learning, graphs, and alerts. Although the initial version we released targets Log4j specifically, nothing about the approach is specific to Log4j. We can quickly expand the set of vulnerabilities that are detected in the future and help our customers understand the impact and respond faster.

We are excited about this approach and the potential of doing more in the future to combine static analysis and dynamic runtime analysis. Be on the lookout for more blogs about our continued efforts in this area and if this sounds interesting, please consider joining our engineering team!