A Log4j Retrospective Part 1: Vulnerability Background
Timelines
On November 24, 2021, the Apache Software Foundation was privately notified by Alibaba’s Cloud Security team that Log4j, a widely used Java-based logging library, contained a major vulnerability that could result in the leaking of private information as well as remote code execution (RCE). This vulnerability had been present since 2013.
The following day, the Apache Software Foundation reserved CVE-2021-44228 and began working on a fix. Over the next 12 days, several changes were introduced into the source code to address the issue, and on December 9, 2021, the vulnerability was publicly disclosed.
The public disclosure triggered a flood of exploit attempts, which have been growing at an alarming rate ever since.
What is Log4j?
To truly understand the vulnerability, we need to understand what Log4j is. Log4j is a library that enjoys widespread usage among developers in the Java community. It provides a simple yet robust framework for the logging of error messages, diagnostic information, and more.
Among its impressive features is the ability to log to multiple targets, including, but not limited to, the console, a file, a remote server using TCP, Syslog, NT Event Logs, and email. Additionally, it supports hierarchical filtering of log messages, log levels, custom layouts, and much more.
In short, it has a comprehensive set of features that make it a very attractive library for developers to leverage, which has led to its ubiquity in everything from web applications to embedded devices.
Lookups and nesting
One of the really powerful features that Log4j supports is known as lookups. Lookups enable a developer to embed variables or expressions into text that are automatically evaluated by Log4j prior to output. For example, a developer could write code that logs the following text:
“${date:MM-dd-yyyy} All Systems Good”
Log4j recognizes the pattern ${date:MM-dd-yyyy} as a date lookup, and dutifully replaces that expression with today’s date. For example, if the date were December 20, 2021, it would modify the text to the following before outputting it to a target:
“12-20-2021 All Systems Good”
This is extremely helpful for developers. Without this functionality, they would have to write code by hand that looks up the date, formats it into a string, prepends it to the log line, and then outputs it. And although that code is not particularly difficult or taxing to write, it does not relate to the core business logic of the software and ends up being repeated from one project to the next.
By leveraging the already coded functionality within the Log4j library, developers can instead focus on the differentiation that matters to their project and let Log4j handle all logging-related tasks.
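To make the comparison concrete, here is a minimal sketch of the hand-written alternative described above, using only the standard Java library. The class and method names (ManualDateLog, withDatePrefix) are hypothetical and chosen purely for illustration. They are not part of Log4j.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class ManualDateLog {
    // Same pattern as the ${date:MM-dd-yyyy} lookup in the example above.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("MM-dd-yyyy");

    // Formats the date and prepends it to the message, which is the
    // boilerplate a developer would otherwise repeat in every project.
    static String withDatePrefix(LocalDate date, String message) {
        return date.format(FORMAT) + " " + message;
    }

    public static void main(String[] args) {
        // Uses the article's example date rather than LocalDate.now()
        // so that the result is reproducible.
        System.out.println(withDatePrefix(LocalDate.of(2021, 12, 20), "All Systems Good"));
    }
}
```

With the lookup feature, all of this collapses into the single expression embedded in the logged text.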
There are many such types of lookup expressions supported by Log4j. Let’s examine two more that will feature heavily in this article: env and lower. env allows the inclusion of environment variables on the host system into log lines. For example, a developer logging the text:
“The current user is ${env:USER}”
would produce the following output, assuming the software is running as user Administrator:
“The current user is Administrator”
Unlike env and date, which inject new data into the text, lower manipulates what is already present: Log4j simply lowercases whatever appears within the expression. As an example:
“The lower case text is ${lower:ABCDEFG}”
would produce:
“The lower case text is abcdefg”
This example on its own is not very interesting. Why not simply lowercase the string yourself? The power of this becomes more obvious when we take into account that Log4j allows nesting of lookup expressions.
We can combine the prior two expressions as follows:
“The lower case current user is ${lower:${env:USER}}”
This causes Log4j to first evaluate the ${env:USER} expression as Administrator, and then feed that value to lower, which produces administrator, ultimately resulting in the following line:
“The lower case current user is administrator”
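The inside-out evaluation order can be illustrated with a small toy resolver. This is a sketch of the idea only, not Log4j's actual substitution code. The class NestedLookupSketch and its methods are hypothetical, it handles only env and lower, and it takes the environment as a map (an assumption made so the example is reproducible) instead of reading the real host environment.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedLookupSketch {
    // Matches an innermost expression: ${name:argument} whose argument
    // contains no further '$', '{' or '}' characters.
    private static final Pattern INNERMOST =
            Pattern.compile("\\$\\{([a-z]+):([^${}]*)\\}");

    // Repeatedly replaces the first innermost expression with its value,
    // so nested expressions are evaluated from the inside out.
    static String resolve(String text, Map<String, String> env) {
        // The pass limit is a safety bound for this sketch only.
        for (int pass = 0; pass < 32; pass++) {
            Matcher m = INNERMOST.matcher(text);
            if (!m.find()) {
                return text;
            }
            String value = lookup(m.group(1), m.group(2), env);
            text = text.substring(0, m.start()) + value + text.substring(m.end());
        }
        return text;
    }

    // Toy versions of two lookups; unknown names and missing variables
    // resolve to an empty string here, which is a simplification.
    static String lookup(String name, String argument, Map<String, String> env) {
        switch (name) {
            case "env":
                return env.getOrDefault(argument, "");
            case "lower":
                return argument.toLowerCase(Locale.ROOT);
            default:
                return "";
        }
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of("USER", "Administrator");
        System.out.println(resolve("The lower case current user is ${lower:${env:USER}}", env));
    }
}
```

In the first pass only ${env:USER} qualifies as innermost, so it is replaced with Administrator. The second pass then sees ${lower:Administrator} and lowercases it, matching the sequence described above.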
JNDI
While date, env, and lower are all interesting and highly useful, this vulnerability would not be possible without the JNDI lookup. JNDI, or the Java Naming and Directory Interface, is a mechanism built natively into the Java development and runtime environments that allows simplified querying of various directory services for information using a common interface.
As it turns out, JNDI supports many different types of directory services. For example, it can query DNS servers to discover the IP address of a host, as well as Active Directory (AD) and LDAP servers for directory entries. It even supports querying the running Java environment itself for environment entries, which can be thought of as specialized configuration options for the currently running software.
The JNDI lookup expression in Log4j allows developers to access this enormously powerful subsystem directly through embedded expressions in the logged text. For example, if a developer were to attempt to log the following string:
“The current mail host is ${jndi:java:comp/env/mailhost}”
Log4j would recognize the ${jndi:java:comp/env/mailhost} expression as a JNDI lookup and pass the pseudo-URL java:comp/env/mailhost to the JNDI subsystem. JNDI would recognize this particular URL type as a query to look up a configuration option called mailhost for the currently running component.
Let’s imagine this is configured as mymailserver.example.com. JNDI would pass this information back to Log4j, which would then replace the lookup expression with mymailserver.example.com, resulting in the following output:
“The current mail host is mymailserver.example.com”
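For readers who have not used JNDI directly, the query that Log4j hands off in this example corresponds roughly to the following call into the standard javax.naming API. This is a minimal sketch: the class JndiLookupSketch and the method lookupOrDefault are hypothetical names, and the mailhost entry is assumed to be configured by an application server, as in the article's example.

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;

class JndiLookupSketch {
    // Asks JNDI for the named entry and returns it as text, or returns the
    // fallback if the name cannot be resolved in this environment.
    static String lookupOrDefault(String jndiName, String fallback) {
        try {
            Object value = new InitialContext().lookup(jndiName);
            return String.valueOf(value);
        } catch (NamingException e) {
            // Typical outside a container: no naming provider is configured.
            return fallback;
        }
    }

    public static void main(String[] args) {
        String host = lookupOrDefault("java:comp/env/mailhost", "unknown");
        System.out.println("The current mail host is " + host);
    }
}
```

Inside an application server that defines the mailhost environment entry, the lookup would return the configured value. In a plain JVM with no naming provider configured, the lookup is expected to fail with a NamingException, and the sketch falls back to the default.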
Understanding why the vulnerability exists
In short, Apache’s Log4j vulnerability presented a major opportunity to attackers because of the library’s wide popularity and its lookup, nesting, and JNDI capabilities in particular. This functionality, though powerful for developers, also creates opportunities to pass requests that can exfiltrate data or cause RCE. With this knowledge, we can now begin to better understand the vulnerability and how attackers may exploit the system.
What’s ahead
In Part 2 of this series, we will look at how attackers began to exploit this vulnerability for data exfiltration and RCE.