Our Journey to Detect Log4j-Vulnerable Machines
Introduction
Log4Shell (CVE-2021-44228) is a remote code execution (RCE) vulnerability in Log4j, the open source logging library maintained by the Apache Software Foundation. It was published on December 9, 2021, and then all hell broke loose. Because Log4j is a common logging library for Java applications, the vulnerability is extremely widespread.
At Guardicore (now part of Akamai), we aim to keep our customers as secure as possible, so we jumped on the Log4j detection bandwagon. We wanted to help our customers map all their vulnerable servers and offer them a segmentation solution to limit the impact radius of any possible exploitation. As Akamai Guardicore Segmentation is a network segmentation solution, we have strong visibility into the data center's network traffic. For host-based information, we have Guardicore Insight, an integration with OSQuery, an open-source tool that exposes operating system information through SQL-like queries. Armed with this, we started our journey.
Note: Perfecting our detection tool is an ever-evolving process. We welcome all feedback from the community regarding better queries or improvements to our detection logic.
Initial Version — Detecting Listening Java Apps
We started learning about the vulnerability and all its nuances the moment it was published, but we also wanted to offer an immediate solution to our customers. We knew that the vulnerability affects Java applications, and that attackers were using it as an entry point into organizations' networks. So, to provide immediate value, we started by detecting all listening Java applications. Mapping them and then applying network segmentation to restrict arbitrary internet access would provide enough protection while we worked out a better detection method.
SELECT p.cmdline, p.cwd, l.port FROM processes AS p JOIN listening_ports AS l ON p.pid = l.pid WHERE p.cmdline LIKE "%java%" OR p.cmdline LIKE "%jar%"
Fig. 1: OSQuery for detection of Java applications that are listening for connections
Although this query isn't the most precise, and can produce both false positives and false negatives, it was a good start. We could release it to our Solution Center and Customer Success operators, and focus on honing our detection methods.
Not-So-Initial Version — Better Java Detection and Exploitation Attempt Lookups
Having deployed the initial response, we had more time to understand the vulnerability in depth and to look into additional detection methods. We knew we needed to detect Java applications more accurately, and since not all Java applications run through the Java executables for Windows (PE) or Linux (ELF), we had to come up with a better approach. To address this, we created a union of two queries. The first is an improved version of the Java application detection, with a more extensive list of strings that identify Java in the command line:
SELECT DISTINCT LOWER(path) || '%%' AS regex_path FROM processes WHERE (LOWER(cmdline) LIKE '%java%' OR LOWER(cmdline) LIKE '%jar%' OR LOWER(cmdline) LIKE '%jvm%' OR LOWER(cmdline) LIKE '%jdk%' OR LOWER(cmdline) LIKE '%jre%')
Fig. 2: A better query for Java application detection
But this query was not enough, because some applications use Java by loading the Java virtual machine into their own memory rather than by invoking the Java executable directly (e.g., Tomcat and some instances of Elasticsearch). Therefore, we added to the union a list of applications that don't run the Java executable directly yet are tied to Log4j:
SELECT DISTINCT REGEX_MATCH(LOWER(path), '.*?(logstash|jenkins|tomcat|vsphere|vcenter|apache|okta).*?(/|\\)', 0) || '%%' AS regex_path FROM processes WHERE regex_path IS NOT NULL
Fig. 3: Query for a partial list of Java applications that can run without the Java executable
To be thorough, we decided to also check the base directory of each process. If it contained any JAR files, we could fairly safely deem the process Java-dependent and check it for Log4j dependencies as well.
SELECT file.directory || "%%" AS regex_path FROM processes INNER JOIN file ON file.path LIKE REPLACE(processes.path, processes.name, "%%") AND file.filename LIKE "%.jar"
Fig. 4: Query to find all Java-dependent processes by looking for JAR files in their containing folders
Finally, to detect Log4j dependencies, we took each path returned by the previous queries and looked for any filename starting with log4j and ending with .jar.
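For illustration only, here is a minimal host-side Python sketch of that last step. The directory list is hypothetical, and the production check runs as an Insight query rather than a local script:

import glob
import os

def find_log4j_jars(search_dirs):
    """Look for files named log4j*.jar under each directory returned
    by the previous queries (the directory list here is hypothetical)."""
    hits = []
    for directory in search_dirs:
        pattern = os.path.join(directory, "**", "log4j*.jar")
        hits.extend(glob.glob(pattern, recursive=True))
    return hits

# Example with made-up paths standing in for the regex_path results
print(find_log4j_jars(["/opt/tomcat", "/usr/share/elasticsearch"]))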
Detecting Log4j vulnerabilities is not enough; we can also check whether any exploitation attempt was made on the system by analyzing log files for exploitation and JNDI lookup strings. Instead of developing novel methods, we relied on Florian Roth's YARA signatures and adjusted them to work with Insight.1
1 Because of some parsing issues, we had to convert the YARA strings to their hex byte counterparts.
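As a rough illustration of the log analysis idea (not the Insight query we shipped, which is based on Florian Roth's signatures), a Python sketch that flags JNDI lookup strings in a log file might look like this; the log path is hypothetical and the pattern covers only a few common, unobfuscated forms:

import re

# Matches a few common, unobfuscated forms of the JNDI lookup string;
# real signatures cover many more obfuscation tricks.
JNDI_PATTERN = re.compile(rb"\$\{jndi:(ldap|ldaps|rmi|dns|iiop|http)://", re.IGNORECASE)

def scan_log_file(path):
    """Return the line numbers in a log file that contain a JNDI lookup string."""
    hits = []
    with open(path, "rb") as log_file:
        for lineno, line in enumerate(log_file, start=1):
            if JNDI_PATTERN.search(line):
                hits.append(lineno)
    return hits

print(scan_log_file("/var/log/myapp/server.log"))  # hypothetical log path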
The full queries for both Log4j detection and log analysis can be found in our previous blog post: Mitigating Log4j Abuse Using Akamai Guardicore Segmentation.
Understanding Java
After releasing the previous set of queries to our field agents, we turned to Java itself to make sure that our queries actually do what we intended: detect all running Java applications on a machine and search them for Log4j dependencies. This required researching how Java applications are executed and how they are packaged.
Java Execution
For starters, all Java applications run on the Java runtime and its Java Virtual Machine (JVM), which means that most Java applications are launched from one of the following processes, the Java executables on Windows and Linux:
- java
- javaw
- java.exe
- javaw.exe
One exception, which we discovered while testing the above hypothesis and then confirmed against Java's documentation, is that programs can load the JVM library directly into their memory (we've seen this happen with both Tomcat and Elasticsearch, on both Windows and Linux). Since the Java executable does this as well (thanks to Uptycs for the trick), we can simply look for the JVM dependency in the process memory map. For good measure, we also add a union that checks the process name, to make sure nothing slips through:
SELECT DISTINCT proc.pid, proc.path, proc.cmdline, proc.cwd, listening.port, listening.address, listening.protocol
FROM process_memory_map AS mmap
LEFT JOIN processes AS proc USING(pid)
LEFT JOIN listening_ports AS listening USING(pid)
WHERE mmap.path LIKE "%jvm%"
UNION
SELECT DISTINCT proc.pid, proc.path, proc.cmdline, proc.cwd, listening.port, listening.address, listening.protocol
FROM processes AS proc
LEFT JOIN listening_ports AS listening USING(pid)
WHERE proc.name IN ("java", "javaw", "java.exe", "javaw.exe")
Fig. 5: Query to find all Java-dependent processes by looking at their memory map
Java Library Dependency
To load library dependencies, Java programs have to specify them in the classpath. The most common way of specifying dependencies in the classpath is direct naming: put the JAR (relative or absolute path) directly in the classpath. The other way is to include a folder in the classpath (e.g., with a wildcard entry such as lib/*), and all JARs under it will be loaded.
The real issue is detecting the classpath. While the classic option is to specify it on the command line (with -cp or -classpath), the classpath can also come from the CLASSPATH environment variable or from the Class-Path attribute in the main JAR's manifest.
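To make that concrete, here is a minimal Python sketch (not our production code) that collects classpath entries from all three sources; it assumes access to the process command line and environment, and it ignores details such as manifest line continuations:

import os
import shlex
import zipfile

def classpath_entries(cmdline, environ):
    """Collect classpath entries from the command line, the CLASSPATH
    environment variable, and the main JAR's manifest (when run with -jar)."""
    entries = []
    args = shlex.split(cmdline)

    # 1. -cp / -classpath / --class-path on the command line
    for i, arg in enumerate(args):
        if arg in ("-cp", "-classpath", "--class-path") and i + 1 < len(args):
            entries += args[i + 1].split(os.pathsep)

    # 2. CLASSPATH environment variable
    entries += environ.get("CLASSPATH", "").split(os.pathsep)

    # 3. Class-Path attribute in the main JAR's MANIFEST.MF
    if "-jar" in args and args.index("-jar") + 1 < len(args):
        jar = args[args.index("-jar") + 1]
        try:
            with zipfile.ZipFile(jar) as zf:
                manifest = zf.read("META-INF/MANIFEST.MF").decode("utf-8", "replace")
            for line in manifest.splitlines():
                if line.startswith("Class-Path:"):
                    entries += line.split(":", 1)[1].split()
        except (OSError, KeyError, zipfile.BadZipFile):
            pass  # no readable manifest, nothing to add

    return [entry for entry in entries if entry]

# Hypothetical example
print(classpath_entries("java -cp lib/log4j-core-2.14.1.jar:app.jar com.example.Main", os.environ))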
Detection Script Development
Equipped with our newfound knowledge about Java internals, we wanted to improve our previous queries. On busier servers, the queries could consume a lot of compute resources, which would be quite taxing, and we can't afford to overload our customers' servers, as they are critical parts of the data center. To improve performance, we split the big query into multiple smaller queries that run more efficiently. We then run them one after the other from a Python script that integrates with a designated REST API for running Insight queries. This also paves the way for adding more complex analysis to determine actual vulnerability (by checking the Log4j version), checking whether any mitigations are in place, and outputting everything in an easy-to-process format (e.g., a CSV file).
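A skeleton of that orchestration might look like the following; run_insight_query is a hypothetical placeholder for the designated Insight REST API call (its real endpoint and payload aren't shown here), and the query string is a trimmed variant of Figure 5:

import csv

def run_insight_query(sql):
    """Hypothetical placeholder for running an Insight query over the REST API."""
    raise NotImplementedError("wire this up to the designated Insight REST API")

# A trimmed variant of the Figure 5 query; smaller follow-up queries run afterward
JAVA_PROCESS_QUERY = (
    "SELECT DISTINCT proc.pid, proc.path, proc.cmdline "
    "FROM process_memory_map AS mmap "
    "LEFT JOIN processes AS proc USING(pid) "
    "WHERE mmap.path LIKE '%jvm%'"
)

def main():
    # Run the smaller queries one after the other instead of one heavy query
    rows = run_insight_query(JAVA_PROCESS_QUERY)
    with open("log4j_findings.csv", "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["pid", "path", "cmdline"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    main()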
The script logic is as follows:
- Detect all Java applications using the query from Figure 5
- Analyze the classpath to find all JAR file dependencies that are mentioned in it directly
- Extract all folder dependencies in the classpath, and for each folder extract all JAR files under it
- For each JAR file that we found, check its name to see whether it is a relevant Log4j JAR file (log4j-core), extract its version, and check whether that version is vulnerable to Log4Shell (a simplified version check is sketched after this list)
- Output all our findings to a CSV file
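The version check in the fourth step could look roughly like the sketch below. It assumes the baseline CVE-2021-44228 range of log4j-core 2.x up to 2.14.1, and it deliberately ignores beta builds and the fixed security backports (such as 2.3.1 and 2.12.2+) that a real check must handle:

import re

# Simplified check: CVE-2021-44228 affects log4j-core 2.x up to and including 2.14.1.
# Beta builds (e.g., 2.0-beta9) and the fixed backport releases (2.3.1, 2.12.2+)
# are not handled here to keep the sketch short.
LOG4J_CORE_RE = re.compile(r"log4j-core-(\d+)\.(\d+)\.(\d+)\.jar$", re.IGNORECASE)

def is_vulnerable_jar(filename):
    """Return True if the JAR filename looks like a Log4Shell-vulnerable log4j-core."""
    match = LOG4J_CORE_RE.search(filename)
    if not match:
        return False
    major, minor, patch = (int(part) for part in match.groups())
    if major != 2:
        return False  # Log4j 1.x is not affected by CVE-2021-44228
    return (minor, patch) <= (14, 1)

print(is_vulnerable_jar("log4j-core-2.14.1.jar"))  # True
print(is_vulnerable_jar("log4j-core-2.17.1.jar"))  # False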
The script worked well and produced more reliable results than our previous queries, which we could share with our customers. In some cases, it even found dependencies that weren't detected by the customers' other program management and visibility tools.
Future Plans
While our script produces satisfactory results, there's always room for improvement. At the moment, our plans include two additions:
- Detect whether any mitigations are in place that prevent an exploit from working despite the vulnerability existing (e.g., a newer JRE version or a nondefault configuration)2
- Pull all the JAR files we detected and recursively parse them to look for a Log4j dependency, so that we don't rely only on the JAR existing directly in the filesystem. JAR files are essentially ZIP files, which can package other JAR dependencies inside them; by parsing the ZIP/JAR file tree, we can look for more Log4j dependencies (a minimal sketch follows this list)
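A minimal sketch of that recursive idea, assuming nested JAR entries that Python's zipfile module can read (the example path is hypothetical):

import io
import zipfile

def find_nested_log4j(jar_path):
    """Recursively walk a JAR (which is just a ZIP) and any JARs nested inside it,
    returning the paths of embedded log4j-core JARs."""
    findings = []

    def walk(zip_file, prefix):
        for name in zip_file.namelist():
            if not name.lower().endswith(".jar"):
                continue
            if "log4j-core" in name.lower():
                findings.append(prefix + name)
            # Nested JARs are regular ZIP entries; open them in memory and recurse
            data = zip_file.read(name)
            try:
                with zipfile.ZipFile(io.BytesIO(data)) as nested:
                    walk(nested, prefix + name + "!/")
            except zipfile.BadZipFile:
                continue  # entry isn't a readable archive

    with zipfile.ZipFile(jar_path) as outer:
        walk(outer, jar_path + "!/")
    return findings

print(find_nested_log4j("/opt/myapp/app.jar"))  # hypothetical fat JAR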
Conclusion
We’ve described our process of developing a detection tool for Log4j-vulnerable servers that utilizes Akamai Guardicore Segmentation. This is an ongoing journey since it seems Log4Shell will keep hounding us for a while, so we aim for constant improvement and better detection.
There are many detection tools out there and many great ideas, and we hope that by sharing our progress we will help others develop their own scripts and methods for Log4j detection. We are always open to dialog, so feel free to contact us with suggestions and ideas.
2 Although there have been working PoCs that circumvent those mitigations, they require more sophistication. We check for those mitigations as well, and change the verdict to “partially vulnerable.”