AWS EMR Log4j2 workaround for CVE-2021-44228

Last updated: December 20, 2021

The following workaround patches the CVE-2021-44228 Remote Code Execution (RCE) vulnerability in Amazon EMR using bootstrap scripts provided by the AWS team. The same remediation steps also apply to CVE-2021-45046.

A short note on CVE-2021-44228

This vulnerability affects Log4j2 versions 2.15 and lower. Many projects or their dependencies use Log4j2 for logging or logging configuration. When resolving this vulnerability in Spark or Hadoop, keep in mind that even if your job's own codebase is not vulnerable (for example, it uses Log4j2 2.16 or greater), the underlying cluster the job runs on (Spark / Hadoop) may still be using a vulnerable version of the library. It is therefore not safe to assume that setting the property spark.driver.extraJavaOptions will, on its own, remediate this vulnerability.

Patching Log4j2 Dependencies in EMR

Amazon has released a number of patch scripts in the bootstrap-actions/log4j folder of the elasticmapreduce S3 bucket. Note that this bucket is owned and managed by AWS. I recommend pulling the scripts directly from this location.

# EMR Version 5.x
s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-5.30.2-v1.sh
s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-5.31.1-v1.sh
s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-5.32.1-v1.sh
s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-5.33.1-v1.sh

# EMR Version 6.x
s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-6.0.1-v1.sh
s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-6.1.1-v1.sh
s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-6.2.1-v1.sh
s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-6.3.1-v1.sh
s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-6.4.0-v1.sh

It is critical that you use a bootstrap patch script version which matches the version of EMR your cluster is running.
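The AWS scripts enforce this themselves: the check_release_version function in the 6.4 script reproduced later in this post compares the major.minor part of the cluster's release label with the value hard-coded in the script and exits on a mismatch. The following is a minimal sketch of the same comparison that you could run before launching a cluster. The function names are hypothetical and not part of the AWS scripts, and the sketch assumes the patch-log4j-emr-X.Y.Z-v1.sh naming pattern shown above.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of the AWS scripts.

# Reduce "emr-6.4.0" to "emr-6.4", the same way the AWS script does with cut.
major_minor() {
    printf '%s\n' "$1" | cut -d "." -f 1,2
}

# Extract "emr-6.4.0" from "patch-log4j-emr-6.4.0-v1.sh" (S3 prefix optional).
script_release() {
    name=${1##*/}              # drop any leading s3://.../ path
    name=${name#patch-log4j-}  # drop the "patch-log4j-" prefix
    printf '%s\n' "${name%-v*.sh}"
}

# Succeed when the script targets the same major.minor release as the cluster.
# Usage: script_matches_cluster <script-name-or-uri> <cluster-release-label>
script_matches_cluster() {
    [ "$(major_minor "$(script_release "$1")")" = "$(major_minor "$2")" ]
}

# Example:
#   script_matches_cluster patch-log4j-emr-6.4.0-v1.sh emr-6.4.0 && echo "match"
```

Because only major.minor is compared, a 6.3.1 script would pass this check against a 6.3.0 cluster, just as it would pass the check inside the AWS script.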

When run, these scripts patch the Log4j JAR files in your cluster by removing the JndiLookup class from each of them.

You can list or download the scripts with the AWS CLI using the following commands. Be sure to replace the script name with the one that matches the version of EMR you are using:

aws s3 ls s3://elasticmapreduce/bootstrap-actions/log4j/ --no-sign-request

aws s3 cp s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-6.4.0-v1.sh - --no-sign-request

The following is a collection of the latest EMR bootstrap patch scripts currently available (current as of December 20, 2021):

2021-12-15 21:25:31  -  patch-log4j-emr-5.10.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.11.4-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.12.3-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.13.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.14.2-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.15.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.16.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.17.2-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.18.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.19.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.20.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.21.2-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.22.0-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.23.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.24.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.25.0-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.26.0-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.27.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.28.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.29.0-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.30.2-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.31.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.32.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.33.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.34.0-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.7.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.8.3-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-5.9.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-6.0.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-6.1.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-6.2.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-6.3.1-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-6.4.0-v1.sh
2021-12-15 21:25:31  -  patch-log4j-emr-6.5.0-v1.sh

Just so you have an idea of what these scripts do, here is the code for the EMR 6.4 bootstrap patch script:

#!/bin/bash

set -ex

EMR_RELEASE=emr-6.4
DELETE_JNDI_PATH=/var/aws/emr/delete_jndi.sh
MANIFEST_PATCH_PATH=/var/aws/emr/manifest_site.patch
HIVE_INIT_PATCH_PATH=/var/aws/emr/hive_log4j.patch
SITE_PP_PATH=/var/aws/emr/bigtop-deploy/puppet/manifests/site.pp
HIVE_INIT_PATH=/var/aws/emr/bigtop-deploy/puppet/modules/hadoop_hive/manifests/init.pp


function check_release_version {
    CLUSTER_RELEASE=`cat /mnt/var/lib/instance-controller/extraInstanceData.json | jq -r '.releaseLabel' | cut -d "." -f 1,2`
    if [[ "$EMR_RELEASE" != "$CLUSTER_RELEASE" ]]; then
    echo "This script is written for $EMR_RELEASE and this cluster is $CLUSTER_RELEASE. Please use the correct bootstrap script for this release."
    exit 1
    else
    echo "Cluster is $CLUSTER_RELEASE, matches script release $EMR_RELEASE. Proceeding with update."
    fi
}

function create_delete_jndi_script {
    sudo bash -c "cat > $DELETE_JNDI_PATH" <<"EOF"
#/bin/bash

set -e

jars=("/usr/lib/flink/bin/bash-java-utils.jar" "/usr/lib/flink/lib/log4j-core-2.12.1.jar" "/usr/lib/hbase-operator-tools/hbase-hbck2-1.1.0.jar" "/usr/lib/hbase-operator-tools/hbase-tools-1.1.0.jar" "/usr/lib/trino/plugin/elasticsearch/log4j-core-2.13.3.jar" "/usr/lib/hive/lib/log4j-core-2.10.0.jar" "/usr/lib/hudi/cli/lib/log4j-core-2.10.0.jar" "/usr/lib/presto/plugin/presto-druid/log4j-core-2.8.2.jar" "/usr/lib/presto/plugin/presto-elasticsearch/log4j-core-2.9.1.jar" "/usr/lib/presto/plugin/presto-druid/log4j-core-2.8.2.jar" "/usr/lib/presto/plugin/presto-elasticsearch/log4j-core-2.9.1.jar" "/usr/lib/trino/plugin/elasticsearch/log4j-core-2.13.3.jar" "/usr/share/aws/emr/emr-log-analytics-metrics/lib/log4j-core-2.13.3.jar" "/usr/share/aws/emr/emr-metrics-collector/lib/log4j-core-2.11.2.jar")

class="org/apache/logging/log4j/core/lookup/JndiLookup"
jndi="${class}.class"

for index in "${!jars[@]}"; do
  jar=${jars[$index]}
  if [[  -f "$jar" ]]; then
    still_exists=`jar tf $jar | grep -i $class || true`
    if [[ ! -z "$still_exists" ]]; then
      echo "Removing JndiLookup class from $jar..."
      sudo zip -q -d $jar $jndi
      echo "Removed JndiLookup class from $jar."
    fi
  fi
done

remaining_jars=()

for index in "${!jars[@]}"; do
  jar=${jars[$index]}
  if [[ -f "$jar" ]] && jar tf $jar | grep -i $class ; then
    remaining_jars+=$jar
  fi
done

if [[ ${remaining_jars[@]} ]]; then
   echo "[ERROR] JndiLookup class still exists in: "
   printf "%s\n" "${remaining_jars[@]}"
   exit 1
fi

exit 0
EOF

    sudo chmod +x $DELETE_JNDI_PATH
}

function create_manifest_patch {
    sudo bash -c "cat > $MANIFEST_PATCH_PATH" <<"EOF"
--- a/bigtop-deploy/puppet/manifests/site.pp
+++ b/bigtop-deploy/puppet/manifests/site.pp
@@ -107,6 +107,10 @@ node default {
   } else {
     include node_with_components
   }
+
+  class { 'log4j_hotfix':
+    stage => 'pre'
+  }
 }

 if versioncmp($::puppetversion,'3.6.1') >= 0 {
@@ -115,3 +119,29 @@ if versioncmp($::puppetversion,'3.6.1') >= 0 {
     allow_virtual => $allow_virtual_packages,
   }
 }
+
+class log4j_hotfix {
+  if ("hbase-client" in hiera("bigtop::roles")) {
+    include hbase_operator_tools::library
+  }
+
+  exec { 'delete jndi':
+    path    => ['/bin', '/usr/bin', '/usr/sbin',],
+    command => "/bin/bash /var/aws/emr/delete_jndi.sh",
+    logoutput => true
+  }
+
+  exec { 'restart metrics-collector':
+    path    => ['/bin', '/usr/bin', '/usr/sbin',],
+    command => "systemctl restart metricscollector",
+    onlyif  => "systemctl is-active metricscollector",
+    require => [ Exec['delete jndi'] ]
+  }
+
+  exec { 'restart apppusher':
+    path    => ['/bin', '/usr/bin', '/usr/sbin',],
+    command => "systemctl restart apppusher",
+    onlyif  => "systemctl is-active apppusher",
+    require => [ Exec['delete jndi'] ]
+  }
+}
EOF
}

function create_hive_log4j_patch {
    sudo bash -c "cat > $HIVE_INIT_PATCH_PATH" <<"EOF"
--- a/bigtop-deploy/puppet/modules/hadoop_hive/manifests/init.pp
+++ b/bigtop-deploy/puppet/modules/hadoop_hive/manifests/init.pp
@@ -221,6 +221,12 @@ class hadoop_hive {
       require => Package['hive'],
     }

+    exec { 'change log4j loglevel to error':
+      path    => ['/bin', '/usr/bin', '/usr/sbin',],
+      command => "sed -i 's/^status = INFO/status = ERROR/g' /etc/hive/conf/{beeline,hive}-log4j2.properties && ln -sf /etc/hive/conf/hive-log4j2.properties /etc/hadoop/conf",
+      require => [Bigtop_file::Properties['/etc/hive/conf/hive-log4j2.properties'],Bigtop_file::Properties['/etc/hive/conf/beeline-log4j2.properties']]
+    }
+
     bigtop_file::properties { '/etc/hive/conf/hive-exec-log4j2.properties':
       source => '/etc/hive/conf.dist/hive-exec-log4j2.properties.default',
       overrides => $hive_exec_log4j2_overrides,
}
EOF
}

check_release_version
create_delete_jndi_script
create_manifest_patch
create_hive_log4j_patch

sudo patch -p1 -b $SITE_PP_PATH < $MANIFEST_PATCH_PATH
sudo patch -p1 -b $HIVE_INIT_PATH < $HIVE_INIT_PATCH_PATH

touch /tmp/created_jndi_patch

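As you can see, the script writes a helper that walks a fixed list of known Log4j JAR locations in the cluster and removes the JndiLookup class from each JAR it finds, then patches the cluster's Puppet manifests so that the helper runs during provisioning. Removing the class renders those JARs safe from this exploit.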

Updating your cluster’s minor version

The provided bootstrap scripts are intended to be applied to the latest minor releases available for EMR. For example, if you are on 6.3.0 you should update to 6.3.1. You can check which version your cluster is on in the Configuration details section of your cluster's Summary tab.
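If you prefer the command line to the console, the release label can also be read with the AWS CLI. This is a minimal sketch: the function name and the cluster ID in the example are placeholders, and it assumes your CLI credentials are allowed to describe the cluster. On a cluster node itself, the patch script shown earlier reads the same value from /mnt/var/lib/instance-controller/extraInstanceData.json with jq.

```shell
#!/bin/sh
# Sketch only: the function name and the cluster ID below are placeholders.

# Print the release label (for example "emr-6.3.0") of the given cluster.
# Usage: cluster_release_label <cluster-id>
cluster_release_label() {
    aws emr describe-cluster \
        --cluster-id "$1" \
        --query 'Cluster.ReleaseLabel' \
        --output text
}

# Example (requires AWS credentials):
#   cluster_release_label j-XXXXXXXXXXXXX
```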

Adding the bootstrap script to your cluster

It is recommended that you copy the script to your own account’s S3 bucket. Additionally, this patch step must be the first bootstrap script that runs in the bootstrapping process of your cluster.
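Copying the script can be done in two steps with the AWS CLI: an anonymous download from the AWS-managed bucket, then an upload to your own bucket. This is a sketch under stated assumptions: stage_patch_script is a hypothetical helper, and s3://my-example-bucket/bootstrap in the example stands in for a bucket you own.

```shell
#!/bin/sh
# Sketch only: stage_patch_script and the destination bucket are hypothetical.

SRC_PREFIX="s3://elasticmapreduce/bootstrap-actions/log4j"

# Copy one patch script into your own bucket and print its new S3 URI.
# Usage: stage_patch_script <emr-version> <destination-prefix>
stage_patch_script() {
    script="patch-log4j-emr-$1-v1.sh"

    # Anonymous read from the AWS-managed bucket, as in the commands above.
    aws s3 cp "$SRC_PREFIX/$script" "/tmp/$script" --no-sign-request >&2 || return 1

    # Upload with your own credentials.
    aws s3 cp "/tmp/$script" "$2/$script" >&2 || return 1

    printf '%s\n' "$2/$script"
}

# Example (requires AWS credentials and a bucket you own):
#   stage_patch_script 6.4.0 s3://my-example-bucket/bootstrap
```

The printed URI is the value you would then pass as Path in the bootstrap action.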

I recommend following the steps in the AWS documentation on bootstrap actions to configure the script properly for your clusters.

Typically, you can use the following parameter to provide the bootstrap action when creating a cluster using the AWS CLI:

--bootstrap-actions Path="s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-6.4.0-v1.sh"
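For context, here is a sketch of how that parameter might fit into a complete create-cluster call, with the patch script listed before a second bootstrap action so that it runs first. Everything other than the patch script path is an assumption made for illustration: the cluster name, application, instance type and count, and the second script (install-deps.sh in a hypothetical bucket) are placeholders to replace with your own values.

```shell
#!/bin/sh
# Sketch only: the cluster name, instance settings, and OTHER_SCRIPT are
# placeholders. Only the patch script path comes from the AWS bucket.

PATCH_SCRIPT="s3://elasticmapreduce/bootstrap-actions/log4j/patch-log4j-emr-6.4.0-v1.sh"
OTHER_SCRIPT="s3://my-example-bucket/bootstrap/install-deps.sh"  # hypothetical

launch_patched_cluster() {
    # The patch script is listed first so that it is the first bootstrap
    # action to run.
    aws emr create-cluster \
        --name "log4j-patched-cluster" \
        --release-label emr-6.4.0 \
        --applications Name=Spark \
        --instance-type m5.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --bootstrap-actions \
            "Path=$PATCH_SCRIPT,Name=patch-log4j" \
            "Path=$OTHER_SCRIPT,Name=install-deps"
}

# Example (requires AWS credentials; this launches a billable cluster):
#   launch_patched_cluster
```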

If you have configured the bootstrap actions correctly, you will see it listed in the bootstrap actions tab of the AWS Console:

[Screenshot: AWS EMR Log4j2 patch step in the console]

Finally, because bootstrap actions run when a cluster launches, it is important that you terminate any clusters that are already running and re-launch them so that they run this script.

How to check if Log4J2 was patched correctly

You can log in to your cluster and copy down or extract any of the log4j jars.

  • You want to make sure that the class called JndiLookup is missing. If this class is gone, your log4j JAR has been successfully patched.

You can do this in a single operation using the following command. Be sure to change the log4j path and version as needed for your cluster:

jar tvf log4j-core-2.11.2.jar | grep JndiLookup.class

It should return no result. If you see the class returned, it means it is still present in your JAR file and your cluster is still vulnerable.
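To repeat that check across many JARs at once, a small loop is enough. The following is a minimal sketch, not a replacement for the verification built into the AWS script: the function names and the default search root /usr/lib are assumptions, and it only looks at files named log4j-core-*.jar, whereas the AWS script also patches a few differently named JARs that bundle Log4j.

```shell
#!/bin/sh
# Sketch only: function names and the default search root are assumptions.

JNDI_CLASS="org/apache/logging/log4j/core/lookup/JndiLookup.class"

# Read a JAR listing on stdin; succeed if JndiLookup is still listed.
listing_has_jndi() {
    grep -q -F "$JNDI_CLASS"
}

# Print every log4j-core JAR under a directory that still contains the class.
# Usage: find_unpatched_jars [search-root]
find_unpatched_jars() {
    find "${1:-/usr/lib}" -name 'log4j-core-*.jar' 2>/dev/null |
    while IFS= read -r jar; do
        if jar tf "$jar" 2>/dev/null | listing_has_jndi; then
            printf '%s\n' "$jar"
        fi
    done
}

# Example (run on a cluster node):
#   find_unpatched_jars /usr/lib
```

No output means none of the scanned JARs still contain the class. Any path that is printed points to a JAR that has not been patched.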

Official AWS Update

On December 17, 2021, AWS publicly released details regarding these bootstrap scripts and the steps developers need to take. I encourage you to review the AWS announcement as well.