For various reasons, at Alfresco, we are using Oracle JDK and CentOS for our base java image. On top of that we’ve ported the standard tomcat Dockerfile, to make a base tomcat image. Unlike the official tomcat images, we also ship a full JDK (and not just the JRE). Like the official images, we obviously ship the native tomcat library.
Other constraints mean that we are not yet in a situation to ship an image based on musl.
Purely as an experiment, how small can we make a glibc-based image that satisfies most of these constraints?
Even if we were able to ship based on Alpine Linux, there is a bigger issue: every binary (and library) on the image increases the attack surface area (and the maintenance overhead for us and our customers).
Finding out the size of an already pulled image is pretty easy. Here’s an example:
docker inspect openjdk:8-jdk-alpine | jq '.[] | "\(.Size) \(.RepoTags[0])"'
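Docker reports that size in bytes, while the comparison in this article is in MiB. A minimal sketch of the conversion, assuming a rounded whole number of MiB is good enough; `bytes_to_mib` is a hypothetical helper name, not something Docker provides:

```shell
# Convert a byte count to MiB (1 MiB = 1048576 bytes), rounded to the nearest whole number.
bytes_to_mib() {
  awk -v bytes="$1" 'BEGIN { printf "%.0f\n", bytes / 1048576 }'
}

# Example use against an already pulled image (needs a running Docker daemon):
#   bytes_to_mib "$(docker inspect --format '{{.Size}}' openjdk:8-jdk-alpine)"
```

The commented-out call uses `docker inspect --format` rather than jq, purely to keep the sketch free of extra dependencies.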
Now running anything non-trivial on Docker’s scratch image is “unpleasant”. Luckily for us, those lovely people at Google have created Distroless which has a small base size of about 16 MiB. They have a base version and a specific version for java.
Sadly, the java version runs java -jar and because tomcat relies on having two jars to start (bootstrap.jar and tomcat-juli.jar) I couldn’t work out how to start it like that. (If you can show me how, I’ll update the article.)
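This does not answer the `java -jar` question, but for reference, here is a sketch of the explicit-classpath command line that Tomcat's own `catalina.sh` roughly boils down to. The function name `tomcat_start_cmd` and the default path `/usr/local/tomcat` are assumptions for illustration, and the logging and endorsed-directory options that `catalina.sh` also sets are left out:

```shell
# Print a java command line that starts Tomcat from its two bootstrap jars.
# The first argument is CATALINA_HOME; /usr/local/tomcat is an assumed default.
# The body runs in a subshell so that "home" does not leak into the caller.
tomcat_start_cmd() (
  home="${1:-/usr/local/tomcat}"
  printf '%s %s %s %s %s %s\n' \
    "java" \
    "-classpath ${home}/bin/bootstrap.jar:${home}/bin/tomcat-juli.jar" \
    "-Dcatalina.base=${home}" \
    "-Dcatalina.home=${home}" \
    "-Djava.io.tmpdir=${home}/temp" \
    "org.apache.catalina.startup.Bootstrap start"
)

# Example: tomcat_start_cmd /opt/tomcat
```

Because the main class is named explicitly, this needs an image whose entrypoint is not fixed to `java -jar`.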
Distroless uses libraries from Debian stretch, but is missing some that are needed to run java. Firing up debian:stretch from Docker Hub, installing openjdk-8-jre-headless or openjdk-8-jdk-headless, and running ldd against the included binaries and libraries shows that we are missing four: libz, libstdc++, libgcc_s and libapr-1. We can COPY these in from a multi-stage build.
Now Distroless is already 12 MiB larger than Alpine, which is surprising, as it contains very little. Binaries and libraries compiled against glibc are also bigger: libjvm.so is 49% bigger; rt.jar is 87% bigger. Which means, from a pure size view the results are disappointing:
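The ldd step can be sketched as a small filter. This is only an illustration, assuming the usual `name => /path (address)` layout of ldd output; `resolved_libs` is a hypothetical helper name, and the `find` invocation in the comment is one possible way to feed it:

```shell
# Read ldd output on stdin and print each resolved library path once, sorted.
# Lines without "=> /path" (vdso, the dynamic loader, "not found") are skipped.
resolved_libs() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | LC_ALL=C sort -u
}

# Example use inside a debian:stretch container with OpenJDK installed:
#   find /usr/lib/jvm -type f -name '*.so' -exec ldd {} + 2>/dev/null | resolved_libs
```

Comparing that list with what is already present in the Distroless base image gives the candidates to COPY across.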
Name | Base OS | Java Vendor | JRE/JDK | Tomcat | Size (MiB) |
---|---|---|---|---|---|
alpine:3.7 | Alpine | | | | 4 |
gcr.io/distroless/base | distroless | | | | 16 |
openjdk:8-jre-alpine | Alpine | OpenJDK | JRE | | 78 |
openjdk:8-jdk-alpine | Alpine | OpenJDK | JDK | | 97 |
tomcat:8.5-alpine | Alpine | OpenJDK | JRE | 8.5 | 101 |
nicdoye/micro-tomcat-jre | distroless | OpenJDK | JRE | 8.5 | 142 |
nicdoye/micro-tomcat-jdk | distroless | OpenJDK | JDK | 8.5 | 179 |
tomcat:8.5-slim | Debian Slim | OpenJDK | JRE | 8.5 | 213 |
alfresco/alfresco-base-java:8 | CentOS | Oracle | JDK | | 468 |
alfresco/alfresco-base-tomcat:8.5 | CentOS | Oracle | JDK | 8.5 | 506 |
The JRE image is 41% larger than tomcat:8.5-alpine (142 MiB vs 101 MiB), and the full JDK version is bigger still. On the other hand, we’re a lot smaller than the other glibc-based images.
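The percentage is simple arithmetic on the MiB column of the table. A minimal sketch, where `pct_larger` is a hypothetical helper name and the result is rounded to a whole percent:

```shell
# Print how much larger size $1 is than size $2, as a whole-number percentage.
pct_larger() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", (a - b) / b * 100 }'
}

# nicdoye/micro-tomcat-jre (142 MiB) against tomcat:8.5-alpine (101 MiB):
pct_larger 142 101   # prints 41
```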
The security win over the tomcat:8.5-alpine is pretty small, too. There are only 61 packages (including OpenJDK) in that image and many of them do not contain libraries or binaries.
Over a full glibc-based Linux distro (even Debian slim) there is a win: Distroless base only contains glibc, libssl and openssl; once we add the other 4 libraries, that leaves very little scope for security vulnerabilities. The downside is that we have to monitor the .debs containing libraries via an external mechanism and update accordingly.
Firstly, we haven’t created an Oracle JDK image on “Distroless but with Debian’s libraries replaced by CentOS’”, which would be the most useful for Alfresco. I’ll leave that for another day.
Clearly, Alpine Linux is amazing in what it can squeeze into such a small space, and only has a slightly larger attack surface.
From a security point of view, we have created something better than the other glibc-based distros, and if that’s important to you, then using Distroless is a viable option.
Image credit: "A cat on HMAS Encounter" by photographer not identified. [Public domain] via Wikimedia Commons.
This article has also been published on Medium.