Welcome to JAR Hell

Part 2: Application Deployment Strategies

13 Apr 2021

In Part 1, we looked at the basic model for loading and executing code on the JVM.

We saw how Classes (usually represented by .class files) provide the basic unit for JVM code, and how the Classpath makes classes (usually organized into JARs) available to the JVM at compile- and runtime. And we saw how tools like Maven help us use external libraries by fetching them from package repositories and incorporating them into the local Classpath.

But what about production deployments?

The Classpath still exists regardless of whether we’re running code on our MacBook or on a server in AWS, but for production, we’d prefer to run without a build tool, and ideally without any system dependencies beyond a Java Runtime Environment.

In this post, we’ll look at several ways to accomplish this.

Preface: Applications vs. Libraries

Software projects can be coarsely divided into 2 groups: Libraries and Applications. Libraries are consumed by other code, while Applications are meant to run on their own. On the JVM, both types of software can be packaged as JARs, but there are some common conventions around how each gets handled.

In general, library JARs only contain a “shallow” bundle of compiled .class files, meaning they include their own direct code but not that of their dependencies. This is sometimes also called a “skinny” JAR.

You might ask how this is useful, since if we depend on library A, and A depends on B, we obviously can’t run our application without also having B. But the answer is that the developers of A expect you to retrieve B on your own after consulting A’s dependency manifest (i.e. its Maven POM). When dealing with libraries we prefer smaller, granular packages that can be managed programmatically by a build tool. This gives downstream users more flexibility to cache packages, handle dependency conflicts, etc.
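
To make the "A depends on B" idea concrete, here is a minimal sketch of what a build tool does when it walks dependency manifests. The artifact names and the in-memory dependency map are made up for illustration; a real tool like Maven reads POMs from a repository and also has to reconcile conflicting versions, which this sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: artifact names and the dependency map are invented.
final class TransitiveDeps {

    // Walks the declared dependencies breadth-first, starting from `root`,
    // and returns every artifact that must end up on the Classpath
    // (the root itself included), in the order they were discovered.
    static Set<String> resolve(String root, Map<String, List<String>> declaredDeps) {
        Set<String> resolved = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String artifact = queue.poll();
            if (!resolved.add(artifact)) {
                continue; // already visited; this also guards against cycles
            }
            queue.addAll(declaredDeps.getOrDefault(artifact, List.of()));
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, List<String>> declaredDeps = Map.of(
            "my-app", List.of("A"),
            "A", List.of("B"),
            "B", List.of());
        System.out.println(resolve("my-app", declaredDeps)); // prints [my-app, A, B]
    }
}
```

The point is only that A's skinny JAR never needs to contain B: the manifest carries enough information for the consumer's tooling to go and fetch it.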

Applications, by contrast, are not intended for distribution to other developers or consumption by other code. Rather, they’re meant to run as standalone artifacts (e.g. they probably include a main method).

Applications require a deployment strategy which, one way or another, gets the application’s own code, along with a fully resolved Classpath containing any necessary libraries, into the target runtime environment. This type of deployment – running compiled applications along with their dependencies – is what we’re focused on in this article.

Deployment for the JVM

Luckily, the JVM makes the actual “run the code” portion fairly easy – as long as you don’t get too crazy with native dependencies (e.g. JNI), or shelling out to system commands, you should be able to run your app on any server with the proper JRE version.

But you do have to worry about getting all of the compiled code into the right place. There are a lot of ways to do this, so we’ll look at several options:

Push JARs to a server and run
Uberjars
WAR files / J2EE
Docker Images
GraalVM Native Images

Push and Script

For starters, we can always just do a straightforward upload of the library JARs our build tool resolves for our Classpath, along with the one it has created for our own code.

For example, if we’re using Maven, we’ll end up with a classpath / run command (locally) that looks something like java -cp ./target/my-app.jar:$HOME/.m2/repository/foo.jar:$HOME/.m2/repository/bar.jar com.mycorp.MyMainClass. So to run in prod, we have to push those same 3 JARs into our target environment, and run a java command with them in the same Classpath arrangement.
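
As a rough illustration of the "script" half of this pattern, here is a small sketch that assembles the run command from a list of JAR paths. The /opt/my-app/lib paths are hypothetical (wherever your deploy step happens to copy things), and in practice this logic usually lives in a shell script rather than in Java.

```java
import java.io.File;
import java.util.List;

// Hypothetical sketch: the JAR locations below are assumptions, not a convention.
final class RunCommand {

    // Joins the JAR paths with the platform's Classpath separator
    // (":" on Linux/macOS, ";" on Windows) and returns the full argument list.
    static List<String> build(List<String> jarPaths, String mainClass) {
        String classpath = String.join(File.pathSeparator, jarPaths);
        return List.of("java", "-cp", classpath, mainClass);
    }

    public static void main(String[] args) {
        List<String> jars = List.of(
            "/opt/my-app/lib/my-app.jar",
            "/opt/my-app/lib/foo.jar",
            "/opt/my-app/lib/bar.jar");
        // Print the command; the same list could be handed to ProcessBuilder.
        System.out.println(String.join(" ", build(jars, "com.mycorp.MyMainClass")));
    }
}
```

Whatever form it takes, the script's only real job is to reproduce in production the Classpath the build tool computed locally.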

There are a lot of ways to achieve this, so I tend to think of it as a rough pattern more than a specific implementation.

sbt’s native-packager plugin is a great example of a tool that does this really well. It can package all of your JARs into a ZIP archive or tarball, along with a handy run script (you can see the template for these) that will kick everything off. There are likely similar plugins for Maven or Gradle.

Uber/Fat/Assembly JARs

As mentioned in the Applications vs. Libraries section, we’ve so far been dealing with “skinny” JARs, each containing a single project’s compiled code.

In order to make a larger application work, we have to put a bunch of them side by side on the Classpath. This works fine, but can get annoying because you end up with dozens or even hundreds of JARs to cart around. What if you could just get it all into one JAR?

It turns out JARs can be used (abused?) in this way, by creating what’s called an “Uber” JAR (AKA “Assembly” or “Fat” JAR). An uberjar flattens out the compiled code from your project’s JAR, plus the compiled code from all the JARs on its classpath into a single output JAR. It’s basically a whole bunch of JARs squished into one.

The benefit of this is that the final product no longer depends on any other JARs. Its whole Classpath is just the one resulting JAR, and your whole deployment model can consist of uploading the uberjar to production and invoking java -jar my-application.jar (which works as long as the JAR’s manifest names a Main-Class). It’s sort of the JAR equivalent of building a single executable binary out of a language like Go or Rust.
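
To show what "squishing" actually involves, here is a deliberately naive sketch using the JDK's java.util.jar classes. It is not how the Shade plugin or sbt-assembly are implemented: it drops every input JAR's META-INF entries and simply keeps the first copy of any duplicate path, whereas real tools offer merge strategies for those cases. The class name and command-line arguments are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Naive sketch only: no META-INF merging, first-one-wins on duplicate entries.
final class NaiveUberJar {

    static void merge(List<Path> inputJars, Path output, String mainClass) throws IOException {
        // A fresh manifest naming the entry point, so `java -jar` knows what to run.
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);

        Set<String> seen = new HashSet<>();
        try (OutputStream fileOut = Files.newOutputStream(output);
             JarOutputStream out = new JarOutputStream(fileOut, manifest)) {
            for (Path jar : inputJars) {
                try (InputStream fileIn = Files.newInputStream(jar);
                     JarInputStream in = new JarInputStream(fileIn)) {
                    JarEntry entry;
                    while ((entry = in.getNextJarEntry()) != null) {
                        String name = entry.getName();
                        // Skip per-JAR metadata and any path we already copied.
                        if (name.startsWith("META-INF/") || !seen.add(name)) {
                            continue;
                        }
                        out.putNextEntry(new JarEntry(name));
                        in.transferTo(out); // copies the current entry's bytes
                        out.closeEntry();
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: NaiveUberJar <out.jar> <main-class> <in.jar>...");
            return;
        }
        List<Path> inputs = new ArrayList<>();
        for (int i = 2; i < args.length; i++) {
            inputs.add(Path.of(args[i]));
        }
        merge(inputs, Path.of(args[0]), args[1]);
    }
}
```

The "first copy wins" rule is worth noticing: when two input JARs contain the same path, something has to give, and which copy survives depends on the order of the inputs.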

The simplicity of the single-file deployment strategy has made uberjars popular in recent years. They’re especially common in the Hadoop/Spark ecosystem, but get used a lot for web services or other server applications as well.

Most build tools can either build uberjars out of the box or provide a plugin for doing it: Maven Shade Plugin, sbt-assembly, Leiningen (built in). Consult the README for whichever of these you’re using for more details.

WAR Files and J2EE

WAR Files are a special JAR variant used for deploying certain types of Java web applications in the J2EE ecosystem (since renamed Java EE, and more recently Jakarta EE, though the old name sticks around). J2EE is a whole can of worms that I honestly don’t know much about, nor am I very interested in learning. But it does come up a lot so it’s worth touching on here.

In short, these applications are designed to deploy not to generic VMs (like a bare Ubuntu EC2 instance with java installed) but rather into specialized Java-based Application Servers, like Apache Tomcat. Your company would run one or more of these Tomcat instances, which get treated as shared infrastructure, and individual applications get packaged into WARs and deployed into a pre-existing App Server, probably along with a bunch of other application WAR files.

The Application Server manages your app’s lifecycle, along with providing some shared system services, and because of these interactions extra care must be taken to ensure the 2 components cooperate well, which is what the WAR spec provides.

This article gives a good overview of this whole system. Here’s another good one about WARs specifically.

Despite my skepticism and poorly masked disdain for all this, it is kind of amusing to read about. If you squint right, running WARs via Tomcat isn’t so different from running “pods” of “containers” on abstracted machines via Kubernetes, just with a lot more enterprise-y pocket protector vibes.

And the decline of one is certainly related to the rise of the other – while there are plenty of J2EE deployments running out there, much of the industry has moved away from this model. These days people care more about cloud portability and deployment standardization (e.g. running with Docker or deploying via the 12 Factor Model). This makes highly customized, language-specific infrastructure less appealing than a giant uberjar you can run with a single java -jar command.

Docker and Container Images

Ironically, one of Java’s initial selling points – simplicity of deployment – has been somewhat diminished by the proliferation of Docker. Now that everyone’s prod environments are “BYO Container” anyway, the benefit of just putting the JRE on all your servers doesn’t matter as much.

Nevertheless, the JVM runs just fine in Docker, and in many cases, you can grab an appropriate base image (like OpenJDK), stuff your JARs into it, and go.

However it’s worth emphasizing: using Docker doesn’t change the fundamental JVM equation of Java Runtime + Classpath full of JARs = Application. The only difference is now the base image provides the JRE, and you’ll be loading your Classpath JARs into a container image rather than onto a bare server or VM.

So usually, what goes into your Docker image is some variation of one of the previous models:

Put your compiled code and all your dependencies into a Docker image and include an entrypoint command that invokes them with the proper settings and Classpath. Basically the “Push & Script” strategy but in Docker. (sbt’s native-packager plugin does this.)
Build an uberjar and put it in a JDK Docker image. Your Dockerfile CMD setting will be something like java -jar /path/to/that.jar
Use a dedicated Java-to-Container build plugin like Google’s Jib.

Jib: Java-specific Container Image Builds

Jib is a new-ish project providing a pure-Java build tool for the OCI Image Spec. This is interesting for a few reasons.

First, because it’s implemented in Java, Jib integrates into existing JVM build tools. Normally, running docker build requires an RPC connection to a Docker daemon process on your machine. You need to have Docker installed, and the build process has to copy things back and forth between the daemon and the docker client. Jib allows you to sidestep all this and keep things entirely within your Maven or Gradle build.

Second, by targeting Java applications specifically (rather than providing a general-purpose container build tool) Jib is able to make some creative optimizations like:

Using distroless base images that contain only the JVM (not even a full OS!) which makes your images a lot smaller
Taking better advantage of image layering by splitting your dependencies (which tend to change less) into a separate layer from your classes (which change often). This gives you faster incremental builds since most builds only require re-building the smaller application layer.

Thanks to these tricks, Jib images are usually smaller and build faster than traditional Docker + Dockerfile-based images.


GraalVM Native Images

GraalVM is an alternative JVM runtime with some really cool features, one of which is the ability to do Ahead-of-Time compilation of JVM bytecode.

Traditionally, the JVM uses a JIT compiler to turn bytecode into native machine code at runtime. But Graal lets us do this at build time, which opens up the possibility of packaging JVM applications into self-contained, platform-specific executables, called Native Images.

A native image includes all of your application’s code, its dependencies, plus the necessary Java Runtime bits like the standard library and the garbage collector. It’s all there in one standalone binary package, so you don’t even need to have java installed anymore.

Because the runtime doesn’t have to JIT all your code at startup, the resulting program also starts much faster and requires less memory than traditional JVM programs, making it appealing for use cases like CLI utilities where the JVM previously was not a great fit.

While JVM CLIs are cool, the industry is mostly excited about native images for a different reason: Serverless.

Everyone wants to stuff their Java programs into a Lambda/Cloud Run/whatever function and use them on-demand, but this doesn’t work well if your bloated app takes 30 seconds to boot. So native image provides a path to running Java programs in these environments.

So what’s the catch? Well there are 2 main ones:

Restrictions of the native image AOT process mean that some runtime features like reflection don’t work well or at all. In some cases there are workarounds but YMMV. Consult the docs. (Side note: Ironically this has led to a wave of backpedalling across the industry, as everyone scrambles to get things like Spring running without reflection. Suddenly reflection is bad and compile time abstractions are cool in Java.)
So far, native image performance is at least different from, and generally slightly worse than, that of traditional JVMs. The AOT process is able to make fewer optimizations than the traditional JIT, so your “warmed up” throughput will usually be worse. There are some workarounds, like profile-guided optimization (PGO), and this landscape continues to evolve, so again, do your research.
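
To see why reflection in particular is awkward for an ahead-of-time compiler, compare the two methods in this small sketch. Both work on a regular JVM. In the reflective one, the class name is just a string that arrives at runtime, so a build-time analysis can't tell from the code alone which classes need to be included. As far as I understand it, native image needs classes used this way to be registered in its reflection configuration, so treat the comments here as a rough guide and check the GraalVM docs for the specifics.

```java
import java.util.ArrayList;
import java.util.List;

final class ReflectionDemo {

    // Direct reference: the dependency on ArrayList is visible in the code,
    // so a build-time analysis can discover it.
    static List<String> direct() {
        return new ArrayList<>();
    }

    // Reflective lookup: the class is identified only by a runtime string,
    // so nothing in the code says which class will be needed.
    @SuppressWarnings("unchecked")
    static List<String> reflective(String className) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return (List<String>) type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        String className = args.length > 0 ? args[0] : "java.util.ArrayList";
        // With no arguments, this prints java.util.ArrayList
        System.out.println(reflective(className).getClass().getName());
    }
}
```

Frameworks that wire things together by class name at startup are doing a fancier version of the second method, which is why they need extra work to run as native images.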

GraalVM is really an amazing technological advancement for the JVM. It’s the kind of thing that Java developers 15 years ago would not have believed to be possible. It will be very interesting to see where this and similar advancements take us in the coming years.

Summary

So there’s your crash course in JVM app packaging. There are a ton of details surrounding this topic, so we’ve inevitably had to skip over a lot. But hopefully it provides an overview of the landscape, and serves as a starting point for further, better-informed research elsewhere.

What’s next? I’m sure you must be thinking: “Wow, with a rock-solid runtime and so many great deployment options, surely everything must work perfectly in production?”

Ha! If only! Just whisper the words ClassNotFoundException to a Java developer and see how they react.

Unfortunately, it does not, in fact, all work perfectly in production. To learn more about this, stay tuned for Part 3, in which we will descend into Classpath Hell, and hopefully emerge singed, but enlightened.

Horace Williams