Clojure and native-image on JDK 11-flavored GraalVM (19.3.0+)

My takeaways from the GraalVM 19.3.0 release notes:

  1. JDK code inlining support
  2. Optional JDK11 support
  3. Better Windows support

These may not seem like much, but JDK code inlining fixes my one major niggle with native-image: it was too hard to get top-notch single-binary TLS, and now it just works.

(There are lots of other great things that happened in this release! They're just not in parts of Graal I use.)

Updating to JDK 11

Updating to JDK 11 is optional, but you might as well get it over with now.

A brief summary of Jigsaw (JDK9+) breakage

There are two things that bit Clojure-using early adopters of JDK 9, both consequences of Project Jigsaw:

  • The module system hiding previously-available classes
  • Changes to the way classloaders work

These two changes broke Clojure and a whole host of common libraries (mostly because of the now-unavailable classes), as well as the two common build tools, Leiningen and Boot (mostly because of the boot classloader). These were quickly resolved and only affect you now if you care about supporting a wide range of Clojure versions or a wide range of JDKs. Since this blog post is about native-image, your output is a standalone binary and you get to pick the JDK version. However, you still need to know a bit of this background to understand some of the workarounds necessary for GraalVM native-image targeting JDK 11 and above. These issues are surfacing now because Graal previously targeted JDK 8, which avoided all of them.

The two commonly used classes that ended up hidden in modules are java.sql.Timestamp and javax.xml.bind.DatatypeConverter. Despite their package names, the way Clojure used them has nothing to do with SQL or XML. Clojure used them because Timestamp was the good instant type (java.util.Date being famously bad), and DatatypeConverter was the good Base64 implementation available everywhere.
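To check where these classes live on whatever JDK you are running, a small Java sketch like the one below can help. It needs JDK 9 or newer, since it relies on Class.getModule. ModuleProbe and moduleOf are hypothetical names made up for this illustration. The only JDK APIs involved are Class.forName, Class.getModule, Module.isNamed and Module.getName.

```java
// Hypothetical helper for illustration only; requires JDK 9+ (java.lang.Module).
final class ModuleProbe {

    // Returns the defining module's name, "unnamed" for plain classpath classes,
    // or "missing" when the class cannot be loaded at all.
    static String moduleOf(String className) {
        try {
            Module module = Class.forName(className).getModule();
            return module.isNamed() ? module.getName() : "unnamed";
        } catch (ClassNotFoundException | LinkageError e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "java.sql.Timestamp",
            "javax.xml.bind.DatatypeConverter"
        };
        for (String name : names) {
            System.out.println(name + " -> " + moduleOf(name));
        }
    }
}
```

On a stock JDK 11 this should report java.sql.Timestamp in the java.sql module and DatatypeConverter as missing, unless a standalone JAXB jar happens to be on the classpath.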

Example: DatatypeConverter in clj-http-lite

Outside of Clojure itself, clj-http and clj-http-lite used DatatypeConverter as well (also for Base64). clj-http-lite is very popular in native-image Clojure projects. Like other libraries, both were quickly patched to support JDK 9. The patch still attempts to import DatatypeConverter (see the actual patch in clj-http-lite), because the Base64 implementation replacing it isn't available on every JDK those libraries wanted to support. Normally, this is fine: the import fails and the alternative implementation gets used. However, the static analysis step in GraalVM sees the trial import and complains:

Error: com.oracle.graal.pointsto.constraints.UnresolvedElementException: Discovered unresolved type during parsing: javax.xml.bind.DatatypeConverter. To diagnose the issue you can use the --allow-incomplete-classpath option. The missing type is then reported at run time when it is accessed the first time.

The classic workaround was to add the module back with --add-modules java.xml.bind, but that only works on JDK 9 and 10 because the module was removed entirely in JDK 11. Since it's just a trial import (see patch), you can instead use the workaround suggested in the error message (--allow-incomplete-classpath) and it'll work fine. The downside is that this defers every unresolved-class error to run time, not just this one. There's a Graal ticket for a more precise command line argument limiting the suppressed error to that class. I'm confident there's already a way to express this in Graal command line arguments, but I haven't tried to figure out the right incantation yet.
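For readers who haven't seen the trial-import pattern, here is a rough Java analogue of the idea. It is not the clj-http-lite patch, which is Clojure. Base64Fallback and its methods are hypothetical names. I'm assuming the replacement implementation is java.util.Base64, and the sketch simplifies by referencing it directly, so it needs JDK 8 or newer. It only shows how the fallback behaves on a regular JVM and makes no claim about how native-image's analysis treats this exact code.

```java
import java.lang.reflect.Method;
import java.util.Base64;

// Hypothetical illustration of a "trial import" with fallback; not library code.
final class Base64Fallback {

    // Resolved once at class initialization: the legacy encoder, or null if absent.
    private static final Method LEGACY_ENCODER = findLegacyEncoder();

    private static Method findLegacyEncoder() {
        try {
            // Named by string, so nothing here links against the class at compile time.
            Class<?> converter = Class.forName("javax.xml.bind.DatatypeConverter");
            return converter.getMethod("printBase64Binary", byte[].class);
        } catch (ReflectiveOperationException | LinkageError e) {
            // Typical on JDK 9+ without the java.xml.bind module: use the fallback.
            return null;
        }
    }

    // Encodes with the legacy class when it is present, else with java.util.Base64.
    static String encode(byte[] bytes) {
        if (LEGACY_ENCODER != null) {
            try {
                return (String) LEGACY_ENCODER.invoke(null, (Object) bytes);
            } catch (ReflectiveOperationException e) {
                // The legacy call failed; fall through to the modern encoder.
            }
        }
        return Base64.getEncoder().encodeToString(bytes);
    }
}
```

Both paths produce standard padded Base64, so callers can't tell which implementation was picked. That is why a failed trial import is harmless on a normal JVM.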

Single binary TLS!

Once you fix the above issue with clj-http-lite, as long as you enable the TLS subsystem (--enable-https), you'll just get single-binary HTTPS with libsunec.so under the hood, meaning I can finally close #1336.

Example project

I updated cljurl-graalvm-demo if you want to try any of this at home. If you're on Linux and want to debug TLS issues, I wrote nscap specifically for this purpose. It leverages Linux namespaces to capture network traffic for a single process. You can then throw the resulting PCAP into e.g. Wireshark.

What I'd still love to see in native-image

The compiler is slow. It's in the range of rustc speed: typically faster than C++, certainly slower than Go. It also eats a lot of RAM. That's fine for me because I don't iterate on the binary version: I develop Clojure apps targeting native-image as if they're normal Clojure apps and then eventually run some end-to-end tests on the binary. But this release knocked out my #1 feature request, so now I have a new one 😊

Corrections

I previously thought/posted that the locking macro (CLJ-1472) problems appeared to be gone or at least reduced, but I have been unable to consistently reproduce that and others have reported no change.