One day, when I was bored in a Java class, I realized that the .class files generated by the compiler followed a specific standard. I found the documentation online and fired up a new Rust project. One thing led to another, and I was soon running binaries I compiled in class using my own implementation of the Java Virtual Machine.
In diving this deep into the inner workings of Java, I became a sort of expert on some of its more interesting features. Here’s some of what I learned:
invokedynamic Instruction
Most instructions in the JVM are sensible: stack manipulation, math, and the object and array manipulation one expects from an object-oriented language. invokedynamic is the exception. This instruction is generally used for calls that need a polymorphic signature. The first time the instruction is executed, a bootstrap method is invoked with constant arguments to produce a call site whose target is a method handle of a specific type. This handle is then invoked just like a normal function, using arguments from the stack and optionally producing a return value.
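The two steps described above can be imitated by hand with the java.lang.invoke API. This is only a sketch of the idea, not what the JVM literally does internally: the class BootstrapDemo and the methods twice, bootstrap, and run are hypothetical names invented for this illustration.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BootstrapDemo {
    // The method we ultimately want the call site to reach (hypothetical).
    static int twice(int v) {
        return v * 2;
    }

    // Step 1: a bootstrap method receives constant arguments and returns a
    // call site bound to a method handle of the requested type.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        return new ConstantCallSite(lookup.findStatic(BootstrapDemo.class, name, type));
    }

    // Step 2: the handle is invoked like a normal function with runtime arguments.
    static int run(int v) {
        try {
            CallSite site = bootstrap(
                    MethodHandles.lookup(),
                    "twice",
                    MethodType.methodType(int.class, int.class));
            return (int) site.dynamicInvoker().invokeExact(v);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Calling BootstrapDemo.run(21) links the call site and then invokes twice through it. In real bytecode the linking happens once per invokedynamic instruction, and later executions reuse the linked call site.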
When concatenating strings, the Java compiler generates an invokedynamic instruction that calls the built-in makeConcatWithConstants function with a static template string. This returns a method handle that takes in all of the values that are concatenated into the string and returns the full string. Here’s an example:
public String toString() {
    return this.name + ", a " + this.age + "-year-old " + this.profession;
}
When compiled, this code will generate a dynamic invocation like the following to give the required method handle:
StringConcatFactory.makeConcatWithConstants(..., Function<String (String, int, String)>, "\u0001, a \u0001-year-old \u0001")
It then invokes this method handle, passing in the name, age, and profession to create the final interpolated string.
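To see the same mechanism without reading bytecode, the factory can be called directly from ordinary Java (9 or later). This is a rough sketch of what the generated call site amounts to: ConcatDemo and describe are hypothetical names, and the parameter types assume that name and profession are Strings and age is an int.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class ConcatDemo {
    static String describe(String name, int age, String profession) {
        try {
            // Type of the handle we want back: (String, int, String) -> String.
            MethodType type = MethodType.methodType(
                    String.class, String.class, int.class, String.class);

            // Each U+0001 character in the recipe marks a slot for one argument.
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "makeConcatWithConstants",
                    type,
                    "\u0001, a \u0001-year-old \u0001");

            // Invoke the handle with the runtime values to build the string.
            MethodHandle handle = site.getTarget();
            return (String) handle.invokeExact(name, age, profession);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

For example, ConcatDemo.describe("Ada", 36, "engineer") should produce "Ada, a 36-year-old engineer".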
Another use for the invokedynamic instruction is for lambda objects. These are implementations of a single-method interface that can be trivially instantiated in source code by providing an implementation of the overridden method. Unlike an anonymous class, there is no associated class file for a lambda.
Instead, the Java compiler first creates a static method in the current class with the signature of the overridden method, with any variables captured from the environment prepended to the parameters. Then, the dynamic invocation passes the expected class, method name, static method reference, and expected method type into LambdaMetafactory. The returned method handle represents a lambda factory, which takes the captured variables as parameters and returns an object of the relevant type. When the overridden interface method is called on this object, it calls the underlying static method with the captured variables followed by the argument values. Here’s an example:
Integer x = 1;
Function<Integer, Integer> adder = (y) -> x + y;
When compiled, this code will generate a dynamic invocation something like this:
LambdaMetafactory.metafactory(..., Function<Integer, Integer> (Integer), "apply", Integer (Integer))
The result of the final factory call is an object without a concrete class. In my implementation, this object contains a special lambda override field that includes the captured data and a reference to the lambda implementation method. When my method resolver sees this field, it applies the lambda by concatenating the parameters and captures, and invoking the stored method.
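The standard library's side of this can be reproduced by hand, which helped me check my understanding of the factory's arguments. The sketch below is an approximation under a few assumptions: LambdaDemo, lambdaBody, and makeAdder are hypothetical names, and lambdaBody stands in for the synthetic static method the compiler would generate, with the captured x placed before the interface argument y.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

class LambdaDemo {
    // Stand-in for the compiler-generated method: capture first, then argument.
    private static Integer lambdaBody(Integer x, Integer y) {
        return x + y;
    }

    @SuppressWarnings("unchecked")
    static Function<Integer, Integer> makeAdder(Integer x) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle impl = lookup.findStatic(
                    LambdaDemo.class,
                    "lambdaBody",
                    MethodType.methodType(Integer.class, Integer.class, Integer.class));

            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "apply",                                              // interface method name
                    MethodType.methodType(Function.class, Integer.class), // factory: capture -> Function
                    MethodType.methodType(Object.class, Object.class),    // erased signature of apply
                    impl,                                                 // the implementation method
                    MethodType.methodType(Integer.class, Integer.class)); // specialized signature of apply

            // The call site's target is the lambda factory; pass it the captured value.
            return (Function<Integer, Integer>) site.getTarget().invoke(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Here LambdaDemo.makeAdder(1).apply(2) evaluates lambdaBody(1, 2), so the captured value and the call argument end up in a single argument list for the underlying method.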
The JVM supports two special methods: <init> for instance initialization and <clinit> for class initialization.
The <init> method always returns void, but there is one <init> method for each constructor in the source code of its class, with the same number and types of arguments. Whenever an object is created, the matching <init> is immediately invoked to initialize it.
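One place this void return type shows through in ordinary Java is MethodHandles.Lookup.findConstructor, which expects a method type whose return type is void. The following is a small sketch with hypothetical names (InitDemo, Point, build), looking up one handle per constructor.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class InitDemo {
    static class Point {
        final int x;
        final int y;

        Point() {               // compiles to <init>()V
            this(0, 0);
        }

        Point(int x, int y) {   // compiles to <init>(II)V
            this.x = x;
            this.y = y;
        }
    }

    static int[] build() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();

            // Constructors are looked up with a void return type,
            // mirroring the descriptor of the underlying <init> method.
            MethodHandle noArgs = lookup.findConstructor(
                    Point.class, MethodType.methodType(void.class));
            MethodHandle twoArgs = lookup.findConstructor(
                    Point.class, MethodType.methodType(void.class, int.class, int.class));

            // The resulting handles allocate the object and then run <init>.
            Point a = (Point) noArgs.invoke();
            Point b = (Point) twoArgs.invoke(3, 4);
            return new int[] { a.x, a.y, b.x, b.y };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Each constructor gets its own handle, distinguished only by its parameter types, which matches the one-<init>-per-constructor layout described above.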
The <clinit> method is where the logic for initializing a class's static state is located. It always returns void and takes no parameters. It is invoked when the class is first initialized, for example the first time any of its static state is referenced. This method includes the initializers for any static variables as well as the code content of any static block.
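The lazy timing is easy to observe from plain Java. In this sketch (ClinitDemo, Config, and run are hypothetical names), a log records when the static initializer and static block of a nested class actually execute relative to the first access.

```java
import java.util.ArrayList;
import java.util.List;

class ClinitDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Config {
        // Both the field initializer and the static block are compiled
        // into Config's <clinit>, in source order.
        static int limit = compute();

        static {
            LOG.add("static block");
        }

        static int compute() {
            LOG.add("field initializer");
            return 10;
        }
    }

    static List<String> run() {
        LOG.add("before access");
        int value = Config.limit;   // first use of Config triggers its <clinit>
        LOG.add("after access: " + value);
        return new ArrayList<>(LOG);
    }
}
```

On the first call to ClinitDemo.run(), the log reads "before access", "field initializer", "static block", "after access: 10", showing that nothing in Config runs until its static state is touched.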
Java enums include much of their logic in the <clinit> method. This includes the initialization of all of an enum's variants as well as the creation of the array that is used by the compiler-generated values() method.
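As a rough picture of what ends up in that <clinit>, here is a hand-written class that approximates the shape of a compiled enum Color { RED, GREEN }. It is a sketch, not the compiler's exact output: DesugaredColor and its VALUES field are names chosen for this illustration, and a real enum also extends java.lang.Enum.

```java
final class DesugaredColor {
    static final DesugaredColor RED;
    static final DesugaredColor GREEN;
    private static final DesugaredColor[] VALUES;

    // Everything in this block lives in <clinit>: each variant is
    // constructed, then the backing array for values() is built.
    static {
        RED = new DesugaredColor("RED", 0);
        GREEN = new DesugaredColor("GREEN", 1);
        VALUES = new DesugaredColor[] { RED, GREEN };
    }

    private final String name;
    private final int ordinal;

    private DesugaredColor(String name, int ordinal) {
        this.name = name;
        this.ordinal = ordinal;
    }

    String name() {
        return name;
    }

    int ordinal() {
        return ordinal;
    }

    // Returns a copy so callers cannot modify the shared array.
    static DesugaredColor[] values() {
        return VALUES.clone();
    }
}
```

Because the array is cloned on every call, two calls to values() return distinct arrays holding the same variant objects.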