2. Bytecode compilation at compile-time
What is bytecode and how does it work? What does Scala code look like when compiled to bytecode?
First, why do we even care about bytecode?
Don’t expect to have to look at bytecode too frequently. Usually (hopefully) you can trust that the compiler has transformed your code into bytecode correctly. Most developers won’t ever have to get into the detail of the generated bytecode, and that’s a good thing – that’s why we have higher-level languages!
But it does give some context as to what’s happening, and having this context can give you an understanding of why certain optimisations are possible or not, and why some code runs blazingly fast while some limps along.
Bytecode is for a stack-based machine
Bytecode is the first intermediate step on the path from the compiler to the CPU. It’s a language describing your code in a platform-independent way, written for a fictional platform rather than for any real processor.
It describes the execution of your program on a stack-based machine, as opposed to the register-based processors we’re used to. Values are put on a stack; functions use those values as parameters and replace them with results.
It’s not particularly efficient, but it’s not supposed to be. We’re still quite a long way from the metal.
Here’s an example of the steps of execution of a short code snippet:
- first, the two integer values, 1 and 2, are added to the stack.
- the iadd instruction to add two integers is invoked, and the result remains on the stack.
- next we need to convert this integer 3 into a string that can be concatenated with the longer prefix. The instruction here is invokestatic String.valueOf, which invokes this method with the parameter 3 and leaves the string "3" on the stack.
- finally, add the longer string "my favourite number is: " and run the instruction invokevirtual String.concat. The expected result is left on the stack.
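The steps above can be mimicked with a toy stack machine. This is only a sketch: the instruction names PushInt, PushString, IAdd, ValueOf and Concat are made up for illustration and are not real JVM opcodes. It also assumes the prefix string is pushed first, because a virtual call expects its receiver to sit below its argument on the stack.

```scala
object StackMachineSketch {
  // Hypothetical instruction set covering only the steps described above.
  sealed trait Instr
  final case class PushInt(value: Int) extends Instr
  final case class PushString(value: String) extends Instr
  case object IAdd extends Instr    // stands in for iadd
  case object ValueOf extends Instr // stands in for invokestatic String.valueOf
  case object Concat extends Instr  // stands in for invokevirtual String.concat

  // The operand stack is a List whose head is the top of the stack.
  def step(stack: List[Any], instr: Instr): List[Any] = (instr, stack) match {
    case (PushInt(v), rest)                    => v :: rest
    case (PushString(s), rest)                 => s :: rest
    case (IAdd, (b: Int) :: (a: Int) :: rest)  => (a + b) :: rest
    case (ValueOf, (i: Int) :: rest)           => String.valueOf(i) :: rest
    case (Concat, (arg: String) :: (receiver: String) :: rest) =>
      receiver.concat(arg) :: rest
    case _ =>
      throw new IllegalStateException("cannot run " + instr + " on stack " + stack)
  }

  // Run a whole program starting from an empty stack.
  def run(program: List[Instr]): List[Any] =
    program.foldLeft(List.empty[Any])(step)

  val program: List[Instr] = List(
    PushString("my favourite number is: "), // receiver of the later concat
    PushInt(1),
    PushInt(2),
    IAdd,
    ValueOf,
    Concat
  )
}
```

Running StackMachineSketch.run(StackMachineSketch.program) leaves a single string on the stack, just as the walkthrough describes.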
Note the static and virtual instructions – these have the same meaning as described above. The virtual call is because the String concatenation method “belongs” to the string on which it is called, and the static call has no such instance.
Reading a classfile
The JDK ships with a disassembler app called javap that can display bytecode in a somewhat human-readable form.
We’re going to use a trivial example to look at some bytecode. This example ScalaConstants contains a constant value and a utility function in an object, which is Scala’s implementation of a singleton. Below it is the bytecode as shown by javap -p ScalaConstants, just the type signatures with no disassembled code, for now.
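As a minimal sketch, an object of this shape might look like the following. The member names come from the text, but the constant's value and the function body are assumptions, chosen only to match the walkthrough in the later sections.

```scala
object ScalaConstants {
  // Hypothetical value: the text does not say what the constant contains.
  val ichBinEinConstant: String = "a hypothetical constant"

  // Assumed body: turn the Int parameter into its String representation.
  def ichBinEinUtilityFunction(i: Int): String = i.toString
}
```

Compiling a file like this with scalac and then running javap -p on the resulting classes should list the members discussed here, though the exact output depends on the Scala version.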
Notice first that there are two classes, one with a $ appended to the name. This is synthetically generated by the Scala compiler, and is how singleton objects (and companion objects) are implemented. This is a sort of hidden type – it’s accessed only through the main type ScalaConstants that’s declared in source.
Lines with parentheses represent methods, lines without represent fields. The public static {} is the class’s static initialiser that runs when the class is loaded. In this case, when the static initialiser is called the constructor is run, and a new instance is saved in the MODULE$ field. This is the globally visible singleton instance.
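The synthetic class and its MODULE$ field can also be observed from running code using Java reflection. The sketch below uses hypothetical names (ModuleFieldSketch, Constants, greeting). It assumes the code is compiled as an ordinary top-level object, and it wraps the field lookup in Try because tools that wrap code in a class (some REPLs and script runners) may not generate a static MODULE$ field.

```scala
import scala.util.Try

object ModuleFieldSketch {
  // Hypothetical singleton used only for this illustration.
  object Constants {
    val greeting: String = "hello"
  }

  // The runtime class of a singleton object has a name ending in "$".
  def moduleClassName: String = Constants.getClass.getName

  // Read the static MODULE$ field reflectively, if the compiler generated one.
  def moduleFieldValue: Option[AnyRef] =
    Try(Constants.getClass.getField("MODULE$").get(null)).toOption
}
```

When the field is present, the value read from it is the very same instance that source code reaches by writing Constants.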
Calling a method on an object
What happens when a method is invoked? Suppose we want to see what happens for ScalaConstants.ichBinEinUtilityFunction(3). Let’s look at the bytecode for the utility function, which we can get using javap -c ScalaConstants.
- load the singleton ScalaConstants$ object from the static field called MODULE$ onto the stack
- load the integer parameter onto the stack
- invoke the instance method which, yes, has the same name as this function
- return the String reference
What about inside the delegate function? From javap -c ScalaConstants$:
First note that this one doesn’t have a static modifier – it’s an instance method on the singleton object.
- load the integer parameter onto the stack.
- turn the int primitive into an object.
- invoke the toString method on the new object. This is a virtual call because it’s a non-static method, called on an instance.
- return the string reference.
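In source terms, the steps above amount to roughly the following. This is a sketch with hypothetical names (BoxingSketch, viaBoxing); it uses Int.box to make the boxing step explicit, whereas the compiler may emit a different helper call to do the same job.

```scala
object BoxingSketch {
  def viaBoxing(i: Int): String = {
    // Turn the int primitive into a java.lang.Integer object.
    val boxed: java.lang.Integer = Int.box(i)
    // Virtual call: toString is invoked on the boxed instance.
    boxed.toString
  }
}
```

Calling BoxingSketch.viaBoxing(3) returns the same string as 3.toString.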
Parameters to a method are loaded onto the stack from numbered slots, as we saw. When calling a non-static method on an instance, slot zero holds this. You’ll notice the difference when loading the int parameter in these last two examples: iload_1 in the instance method instead of iload_0 in the static one.
Initialising a singleton object
One more example – how does the singleton itself get initialised? Below is the initialisation code from the disassembled ScalaConstants$ class.
static {} is the static initialiser that is run when the class is loaded. The code here creates the singleton object, invokes its constructor and returns.
Let’s walk through the constructor of the class, private ScalaConstants$():
- aload_0 loads the this reference onto the stack, and then the super-constructor is invoked. On construction every class runs its parent class constructor first, all the way up to the top type Object. That top type constructor has its own instruction, as we see here.
- The this pointer is loaded again, so that it can be written into the static field MODULE$.
- The this pointer is loaded again, followed by the string constant which is written into the ichBinEinConstant field on this.
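The construction steps above can be modelled in plain Scala. This is only a model under stated assumptions: HandRolledSingleton, MODULE, staticInit and the constant's value are hypothetical. A companion var stands in for the static MODULE$ field, and an explicit method stands in for static {}, since source-level Scala cannot declare either directly.

```scala
final class HandRolledSingleton private () {
  // Step 1 is implicit: the parent constructor (here Object's) has already run.

  // Step 2: publish this instance, as the real constructor does with MODULE$.
  HandRolledSingleton.MODULE = this

  // Step 3: only now is the constant written into its field on this.
  val ichBinEinConstant: String = "a hypothetical constant"
}

object HandRolledSingleton {
  // Stands in for the static field MODULE$.
  var MODULE: HandRolledSingleton = null

  // Stands in for static {}: create the instance, run its constructor, return.
  def staticInit(): Unit = {
    new HandRolledSingleton()
    ()
  }
}
```

After HandRolledSingleton.staticInit() has run, HandRolledSingleton.MODULE holds the fully constructed instance. Note that in this model the instance is published before its constant field is assigned, mirroring the order of the steps above.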
That’s enough of an introduction to bytecode. The language compilers scalac and javac that compile to bytecode do some optimisation, including any language-specific things – an often-cited example is the transformation of string concatenation like "a" + "b" + "c" into a StringBuilder expression, which is much more efficient because it avoids repeatedly copying intermediate strings.
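To make the idea concrete, here are the two forms written out by hand. This is a sketch with hypothetical names (ConcatSketch, withPlus, withBuilder), and it uses parameters rather than literals so that the concatenation really happens at runtime. What a given compiler version actually emits for the + form may differ; newer javac releases, for instance, use a different strategy.

```scala
object ConcatSketch {
  // What the programmer writes.
  def withPlus(a: String, b: String, c: String): String =
    a + b + c

  // Roughly what a StringBuilder-based translation looks like:
  // one builder, three appends, one final copy into a String.
  def withBuilder(a: String, b: String, c: String): String =
    new java.lang.StringBuilder().append(a).append(b).append(c).toString
}
```

Both methods return the same result; the second form avoids building the intermediate string a + b before appending c.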
Most of the heavy lifting is done later, by the Just-in-Time compilers at runtime.