To recap: I implemented .NET’s yield feature using bytecode manipulation, since AOP was not available. Luckily, more than one framework is available for the task. I started the project with BCEL, but as soon as I finished it I ported it to ASM.
Good features of BCEL
The reason for starting with BCEL was simply that it was the only one I knew at the time. I encountered BCEL back in 2002 or so, and back then I didn’t really need it; still, it stuck in my mind as being a framework for bytecode manipulation. Starting with it was pretty straightforward, and it had a seemingly comfortable API: load a class into a DOM, and then manipulate the DOM however you want.
BCEL contained some really interesting concepts, such as encapsulating each instruction with a class, and having each instruction class belong to a hierarchy, using the class hierarchy to group the instructions. For example, the
ALOAD class (encapsulating the ALOAD instruction) has the
LoadInstruction as its superclass, which in turn subclasses
LocalVariableInstruction, which subclasses
BCEL also uses the Visitor design pattern in order to traverse code, so that you could pass your own
MyVisitor instance to each instruction. The interesting part about the Visitor was that it contained a
visitINSTRUCTION method for each instruction, and one for each of the instruction’s superclasses, named after the classes in BCEL’s hierarchy. So, for ALOAD, we will have the expected
visitALOAD, and also
visitLocalVariableInstruction (rant: There is no
visitInstruction). This gives you the ability to choose which scope of instructions you would like to deal with.
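To make the hierarchy idea concrete, here is a minimal sketch in plain Java. The types below are toy stand-ins that only borrow BCEL’s names; they are not BCEL’s real classes, and the order of the callbacks is an assumption made for illustration. The point is only that one instruction triggers several callbacks, one per level, and the visitor overrides whichever level it cares about.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a hierarchy-aware visitor. These types only mirror the naming
// idea described above; they are NOT BCEL's real classes.
class HierarchyVisitorSketch {
    interface Visitor {
        // No-op defaults let an implementation override only the levels it needs.
        default void visitLocalVariableInstruction(LocalVariableInstruction i) {}
        default void visitLoadInstruction(LoadInstruction i) {}
        default void visitALOAD(ALOAD i) {}
    }

    abstract static class Instruction {
        abstract void accept(Visitor v);
    }

    abstract static class LocalVariableInstruction extends Instruction {
        final int index; // local variable slot
        LocalVariableInstruction(int index) { this.index = index; }
    }

    abstract static class LoadInstruction extends LocalVariableInstruction {
        LoadInstruction(int index) { super(index); }
    }

    static final class ALOAD extends LoadInstruction {
        ALOAD(int index) { super(index); }

        @Override
        void accept(Visitor v) {
            // One callback per level of the hierarchy (order is an assumption here).
            v.visitLocalVariableInstruction(this);
            v.visitLoadInstruction(this);
            v.visitALOAD(this);
        }
    }

    // Records which callbacks fire for a visitor that overrides two of the three levels.
    static List<String> trace(Instruction insn) {
        List<String> calls = new ArrayList<>();
        insn.accept(new Visitor() {
            @Override
            public void visitLocalVariableInstruction(LocalVariableInstruction i) {
                calls.add("visitLocalVariableInstruction(" + i.index + ")");
            }

            @Override
            public void visitALOAD(ALOAD i) {
                calls.add("visitALOAD(" + i.index + ")");
            }
        });
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(trace(new ALOAD(1)));
    }
}
```

A visitor that wants to handle every local variable instruction overrides only the broad method; one that cares about ALOAD alone overrides only the narrow one.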
Some more nice things in the BCEL encapsulation:
Where the BCEL API breaks
Unfortunately, not everything was good for me with BCEL. It started with BCEL’s distinction between the read-only DOM and the (seemingly) write-only DOM. That is, when you first start working on bytecode manipulation, you load the original class into a
JavaClass instance. Using that class, you can iterate all the methods, fields, annotations and so on. However, in order to create new elements, you will need to instantiate a
ClassGen class, wrapping
JavaClass. In addition, there is no way to read the bytecode of a method using the read-only
Method instance; again, this can only be done using the
MethodGen instance, wrapping
The second thing was the horrible documentation and mismatched source code. I downloaded the most recent source code, and I downloaded the most recent binaries. They didn’t match. It was hell to debug anything, and the documentation was horrible: in many cases I simply couldn’t understand what a method was about to do, and what it required me to do afterwards. The guide was not enough and didn’t contain enough material on manipulating code, leaving me to guess at a lot of the inner workings.
The third was the constant pool. Each Java class contains a constant pool, into which all constants are placed and indexed, so that method calls, field accesses, constant strings and numbers can be referred to from the bytecode by a mere index. The constant pool also references itself: class names, method names and signatures are saved as strings in the pool, and a method reference contains only indices pointing at them. In BCEL, the developer needs to manage this pool of constants, and it’s not pretty. This is especially painful when you want to change the class’ name, which you almost always have to do when manipulating classes, since you can’t redefine a class using a normal ClassLoader.
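The self-referencing layout is easier to see in a small model. The sketch below is plain Java and is not BCEL’s API: ConstantPoolSketch and all of its methods are hypothetical names, entries are stored as readable strings, and details of real class files (such as long and double constants taking two slots) are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Toy constant pool: every entry is a string, and composite entries refer to
// other entries by index. Hypothetical model, not BCEL's or the JVM's format.
class ConstantPoolSketch {
    private final List<String> entries = new ArrayList<>();

    ConstantPoolSketch() {
        entries.add("<unused>"); // real pools start at index 1, so slot 0 is a placeholder
    }

    // Adds an entry unless an identical one exists, and returns its index.
    private int add(String entry) {
        int existing = entries.indexOf(entry);
        if (existing > 0) {
            return existing;
        }
        entries.add(entry);
        return entries.size() - 1;
    }

    int utf8(String text) {
        return add("Utf8 " + text);
    }

    int classRef(String internalName) {
        return add("Class #" + utf8(internalName));
    }

    int nameAndType(String name, String descriptor) {
        return add("NameAndType #" + utf8(name) + " #" + utf8(descriptor));
    }

    int methodRef(String owner, String name, String descriptor) {
        return add("Methodref #" + classRef(owner) + " #" + nameAndType(name, descriptor));
    }

    // Renaming a class touches a single Utf8 entry; everything that points at it follows.
    void renameUtf8(String from, String to) {
        int index = entries.indexOf("Utf8 " + from);
        if (index > 0) {
            entries.set(index, "Utf8 " + to);
        }
    }

    String get(int index) {
        return entries.get(index);
    }

    public static void main(String[] args) {
        ConstantPoolSketch pool = new ConstantPoolSketch();
        int ref = pool.methodRef("java/lang/Object", "toString", "()Ljava/lang/String;");
        pool.renameUtf8("java/lang/Object", "my/Renamed");
        for (int i = 1; i <= ref; i++) {
            System.out.println("#" + i + " = " + pool.get(i));
        }
    }
}
```

Adding one method reference creates six entries here, and the bytecode would carry only the index of the last one. Keeping all of those indices consistent by hand is the bookkeeping described above.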
Then there were just interface hassles: The
InstructionList couldn’t receive a
Visitor, so I had to iterate over all its elements manually. The interface for visiting the “read-only” class hierarchy did not support going into the bytecode, as that was a part of the mutable hierarchy. Unfortunately, the second visitor, located in a different package, was given the same name, so any visitor that needed to implement both interfaces had to refer to one of them by its fully qualified name.
Why the ASM switch was scary
ASM has a different idea in mind: it is a very thin layer over the bytecode itself, and has a SAX-like interface. The developer instantiates
ClassVisitor instances; these in turn can spawn
FieldVisitors and so on when appropriate. ASM comes with one
ClassWriter, which writes whatever it gets as bytecode.
Needless to say, the rules about when a
visit method is allowed to be invoked on a
Visitor are crucial: call the
visitInsn (to visit an instruction) on a
MethodVisitor after the
visitEnd method was invoked, and you’ll get malformed bytecode which will not pass verification.
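The ordering rule can be illustrated with a tiny guard. This is a plain-Java sketch, not ASM’s MethodVisitor: OrderedMethodEvents is a hypothetical class that borrows only the visitInsn and visitEnd names, takes mnemonics as strings, and chooses to fail fast on an out-of-order call.

```java
// Toy event receiver that enforces "no visitInsn after visitEnd".
// Hypothetical class; it borrows ASM's method names but is not ASM's API.
class OrderedMethodEvents {
    private final StringBuilder out = new StringBuilder();
    private boolean ended;

    void visitInsn(String mnemonic) {
        if (ended) {
            // Fail fast instead of silently producing a broken method body.
            throw new IllegalStateException("visitInsn after visitEnd: " + mnemonic);
        }
        out.append(mnemonic).append('\n');
    }

    void visitEnd() {
        ended = true;
    }

    String text() {
        return out.toString();
    }

    public static void main(String[] args) {
        OrderedMethodEvents mv = new OrderedMethodEvents();
        mv.visitInsn("ICONST_0");
        mv.visitInsn("IRETURN");
        mv.visitEnd();
        try {
            mv.visitInsn("NOP"); // out of order
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```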
Also, since this was a very thin layer above bytecode, there is no encapsulation of anything except for labels and constant pool elements: All
visit methods receive an integer and different
Strings as parameters.
The switch from such a high-level abstraction (BCEL) to something as low-level as ASM was really not for me, especially since I had already finished the implementation in BCEL; I had only looked into ASM to read about different approaches.
So why ASM?
Eventually ASM won for two main reasons: better documentation and a cleaner API. I decided to port to ASM, and I admit that the beginning was not easy, but after a day I had it all working.
ASM comes with about 140 pages of thorough explanations. API, common pitfalls, tips and tricks, available optimizations, everything is there, and if something was off topic (such as ClassLoader and JVM TI) there was a short description and a reference on where to read more.
In addition, there is a clean, elegant API which features an event-model similar to SAX, where
ClassVisitor’s common implementation,
ClassAdapter, can be chained together. That way, one could divide different manipulations into different classes (in my case, the local variable promoter and the state keeper) and chain them together, eventually hooking them up to a
ClassWriter to create a byte array, which is used to define the new class.
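The chaining idea can be sketched without ASM itself. Everything below is a plain-Java toy: InsnEvents, Adapter, NopRemover, ReturnLogger and Writer are hypothetical names, and the two transformations are simple stand-ins, not the local variable promoter or state keeper mentioned above. Each stage forwards events to the next one and changes only what it is responsible for.

```java
import java.util.ArrayList;
import java.util.List;

// Toy SAX-like pipeline: each stage receives events and forwards them on.
// Hypothetical types; they imitate the adapter idea, not ASM's real classes.
class AdapterChainSketch {
    interface InsnEvents {
        void visitInsn(String mnemonic);
        void visitEnd();
    }

    // Forwards every event unchanged; subclasses override only what they alter.
    static class Adapter implements InsnEvents {
        private final InsnEvents next;
        Adapter(InsnEvents next) { this.next = next; }
        @Override public void visitInsn(String mnemonic) { next.visitInsn(mnemonic); }
        @Override public void visitEnd() { next.visitEnd(); }
    }

    // Stage 1: drops NOP instructions.
    static class NopRemover extends Adapter {
        NopRemover(InsnEvents next) { super(next); }
        @Override public void visitInsn(String mnemonic) {
            if (!mnemonic.equals("NOP")) {
                super.visitInsn(mnemonic);
            }
        }
    }

    // Stage 2: emits a made-up LOG_EXIT pseudo-instruction before every RETURN.
    static class ReturnLogger extends Adapter {
        ReturnLogger(InsnEvents next) { super(next); }
        @Override public void visitInsn(String mnemonic) {
            if (mnemonic.equals("RETURN")) {
                super.visitInsn("LOG_EXIT");
            }
            super.visitInsn(mnemonic);
        }
    }

    // End of the chain: collects whatever reaches it.
    static class Writer implements InsnEvents {
        final List<String> written = new ArrayList<>();
        @Override public void visitInsn(String mnemonic) { written.add(mnemonic); }
        @Override public void visitEnd() { }
    }

    static List<String> run(String... source) {
        Writer writer = new Writer();
        InsnEvents chain = new NopRemover(new ReturnLogger(writer));
        for (String insn : source) {
            chain.visitInsn(insn);
        }
        chain.visitEnd();
        return writer.written;
    }

    public static void main(String[] args) {
        System.out.println(run("ALOAD_0", "NOP", "RETURN"));
    }
}
```

Because every stage has the same interface, stages can be reordered, removed, or tested on their own, and the writer at the end never needs to know how many stages came before it.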
A great debugging feature is
TraceClassVisitor, a ClassVisitor implementation which outputs in great detail every element it encounters. If you chain it just before the
ClassWriter (or instead of it), you can see exactly what your class looks like when it’s loaded, instead of having to write it to a file and then inspect it using
javap, or manually print out each instruction as I had to do in BCEL.
ASM also features a DOM-like API, but I haven’t used it so I can’t comment on it. I can say, though, that eventually the difference in lines of code between BCEL and ASM wasn’t large, and that’s saying a lot when you realise that ASM is not abstracting any of the instructions. The reason is, in my case, ASM’s abstraction of the constant pool: one huge worry less for me.
Since I didn’t use a DOM model, the state keeping had to be done in two phases: one to mark all the locations where
yieldReturn calls appear, and one to actually write the code and the state switching. This was a small price to pay, to be honest.
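A rough sketch of why two passes are needed with a streaming model follows. It is plain Java over a list of made-up pseudo-instructions, not the actual transformation: TwoPassSketch, the SWITCH, SET and LABEL strings and the resume-point layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy two-pass rewrite over pseudo-instructions. Hypothetical names and
// instruction strings; this only illustrates the mark-then-write idea.
class TwoPassSketch {
    static final String YIELD = "INVOKE yieldReturn";

    // Pass 1: only record where the yield calls are.
    static List<Integer> findYields(List<String> code) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < code.size(); i++) {
            if (code.get(i).equals(YIELD)) {
                positions.add(i);
            }
        }
        return positions;
    }

    // Pass 2: emit the code again, now knowing every resume point up front.
    static List<String> rewrite(List<String> code) {
        List<Integer> yields = findYields(code);
        List<String> out = new ArrayList<>();
        // The dispatch comes first in the output but depends on the number of
        // yields, which a single streaming pass would not know yet.
        out.add("SWITCH state 0.." + yields.size());
        out.add("LABEL resume0");
        int state = 0;
        for (int i = 0; i < code.size(); i++) {
            out.add(code.get(i));
            if (yields.contains(i)) {
                state++;
                out.add("SET state " + state);   // remember where to continue
                out.add("RETURN");               // hand the value to the caller
                out.add("LABEL resume" + state); // next call resumes here
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> code = List.of("PUSH 1", YIELD, "PUSH 2", YIELD, "RETURN");
        rewrite(code).forEach(System.out::println);
    }
}
```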
Eventually, ASM seemed less “clunky”, and like something that is easy to learn, use, debug and deploy.