Jul 15

To recap the last post: I’ve tried to mimic the really helpful dotNet/C# feature “yield”, which allows you to quickly write iterators without worrying about preserving state.

Let me just make one thing clear: The implementation does not just collect all the yielded values into a list, and eventually returns the list. It actually returns the items one item at a time, saving both calculation time and allocation space.

The implementation had to deal with two issues: First, it had to keep the state of all local variables. Second, it had to allow the method to return to the previous location the next time it is invoked. The first thing that comes to mind with these method-changing procedures is AOP – Aspect Oriented Programming. That option was not taken eventually, and I’ll explain why. First, let’s look at what the two issues mean, and what were my solutions for them.

Keeping the state for local variables has a simple concept: Simply “promote” all of them to become class members. That way, no matter how they’re used inside the method, their state is saved for the next time the method is invoked.

For example, a Yielder implementation such as:

class MyYielder extends Yielder {
  private int[] arr = ...;
  public void yieldNextCore() {
    int total = 0;
    for (int i = 0; i < arr.length; i++) {
      yieldReturn(arr[i]);
    }
    yieldReturn(total);
  }
}

Would become:

class MyYielder extends Yielder {
  private int[] arr = ...;
  private int i;
  private int total;
  public void yieldNextCore() {
    total = 0;
    for (i = 0; i < arr.length; i++) {
      yieldReturn(arr[i]); 
    }
    yieldReturn(total);
  }
}

The second problem is a bit more tricky, but has a simple solution: in order to return to the previous location in a method, one had to create a simple state machine: keep a state member in memory, holding an integer. Mark each yieldReturn with an integer, and after each call make sure the member is set to that state and that the method exists immediately after it. After each of these, a label is placed.

When all of that is done, a switch statement needs to be placed in the beginning of the method over the new state member, and jump to the corresponding label.

To go back to the previous example, after the state machine elements are in place we get:

class MyYielder extends Yielder {
  private int[] arr = ...;
  private int i;
  private total;
  private byte state;
  public void yieldNextCore() {
    switch (state) {
      case 1: goto L1;
      case 2: goto L2;
    }
    total = 0;
    for (i = 0; i < arr.length; i++) {
      yieldReturn(arr[i]); 
      state = 1;
      return;
      L1:
    }
    yieldReturn(total);
    state = 2;
    return;
    L2:
  }
}

Some people might raise an eyebrow and say, "but Java doesn't have a goto statement!". These people would be both right and wrong. Java does have a goto statement; it's just not in use by compilers. Meaning, the previous code will not compile on anything.

Eventually, that's the reason I chose not to go with the AOP solutions available today when I implemented this feature: I couldn't see a way to do it without making the state machine overly complex. To be honest, I don't know enough about the AOP solutions available, so if anyone cares to contradict me I'll be happy to learn from them.

So, not having AOP as an option I was forced to go a bit lower, to "manual" bytecode manipulations. Since I'm still not BC guru, I decided to use the tools available today. The only tool I knew when I started this was Jakarta's BCEL - the ByteCode Engineering Library. I had a working version using BCEL, but eventually I switched the entire thing to ObjectWeb's ASM (which doesn't stand for anything), and I'm quite happy with that decision. In a later post I will compare the two and explain my decision to stop working with one and start working with the other.

So far, any questions?

Related Posts with Thumbnails

12 Responses to “Implementation details for Java Yielder”

  1. Klavsie Says:

    Neat! I’m really looking forward to seeing the source :-)

  2. Avah Says:

    I am currently thinking on where to post the code. So far I found one bug, which I intend to fix before releasing the code, and one “issue” which creates slightly ugly code – Maybe I’ll make a post about it, to see what the community thinks.

  3. Chaotic Java » My TreeModelIterator Says:

    [...] Anyway, with the new Yielder option it looks like this: [...]

  4. Chaotic Java » Jakarta’s BCEL vs. ObjectWeb’s ASM Says:

    [...] In the third in the “yielder series” (1, 2), I will describe the differences I’ve seen between Jakarta’s BCEL and ObjectWeb’s ASM frameworks for bytecode manipulation. [...]

  5. Chaotic Java » How to write Iterators really REALLY fast Says:

    [...] Difference from dotNet is bolded. Not that different, is it? In the next post I’ll post the implementation details of this. On the other hand, if you’re excited and want to download it, please go to the project’s site. [...]

  6. Bernhard Says:

    Some time ago I had a similar problem to implement in Java. Because the data producer was rather complex I wanted to avoid converting it to a state machine. Therefore I choose a queue of length one item to hand over data from the producer to the consumer (like a UNIX pipe). As a consequence it was necessary to run the producer in a separate thread which might be expensive as a general solution but has been worth in this case.

  7. Avah Says:

    Bernhard – It’s a very good point. At first I thought of creating this feature using threads, queues and object monitor locks. However, since I started making this from the tree and graph iterators point of view, it seemed like a lot of threads were going to be created (as the calls to the yielder are somewhat recursive), which was not going to be worth it eventually.

    As a general rule, I agree that this feature could have been implemented using threads and queues. If there’s a request for it, I don’t mind adding a general solution which is JDK 1.4+ compliant that into the version, which doesn’t do bytecode manipulation at all.

  8. Chaotic Java » On local variables’ scopes and other problems Says:

    [...] In order to create the members of the yielding class (need a memory refresh?), the yielder (in the current, 0.1.1 version) is looking at the local variables map, and creates a member called variable-name$promoted for each local variable, with the type of the local variable. Matching between a slot access (for read or write, using the xLOAD or xSTORE bytecode operators) to a local variable is done using the scope mapping, also found in the debugging information. However, when debugging information is disabled, this information is unavailable, and thus the yielder should work differently. [...]

  9. Chaotic Java » Bug fixed, carry on yielding! Says:

    [...] The differences from the original implementation post relate to the local variable promotion only. The changes are as follows: [...]

  10. Chaotic Java » Blog Archive » Who wants Indexer as the next EoD feature? Says:

    [...] When Java 5 came out, many ease of development (EoD in short) features were introduced. One of them was the “foreach” loop – a syntax for iterating over arrays and collections from the collections framework easily and, as an extra effect, in a standard way. This was accomplished by extracting the iteration interface out of the Collection interface, now named Iterable. Everything that implements the Iterable interface could be used in a “foreach” loop (which is, by no coincidence, the implemented interface by my Yielder class, which is a virtual collection). [...]

  11. Dimitris Andreou Says:

    Talking about traversals, how would this approach support recursion? It sounds problematic even in the case of yielding from any deeper stack frame – is this understanding correct?

  12. Avah Says:

    Dimitris: No. The yielding supports tree traversal perfectly well, and in fact I already implemented tree and graph traversals using it – you can check out the proejct’s website to see how, the code is all there.