The previous example dramatically illustrates the change in the Smalltalk execution cost model that the Strongtalk system brings about. This change encourages programmers to write in a much more factored, high-level coding style. The Strongtalk class libraries take extensive advantage of this effect: they use instance-variable accessor methods almost universally, and build extensive custom control structures out of blocks, even in the most performance-critical spots in the system.
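For concreteness, here is a small sketch of that style in Smalltalk. The class and selectors are hypothetical, not taken from the Strongtalk libraries; they just show the two idioms mentioned above, a trivial accessor and a block-based control structure:

```smalltalk
"Hypothetical methods; not from the Strongtalk libraries."

count
	"An instance-variable accessor: under adaptive inlining,
	 sending #count costs the same as referencing the variable
	 directly."
	^count

count: anInteger
	count := anInteger

whileBelow: limit do: aBlock
	"A custom control structure built from a block: the send and
	 the block are both inlined away, so this is as cheap as an
	 open-coded loop."
	[count < limit] whileTrue: [aBlock value. count := count + 1]
```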
Here are two examples of this that will surprise people familiar with other Smalltalk implementations. In Strongtalk, the implementations of Array>>at: and Array>>at:put:, two of the most performance-critical methods in the system, actually send another internal message that in turn invokes a primitive, rather than invoking the primitive directly. The extra send imposes no cost.
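Sketched in Smalltalk, that factoring looks roughly like this. This is an illustrative reconstruction, not the actual Strongtalk source, and the primitive-call syntax in particular is schematic:

```smalltalk
at: index
	"Delegate to an internal message rather than calling the
	 primitive directly; the extra send is inlined away."
	^self primitiveAt: index

primitiveAt: index
	"Invokes the VM primitive; on failure, falls through to
	 error handling. (Schematic primitive syntax.)"
	<primitive>
	^self error: 'index out of bounds'
```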
Dictionary>>at: is an even more extreme example. It is implemented in a highly factored, abstract way in HashedCollection, the superclass of Dictionary. In the common case it sends over 20 real messages (not counting integer arithmetic and boolean control structures), and it uses 3 copying blocks, one full block, and many other clean blocks (not counting blocks used with hard-wired boolean control structures). This would be wildly inefficient in other Smalltalk systems, yet running on the Strongtalk VM it is actually much faster than the hand-optimized implementations in those systems. As an experiment, I ported my Dictionary implementation to the VisualWorks VM to compare its performance, and it ran about 35 times slower than on the Strongtalk VM. This clearly illustrates that writing performance-critical code in this style is infeasible without adaptive inlining.
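To give a flavor of that factored style, here is a rough sketch of how such a lookup can be layered, with the probing and the miss case expressed as messages and blocks. This is an illustrative reconstruction, and the selectors indexForKey: and slotAt: are invented for the sketch; it is not the actual HashedCollection code:

```smalltalk
at: key
	"The miss case is a clean block: it captures nothing from its
	 environment, so no closure need ever be allocated for it."
	^self at: key ifAbsent: [self errorKeyNotFound: key]

at: key ifAbsent: failBlock
	"Each step is a real send: hashing, probing, and extracting the
	 value are all factored into their own methods, and with
	 adaptive inlining all of them are inlined away."
	| index |
	index := self indexForKey: key.
	^(self slotAt: index)
		ifNil: [failBlock value]
		ifNotNil: [:assoc | assoc value]
```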
On the previous page, we ran a benchmark, and I claimed that the VM was doing extensive inlining on the benchmark code. Now let's verify that by looking at the actual inlining structure generated by the adaptive compiler. Here is what the VM reports as the inlining structure for #notSoSimpleTest:
Test class::notSoSimpleTest
  11 Array class::new:
    2 Array class::primitiveNew:
    27 uncommon
  31 Test class::fancyStoreIntoArray:
    9 Test class::evaluateBlock:
      2 BlockWithoutArguments::value
        2 Test class->fancyStoreIntoArray: 4
          6 Array::at:put:
            3 Array::primitiveAt:put:
            4 uncommon
      6 uncommon
  44 SmallInteger::+
  56 SmallInteger::>=
This text, while somewhat cryptic, lays out the inlining structure clearly. It is in inlining-database format; a link to more information on that format appears below. You can ignore the numbers, which just indicate the bytecode offset within the enclosing method of the send that was inlined. The structure of each inlined method is displayed recursively in indented form. Thus, you can see on the second line that the (Array class::new:) method is inlined, and that it in turn has an inlined send of (Array class::primitiveNew:). Note also that the Array::at:put: method actually contains an inlined send of primitiveAt:put:, as we mentioned previously.
You can see clearly that all the important sends in the benchmark are being inlined, all the way down to SmallInteger operations and primitives. You can also see that the block that we used is inlined (it is indented below the line with BlockWithoutArguments::value), so that no block closure needs to be allocated.
If you are interested in the inlining database (which is not turned on by default) or examining and controlling compilation, you can read some more details here before proceeding.