Small Implementation Change in Conditionals

Brat is pretty much all objects and method calls. There are no special keywords and only a few special characters. So true?/false?/null? are just methods and branches are just arguments to those methods.

Brat is also eagerly evaluated, so the only way to delay evaluation is to put code inside functions.

If one writes code like this:

true? something, do_this, else_do_that

Brat will execute something, do_this, and else_do_that, then pass the results as arguments to true?. That’s probably not what we want!

Instead, we have to do this:

true? something, { do_this }, { else_do_that }
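
To make the idea concrete, here is a minimal Lua sketch of how a true?-style conditional can be expressed as an ordinary function that takes its branches as function arguments (illustrative only; this is not Brat's actual runtime code):

    -- Illustrative Lua sketch, not Brat's actual runtime code:
    -- the conditional is just a function, and the branches are just
    -- function arguments that it may (or may not) call.
    local function true_q(condition, true_branch, false_branch)
      if condition then
        return true_branch()
      else
        return false_branch()
      end
    end

    -- Because the branches are wrapped in functions, only one of them runs:
    print(true_q(1 > 0, function() return "do_this" end,
                        function() return "else_do_that" end))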

A while back we started inlining branches when they are inlineable functions. This led to a pretty decent performance improvement.

But what about code like this?

true? something, "0", "1"

In this case, the two strings will be generated regardless of the condition, as they are just arguments to the true? method. This is obviously wasteful. But putting the branches inside functions is also a bit cumbersome:

true? something
  { "0" }
  { "1" }

Feels a bit like overkill just to avoid allocating a string.

To address this, when the branches are simple values (strings, numbers, arrays, etc.), Brat will generate the same kind of inline if constructs as it would for inlineable functions.
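
Conceptually, the generated Lua for a simple-value conditional ends up shaped something like this (a simplified, hand-written sketch, not the compiler's exact output):

    -- Rough sketch (hand-written, not the compiler's exact output) of the
    -- shape that true? something, "0", "1" can compile to when the branches
    -- are simple values: a plain Lua if, with no function calls for the branches.
    local something = 1 > 0      -- stand-in for the condition
    local result
    if something then
      result = "0"
    else
      result = "1"
    end
    print(result)                --> 0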

Some Numbers?

Let’s look at some luatrace summaries.

Before

1000.times { i |
  true? i > 500, "1", "0"
}
    TRACE SUMMARY
    =============
    Trace Status                       Traces       Bytecodes           Lines
    ------------                       ------       ---------           -----
    Success                          6 ( 75%)      836 (  6%)      139 ( 52%)
    loop unroll limit reached        2 ( 25%)    12271 ( 93%)      128 ( 47%)
    ------------------------- --------------- --------------- ---------------
    Total                            8 (100%)    13107 (100%)      267 (100%)
    ========================= =============== =============== ===============

Before, Using Functions

1000.times { i |
  true? i > 500, { "1" }, { "0" }
}
    TRACE SUMMARY
    =============
    Trace Status           Traces       Bytecodes           Lines
    ------------           ------       ---------           -----
    Success              4 (100%)      379 (100%)      106 (100%)
    ------------  --------------- --------------- ---------------
    Total                4 (100%)      379 (100%)      106 (100%)
    ============  =============== =============== ===============

After

    TRACE SUMMARY
    =============
    Trace Status           Traces       Bytecodes           Lines
    ------------           ------       ---------           -----
    Success              4 (100%)      379 (100%)      103 (100%)
    ------------  --------------- --------------- ---------------
    Total                4 (100%)      379 (100%)      103 (100%)
    ============  =============== =============== ===============

What does this all mean?

Starting with the “Before” results, we see the JIT attempted 8 traces: 6 compiled successfully, but 2 hit the loop unroll limit, which stopped those traces from compiling. Also notice the large number of bytecodes involved in the “Before” results - only 836 were in successfully compiled traces, while 12,271 were not.

Compare that to using functions for the conditional branches. In the “Before, Using Functions” results, all traces are compiled and only 379 bytecodes are involved. This is because the true? function call is gone, replaced with an if statement whose branches are inlined rather than wrapped in function calls.

Now, in the “After” results, we see they line up with the results from wrapping the branches in functions, except we don’t have to do that extra bit of work or worry about a performance hit. Hooray!

How much faster?

For the code above,

Before: 0.020s

After: 0.010s

While that is a 50% improvement, it’s probably not going to make any particular program go that much faster. Instead, it just removes that little bit of performance concern with simple arguments to true?/false?/null?.

What Happened in 2017?

Oops, 2017 went by without a blog post! What happened?

Not a lot…

  • The interactive Brat shell works again!
  • More unboxed operations on numbers
  • Single-quoted backslashes finally parse correctly
  • More tracking of types during compilation
  • Symbols (:blah) are now immutable, again
  • Create prototype objects on demand
  • This works: x.y = {}()
  • Fix nested comments and semicolons in comments
  • Updated LuaJIT

Overall, Brat is another year older, and a little bit faster!

Inlining Branches and Sharing Metatables

In this post, we continue the quest to make Brat faster (which essentially equates to making LuaJIT happier).

Inline Branches

Previously we removed inner function creation by lifting functions out into the global scope and faking closures when needed.
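
As a refresher, the rough idea of that transformation looks something like this in plain Lua (an illustrative sketch with invented names, not the compiler's real output):

    -- Without lifting: a fresh closure is allocated every time make_adder runs.
    local function make_adder(n)
      return function(x) return x + n end
    end

    -- With lifting: the body is defined once at the top level, and the captured
    -- value travels in a small "fake closure" table passed in explicitly.
    local function lifted_adder_body(env, x)
      return x + env.n
    end

    local function make_adder_lifted(n)
      return { body = lifted_adder_body, env = { n = n } }
    end

    local adder = make_adder_lifted(2)
    print(adder.body(adder.env, 40))  --> 42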

One place where a lot of functions show up is in conditionals. true?/false?/null? are all just functions that take a condition and two branches. In the Tak benchmark, you can see the branches are wrapped in functions:

tak = { x, y, z |
  true? y < x
    { tak tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y) }
    { z }
}

This is standard Brat style to delay execution of the branches.

Before the lifting of functions, these functions would be created every time the tak function was called. That’s pretty bad! Now the functions will be lifted and fake closures will be used instead.

However, what’s better than lifting a function? Not calling a function at all! Since the conditionals are used all the time and are core functions in the language, it makes sense to optimize them to just be regular Lua if statements.

When can we do this? Any time the branch is a liftable function. That’s convenient, since we already have the logic to figure that out.

To inline the branches, they are treated almost exactly like functions. A new scope is created, and what would have been the body of the method is output in a do...end block. Instead of a return value, the result is just passed back in a variable. The condition and the branches are then put into a regular if statement, with guards just in case someone decides to override true?/false?/null? (which is possible, but unlikely); if that happens, the original code without inlining is used.
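
In Lua terms, the generated code is shaped roughly like this (a runnable, hand-written sketch with invented names; the real compiler output is more involved):

    -- Hand-written sketch with invented names, not actual compiler output.
    -- The guard checks that true? still refers to the built-in version before
    -- trusting the inlined branches; otherwise it falls back to the method call.
    local builtin_true_q = function(cond, t, f)
      if cond then return t() else return f() end
    end
    local current_true_q = builtin_true_q   -- would differ if true? were overridden

    local condition = 501 > 500
    local result
    if current_true_q == builtin_true_q then
      if condition then
        do result = "1" end   -- inlined branch body in its own do...end scope
      else
        do result = "0" end
      end
    else
      -- fall back to the ordinary function call when true? has been overridden
      result = current_true_q(condition,
                              function() return "1" end,
                              function() return "0" end)
    end
    print(result)  --> 1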

What are the results?

Tak benchmark before inline branches: 0.751 seconds

Tak benchmark after inline branches: 0.431 seconds

A nice 43% improvement!

Shared Metatables

Metatables are Lua’s way of overriding the behavior of a table. For example, you can set the method to be called when brackets [] are used, or which method to use for conversion to a string. Brat sets up these two methods for every new object.
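
In plain Lua, those two hooks look something like this (a standalone illustration, not Brat's actual object setup):

    -- Standalone Lua illustration of the two metamethods, not Brat's actual setup.
    local obj = setmetatable({}, {
      -- called when a missing key is looked up with brackets: obj["x"] or obj.x
      __index = function(t, key)
        return "looked up " .. tostring(key)
      end,
      -- called when the object is converted to a string, e.g. by tostring()/print()
      __tostring = function(t)
        return "#<brat object>"
      end
    })

    print(obj["x"])   --> looked up x
    print(obj)        --> #<brat object>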

In wandering the web looking for nuggets of LuaJIT wisdom, I found this email from Mike Pall. In it, he notes that the parent post was creating new metatables for each new object, but the methods were the same.

Looking at Brat’s object creation, it already factored out the methods, but a new metatable was created for each new object. It was a simple change to always use the same one, and the change had no ill side effects on existing code.
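
The change amounts to something like the following Lua sketch (the names and method bodies are invented here; this is not Brat's actual constructor):

    -- Illustrative Lua sketch; the names and method bodies are invented here.
    local index_fn    = function(t, k) return nil end           -- stand-in lookup method
    local tostring_fn = function(t) return "#<object>" end      -- stand-in conversion

    -- Before: a brand-new (but identical) metatable for every object.
    local function new_object_before()
      return setmetatable({}, { __index = index_fn, __tostring = tostring_fn })
    end

    -- After: one metatable, built once and shared by every object.
    local shared_mt = { __index = index_fn, __tostring = tostring_fn }

    local function new_object_after()
      return setmetatable({}, shared_mt)
    end

    print(new_object_after())  --> #<object>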

Results

Tak benchmark after metatable change: 0.225 seconds. Another 48% improvement! These two changes together reduced the runtime for the Tak benchmark by 70%.

Similarly, the Kaprekar benchmark went from 86 seconds in our last blog post to just 21 seconds - about a 75% improvement. Fibonacci (king of microbenchmarks) runs in just 0.043 seconds.

For more real-world use, these two optimizations reduced parsing time of peg.brat (ironically the current largest Brat file) by 42%.

While Brat is still not (nor will ever be) particularly fast in general, it is fun to continue pushing it.
