Simple Performance Improvement to Array Creation

Let’s ease back into the world of Brat with a recent improvement to array creation.

A long time ago I started working on an implementation of Brat’s parser in Brat as a way of removing Brat’s dependency on Ruby. I stopped, however, because it was way too slow. But that was a long time ago! So I started looking at it again, and it is much faster than I remember it being, likely due to LuaJIT improvements.

In any case, I decided to see if there were any small improvements to be made. Lately, my starting point for Brat performance has been to run luatrace and look for code causing LuaJIT’s traces to abort. Aborted traces indicate code paths that LuaJIT tried to compile to machine code, but failed due to code not supported by the compiler.

I compiled Brat’s grammar using Brat’s PEG library to Brat, then tried parsing the largest Brat file I could find…which was Brat’s PEG library. All kinds of recursion going on here.

The very first bit of output from luatrace looked like this:

TRACE SUMMARY
=============
Trace Status                            Traces       Bytecodes           Lines
------------                            ------       ---------           -----
Success                              79 ( 37%)     1679 (  6%)      478 (  7%)
NYI: bytecode FNEW                   92 ( 44%)    20046 ( 77%)     4620 ( 76%)
NYI: C function 0x41b48938            2 (  0%)      946 (  3%)      161 (  2%)
NYI: bytecode TSETM                   5 (  2%)      883 (  3%)      216 (  3%)

This means 37% of the traces attempted by LuaJIT succeeded (yay!). But 44% were aborted because of function creation (FNEW), which is pretty hard to avoid in Brat. Brat is function-happy. The third line indicates two traces were aborted because of calls to some C function (in this case, lrexlib which provides access to Oniguruma for regular expression support).

But the fourth line says a few traces were aborted due to “TSETM” (oh, and “NYI” means “not yet implemented” in the JIT compiler). I’ve found bytecode aborts are the easiest to work around, so I that’s where I started. LuaJIT’s wiki has a handy page on NYI. It says “TSETM” is the bytecode to “initialize table with multiple return values” and it’s supported in LuaJIT 2.1. Unfortunately, 2.1 is not even in beta yet. But maybe we can avoid using it? A quick search through luatrace’s output indicated the offending code was in array:new, the function used to create new Brat arrays.

If you are unfamiliar, nearly all of Brat’s implementation is in core/core.lua. Here is the whole array:new function, with my annotations:

function array:new (first, ...)
  -- Setup new Brat object
  local na = new_brat(self)
  na._prototype = new_brat(object)

  -- If there is exactly one argument and it's a Lua table (not a Brat object)
  -- then it's Lua code attempting to create an array. Just put it inside the
  -- Brat object and we're good to go.
  local rest_len = select("#")
  if rest_len == 0 and type(first) == "table" and not first._is_an_object then
    na._lua_array = first
  else
    -- Otherwise, put all the objects into the array
    na._lua_array = {first, ...}
  end

  -- Set the Brat array length and return it
  na._length = #na._lua_array
  return na
end

First of all, this function is a little strange. It accepts a variable number of arguments because you could call array.new in Brat. However, this same method is used by generated Lua code. In the generated code, it passes in a single Lua table to initialize the array. So why that weird first argument? Turns out that was also an optimization to avoid “TSETM” last July!

The code used to look like this:

function array:new (...)
  local na = new_brat(self)
  na._prototype = new_brat(object)

  local args = {...}
  if #args == 1 and type(args[1]) == "table" and not args[1]._is_an_object then
    na._lua_array = args[1]
  else
    na._lua_array = args
  end

  na._length = #na._lua_array
  return na
end

Note the use of {...} which causes the “TSETM” abort happens no matter what. This was slightly improved in the newer version above, but it’s still a problem.

In particular, it’s a problem when creating an empty array. When this function is called with no arguments, it still tries to put all the arguments into a new Lua table!

Here’s the improved version:

function array:new (first, ...)
  local na = new_brat(self)
  na._prototype = new_brat(object)

  -- Create empty array
  if first == nil then
    na._lua_array = {}
    na._length = 0
    return na
  end

  if type(first) == "table" and not first._is_an_object and select("#") == 0 then
    na._lua_array = first
  else
    na._lua_array = {first, ...}
  end

  na._length = #na._lua_array
  return na
end

Now it explicitly checks for the empty array case and avoids using {...} and #na._lua_array.

Does it matter? Let’s find out by creating a million empty arrays:

1000000.times {
  []
}

Averaging over ten runs the results are:

Before: 1.53s

After: 0.53s

Wow! A whole second faster, which is a 65% speedup! Not bad.

These types of optimizations are pretty fun because they don’t take much work and suddenly Brat is compiled to assembly! I got pretty excited a while back when some Brat code was faster than Ruby (keep in mind that the Ruby method is implemented in C!)

comments by Disqus
Fork me on GitHub