In our project we use Lua for application code, which calls into modules implemented either in Lua or in a lower-level language. This works very well for us, but throughout the project we have had some issues dealing with Lua errors robustly. Because our Lua scripts are application-level logic, they routinely manipulate userdata provided by the lower-level modules; many of these structures hold system resources and need to be finalized promptly once we are done with them. We have close() functions in our code to this end, but of course, if a Lua error is thrown, close() is never reached and finalization doesn't happen until the garbage collector decides to run (we have correct __gc metamethods), which may be never, since by default the GC only collects once a certain amount of memory has been allocated. This is, naturally, unacceptable.
We don't abort on error because our Lua scripts run an event loop; we catch the Lua error at the callback boundary, log it, and usually resume the loop afterwards. The program is terminated only in the case of a catastrophic error, or if something goes wrong before the event loop starts (e.g. parsing a config file, ...).
So a few weeks ago I had a seemingly simple but apparently effective idea: override pcall with an implementation which, in case of an error, triggers a full garbage collection. As far as I can tell this provides very strong guarantees about the finalization of resources acquired within a failed Lua chunk. Since we don't use globals, the only accidental way for these resources to escape finalization is for them to be referenced from upvalues or from table/userdata parameters of that chunk before it fails, which is easy to avoid: just keep them in locals and either return them as you would normally, or lift them outside the chunk through its upvalues at the very end.
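To make the idea concrete, here is a minimal sketch in plain Lua. It is not our actual implementation (ours would live on the C side and call lua_gc(L, LUA_GCCOLLECT, 0)); it assumes Lua 5.2 or later, where tables can carry __gc finalizers and table.pack/table.unpack exist. The names collecting_pcall and open_resource, and the closed log, are made up for illustration.

```lua
-- Sketch only: assumes Lua 5.2+ (__gc on tables, table.pack/table.unpack).
local raw_pcall = pcall

-- Hypothetical wrapper: behaves like pcall, but forces a full GC cycle
-- when the protected call fails, so unreachable resources get finalized.
local function collecting_pcall(f, ...)
  local results = table.pack(raw_pcall(f, ...))
  if not results[1] then
    collectgarbage("collect")
  end
  return table.unpack(results, 1, results.n)
end

-- Hypothetical stand-in for a userdata that owns a system resource.
-- Its finalizer records the name so we can see what was cleaned up.
local closed = {}
local function open_resource(name)
  return setmetatable({ name = name }, {
    __gc = function(self) closed[#closed + 1] = self.name end,
  })
end

-- Case 1: the resource lives only in a local of the failed chunk.
local ok, err = collecting_pcall(function()
  local res = open_resource("socket")
  error("boom")
end)

-- Case 2: the resource escapes through an upvalue before the failure,
-- so it is still reachable and the forced collection leaves it alone.
local leaked
collecting_pcall(function()
  leaked = open_resource("file")
  error("boom")
end)
```

In the first case the only reference to the resource was a local of the failed function, so the forced collection should run its finalizer right away. In the second case the object is still reachable through leaked, which is exactly the escape route described above.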
The problem is that I can't seem to find any references to this pattern. Are there any downsides to it? Because as far as I can tell this is a free lunch:
- trivial to implement: override pcall and add a lua_gc() call on error. Done.
- ensures that all resources acquired by the failed chunk are finalized immediately
- has zero overhead on the success path (if no error occurs, our normal code path does the cleanup by itself)
Any thoughts? Any obvious defect I overlooked? Has anyone seen and/or done this before, and did it work for you?
EDIT: actually it isn't quite true that I found no references; I did find exactly 2 references to this pattern. One is on the Lua website (supposedly the vanilla Lua interpreter does the same thing in interactive mode when user input fails for some reason, but I can't find the code that does it in the Lua 5.3 distribution, so maybe it doesn't do it anymore). The other is here: http://lua-users.org/lists/lua-l/2009-02/msg00191.html but it doesn't go into a lot of detail.