I was strongly suspicious of the inliner in my last post, and it turns out that this hunch was (albeit indirectly) completely correct.
The obvious thing to do when facing a bug like this is to compare the generated code: dump out a listing of the version that works, and a listing of the version that doesn't, and run a diff tool to see what's mismatched between the two.
Thankfully, LLVM comes with a handy dump() function that lets us easily see what the textual IR looks like for a given program, so it's trivial to arrange this.
Combing through the diff, I noticed that there was indeed some inlining going on - but not of the functions that I suspected were causing problems. Moreover, suppressing inlining on the functions I did think were problematic made no difference!
As I looked closer, I noticed another interesting distinction between the working and broken versions of code: the @llvm.lifetime.start and @llvm.lifetime.end intrinsics.
It took some digging on the googles to figure out what exactly these mean. Semantically, they just mark where a stack object's lifetime begins and ends. When two variables (supposedly) do not overlap in lifetime, they can theoretically be given the same stack slot. Except if we marked one of those stack slots as containing a GC root... well, I'm sure you can figure out the rest.
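For anyone who would rather see "the rest" spelled out, here is a toy C++ model of one plausible way slot sharing can go wrong. Everything in it (Frame, scanRoots, runFunction, the variables obj and counter) is made up for illustration and is not how LLVM or my runtime represents frames. It only shows the shape of the hazard: the stack map still says "slot 0 is a root" after some unrelated value has moved into slot 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a stack frame: each slot holds one raw word, and the
// stack map lists which slot indices the collector treats as GC roots.
struct Frame {
    std::vector<std::uintptr_t> slots;
    std::vector<std::size_t> rootSlots;
};

// What a collector would see: the word stored in every slot marked as a root.
std::vector<std::uintptr_t> scanRoots(const Frame& frame) {
    std::vector<std::uintptr_t> seen;
    for (std::size_t index : frame.rootSlots) {
        seen.push_back(frame.slots.at(index));
    }
    return seen;
}

// Simulates a function with two locals whose lifetimes do not overlap:
// `obj` (a GC-managed pointer) and then `counter` (a plain integer).
// With shareSlot == true, both locals are assigned to slot 0, which is
// the kind of merge a slot-sharing optimization is allowed to make.
Frame runFunction(bool shareSlot, std::uintptr_t objAddress,
                  std::uintptr_t counterValue) {
    Frame frame;
    frame.slots.assign(shareSlot ? 1 : 2, 0);
    frame.rootSlots = {0};                    // stack map: slot 0 holds `obj`

    frame.slots[0] = objAddress;              // obj's lifetime starts, then ends
    std::size_t counterSlot = shareSlot ? 0 : 1;
    frame.slots[counterSlot] = counterValue;  // counter's lifetime starts
    return frame;                             // imagine a collection happens here
}
```

With separate slots the collector still finds the (stale but valid) object address in slot 0. With a shared slot it finds whatever counter holds and treats that as a pointer.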
The intrinsics are inserted by the optimizers at the IR level but not paid attention to until native code emission phases, so all I needed to do was strip them out of the IR. Thankfully I already have a late-running pass over the optimized code for GC setup purposes, which is actually part of what generates the stack maps in the first place. I modified this pass to obliterate calls to those intrinsics, re-enabled function inlining everywhere I had tried to club it to death previously, and...
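To give a feel for what "obliterate calls to those intrinsics" amounts to, here is a rough sketch that does the same thing to a textual IR listing, like the ones I was diffing. This is not my actual pass, which works on the in-memory IR. It is a standalone, text-level stand-in, and the names isLifetimeMarkerCall and stripLifetimeMarkers are invented for this example.

```cpp
#include <sstream>
#include <string>

// True if this line of textual IR is a call to one of the lifetime markers.
// Matching on the name prefix also covers type-suffixed variants of the
// intrinsic names and calls that carry a leading "tail" marker.
bool isLifetimeMarkerCall(const std::string& line) {
    return line.find("call void @llvm.lifetime.start") != std::string::npos ||
           line.find("call void @llvm.lifetime.end") != std::string::npos;
}

// Returns a copy of the listing with every lifetime marker call removed.
// All other lines, including `declare` lines for the intrinsics, are kept.
std::string stripLifetimeMarkers(const std::string& irText) {
    std::istringstream input(irText);
    std::ostringstream output;
    std::string line;
    while (std::getline(input, line)) {
        if (!isLifetimeMarkerCall(line)) {
            output << line << '\n';
        }
    }
    return output.str();
}
```

The markers return void and nothing else uses their results, which is why simply dropping the calls leaves the rest of the function intact. The cost is that the backend can no longer share stack slots based on them.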
Incidentally, there are already bugs filed against the so-called "Stack Coloring" pass which uses these intrinsics. One that actually helped me track this issue down is filed at http://llvm.org/bugs/show_bug.cgi?id=16095
I'm pondering filing a second bug just to note that stack coloring/slot lifetime in the presence of GC roots is a bad idea.