The secret, of course, is to eliminate dynamic memory allocations. This isn't a huge revelation to most programmers, especially not games programmers, but it's such a royal pain that it often gets left on the table unless and until things get truly desperate.
Well, after my 740ms run earlier today, I was well and truly ready to do anything it took to speed things up more.
So I tried something devious: I hooked into the function which allocates memory in the Epoch runtime, and wired it up to print out the name of the function that called it. (If you're curious how this works, it's just assembly hacks - the return address is already on the stack, so we just read it off and pass it to a helper function which knows how to translate that back to a code function name based on debug information.)
With a little command line magic, I piped all of the output of this process into a text file, and got a timeline of every single memory allocation performed by the compiler.
Well, that file is 27MB of text, with over 672,000 allocations. That might explain some of the slowness...
The dominating offender appears to be linked lists of integers. Not terribly surprising, considering how commonly I use integers for things (string handles, type IDs, etc.). Unfortunately, I can't readily see a good way to get rid of them all, so this might be a major sticking point until I can implement better containers.
Another annoyance is that much of the compiler is written around destructive mutation, i.e., updating variables and data structures in place during execution. This immediately puts a stop to my next idea, which is to cache off pre-allocated objects of commonly used types and recycle them from a pool.
Ironically, I spent several hours trying to eliminate allocations with various trickery, and consistently failed to gain any significant speed. At this point it looks like I'll need to attack this again with a fresher mind.