But there is some missing data here. How many FPS did this wizardry give your game? And how many days were added to the project by choosing a language that allows you to get to that kind of wizardry? Are we sure that the FPS improvement is worth more than the delay to the game release?
The biggest optimization at this level of the code-base took some typical C++ code that was taking 8ms and reduced its cost to just 0.5ms (and that's without using any parallelization, which was also possible) -- taking us from well <30fps to comfortably >30fps, which is all that mattered, as we were vsync'ed to 30Hz.
Only the core engine routines were written at this level of C++, by a very small team of expensive C++ programmers. The actual game itself was written by a much larger team in Lua (due to the productivity benefits!), with a budget of 16ms of CPU time on the main core per frame for all Lua code. Whenever this budget was exceeded (which happened often), some expensive Lua code would be ported over to optimized C++ instead. These optimizations weren't delaying the release -- they were necessary to be able to release a playable product at all!
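To make the numbers concrete, here is a rough sketch of what that kind of per-frame budget check could look like. It is not our engine's code: `measure_ms`, `over_budget`, `saved_ms`, and the constant names are made up for illustration, and only the figures (30Hz, 16ms, 8ms down to 0.5ms) come from the description above.

```cpp
#include <cassert>
#include <chrono>

// Figures from the post: vsync'ed to 30Hz, 16ms per frame for script code.
constexpr double kFrameMs = 1000.0 / 30.0;  // roughly 33.3ms per frame
constexpr double kScriptBudgetMs = 16.0;

// Hypothetical helper: runs a callable and returns its wall-clock cost in ms.
template <typename Fn>
double measure_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Hypothetical helper: true when a measured cost exceeds its budget.
inline bool over_budget(double elapsed_ms, double budget_ms) {
    assert(budget_ms > 0.0);
    return elapsed_ms > budget_ms;
}

// Frame time recovered by an optimization, e.g. saved_ms(8.0, 0.5).
constexpr double saved_ms(double before_ms, double after_ms) {
    return before_ms - after_ms;
}
```

In use, you would wrap the per-frame script update in `measure_ms` and flag any frame where `over_budget(cost, kScriptBudgetMs)` is true as a candidate for porting. Going from 8ms to 0.5ms recovers 7.5ms, which is a bit over a fifth of a 33.3ms frame.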