The biggest hitch is performance, and the understanding of the workings of the language required to scrape the last possible fps out of a rendering sequence. Of course, a language such as Python isn't going to get the performance that a 'lower-level' language such as C++ can get, especially when dealing with something like OpenGL, which so often gains its largest performance boosts from bare-metal array- and pointer-based programming. In Python, I am currently running into a large number of unforeseen hitches in trying to optimize a very simple tilemap rendering routine; even with Numeric or numarray, there are still weird things going on (copying, memory allocation, who knows what else; this is where my lack of understanding of the deepest fundamentals of Python really bites me in the ass) that have a major impact on our framerate.
One annoying little issue cropped up yesterday, and is still in a state of only semi-resolution. We have a separate texture class that handles loading a .TGA file and binding it as an OpenGL texture. In the main map rendering loop, the tileset texture is bound and tiles are drawn by using UV coords to snip out squares of the texture. Simple enough, works like a charm, but our frame rate takes a 5x hit simply by changing the line "from Texture import *" to "import Texture" and using fully-qualified identifiers to work with textures. Why? Who knows.
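One way to check how much of that hit comes from name lookup alone is a micro-benchmark along these lines. This is only a sketch under stated assumptions, not our actual code: `fake_texture` and `bind` are hypothetical stand-ins for the real Texture module and its calls, and the syntax assumes a modern Python 3. The only difference it isolates is that a fully-qualified call (`Texture.bind()`) does a global lookup plus an attribute lookup every time, while the star-import form does a single global lookup.

```python
import timeit
import types

# Hypothetical stand-in for the Texture module; bind() is a placeholder,
# not the real texture-binding code.
fake_texture = types.ModuleType("fake_texture")


def bind():
    return None


fake_texture.bind = bind


def draw_qualified(n):
    # Like "import Texture": global lookup + attribute lookup on every call.
    for _ in range(n):
        fake_texture.bind()


def draw_bare(n):
    # Like "from Texture import *": one global lookup on every call.
    for _ in range(n):
        bind()


def draw_local(n):
    # Hoist the attribute into a local once, outside the loop.
    local_bind = fake_texture.bind
    for _ in range(n):
        local_bind()


def measure(n=100_000, repeat=5):
    """Return the best wall-clock time in seconds for each variant."""
    results = {}
    for fn in (draw_qualified, draw_bare, draw_local):
        timings = timeit.repeat(lambda: fn(n), number=1, repeat=repeat)
        results[fn.__name__] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in measure().items():
        print(f"{name}: {seconds:.4f}s")
```

If the three timings come out close to each other, the lookup itself is probably not what costs the 5x, and the two import styles must be changing something else about how the code runs.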
Both Fruny and I have pored over the code, trying a dozen iterations and revisions. It may even be a problem with the Linux implementation of Python, since Fruny doesn't experience the problem to anywhere near the degree that I do. That simple difference, for me, can mean a change from ~50FPS down to ~8FPS. That such a drastic difference occurs serves only to indicate how little I really grok what is going on in layers of the language below the surface.
And on the issue of FPS, even without this annoying little issue, 50FPS is horribly bad for what we are doing. A simple tile map, one texture bind to set the tileset. True, we are using immediate mode calls (glVertex, glTexCoord, etc.) to draw the tiles, but even so, drawing a 14x7 or so block of 64x64 pixel tiles (a screen's worth, in 800x600 mode), even with the simple vertex lighting calculations we are performing, should get far better performance on my GF6800 than 50FPS. I know for a fact that I can achieve rates in the ballpark of 300FPS doing the exact same thing using C++. How do I know this? I've done it before, using pretty much the same general methodology as this Python experiment uses. Again, there are some behind-the-scenes performance hits going on that I lack the understanding to try to optimize out. I performed some refactoring this very evening that I thought was going to speed things up, but which in actuality lost me another ~10FPS on average.
Bear in mind, too, that this is with only test sprite entities that all bind the same sprite sheet. And there are only 300 test entities spawned on a map of 193x193 tiles. At any given time, only 6 or so are ever drawn on the screen at once. I shudder to think what the performance will be like when more and varied entities are added, with AI and game mechanic calculations thrown into the mix.
I'm going to try using arrays and indexed primitives rather than immediate mode calls and see if that helps. I've been avoiding it, due to the fact that each tile will need 4 unique vertices (effectively quadrupling the size of the map) and also due to the fact that having each vertex in the grid duplicated 4 times could impose more performance penalties on the lighting calculation. But even without the performance penalties I introduced this evening, there still just isn't enough framerate to spare. We haven't even started AI; hell, we haven't even implemented an update loop to move those 300 or so test entities around; they just sit there.
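The array layout I have in mind could be sketched roughly like this. It is a minimal sketch, assuming NumPy (the successor to Numeric and numarray) rather than either of those libraries, and `build_tile_arrays` is a hypothetical helper name, not something in our code. It builds 4 unshared vertices per tile plus an index array with two triangles per quad; per-vertex texture coordinates and lighting colors would be laid out the same way, one entry per vertex.

```python
import numpy as np


def build_tile_arrays(cols, rows, tile_size=64.0):
    """Return (vertices, indices) for a cols x rows grid of unshared quads.

    vertices: float32 array of shape (cols * rows * 4, 2)
    indices:  uint32 array of shape (cols * rows * 6,)
    """
    # Top-left corner of every tile, in row-major order (row 0 first).
    xs, ys = np.meshgrid(np.arange(cols), np.arange(rows))
    origins = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    origins *= tile_size  # shape (n_tiles, 2)

    # Offsets of the four corners of one tile, wound around the quad.
    corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float32)
    corners *= tile_size  # shape (4, 2)

    # Broadcast: every tile origin plus every corner offset.
    vertices = (origins[:, None, :] + corners[None, :, :]).reshape(-1, 2)

    # Two triangles per quad, offset by 4 vertices for each successive tile.
    quad = np.array([0, 1, 2, 0, 2, 3], dtype=np.uint32)
    base = np.arange(cols * rows, dtype=np.uint32)[:, None] * 4
    indices = (base + quad[None, :]).ravel()

    return vertices, indices


if __name__ == "__main__":
    verts, idx = build_tile_arrays(14, 7)  # roughly one 800x600 screen
    print(verts.shape, idx.shape)
```

The point of building everything in bulk is that the per-tile work happens inside NumPy instead of in a Python loop, and the finished arrays can be handed to OpenGL in a single call (for example through glVertexPointer and glDrawElements) instead of one glVertex call per corner. Whether that actually wins back the framerate is something I would still have to measure.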
I'm really starting to understand why so often people will implement engine code in C++ and only implement superficial 'glue' code on top of the engine in Python. This is really very frustrating. For games up to a certain level of complexity, it is just fine, but try to push beyond that and lots of little hidden issues rise up to wreak havoc.