Ray casting walls and floors - speed needed!

5 comments, last by j-dog 12 years, 2 months ago
Hi!

I recently wrote a simple ray casting engine, which casts textured walls, ceilings and floors. This is written in Python, with the help of Pygame, and is the first bit of real Python work I have ever done - so it's a bit of a learning exercise for me. I'm about to add a sprite system and following that, I want to turn it into some kind of silly little dungeon-explorer game.

Unfortunately, it's just too darn slow. I cannot bring myself to enhance the functionality any further until I can get a lot more speed out of what's already there. I have a bit of an OOP approach which might not be the fastest, but it's categorised neatly. I'm willing to do a lot to the code, but I don't want to attack it too much until I get my facts straight on how to choke the most speed out of this.

I won't post the code itself (unless needed), but the drawing works pretty much as described below:

  • For every vertical pixel column, a ray is cast for wall intersections...
  • ...this ray checks for horizontal and vertical intersections only at grid intersections (as described by Permadi)
  • Wall height is calculated for each ray, and the correct texture is scaled and drawn to screen.
  • Floor / ceiling distances are pre-computed by rays cast downwards, after which each floor pixel is found using linear interpolation and then is drawn vertically, going downwards for every pixel after a wall slice till the bottom of the screen. This coordinate is simply mirrored to the top half of the screen to draw the ceiling.
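
For readers following along, the pre-computed floor distances in the last bullet can be sketched roughly as below. This is only an illustration under the usual similar-triangles setup, not the poster's code: the function names and the parameters `screen_height`, `eye_height` and `plane_distance` are assumptions made up for the example.

```python
import math

def floor_row_distances(screen_height, eye_height, plane_distance):
    """Map each screen row below the horizon to a straight-ahead floor distance.

    Similar triangles: a floor point at distance d shows up (row - centre)
    pixels below the horizon, so d = eye_height * plane_distance / (row - centre).
    All names here are hypothetical, chosen for the example.
    """
    centre = screen_height // 2
    table = {}
    for row in range(centre + 1, screen_height):
        table[row] = float(eye_height) * plane_distance / (row - centre)
    return table

def floor_point(player_x, player_y, ray_angle, view_angle, straight_distance):
    """World position of the floor pixel seen along one ray on one row."""
    # Rays off the view axis travel further to reach the same screen row,
    # so divide by the cosine of the angle between ray and view direction.
    distance = straight_distance / math.cos(ray_angle - view_angle)
    return (player_x + distance * math.cos(ray_angle),
            player_y + distance * math.sin(ray_angle))
```

The table only depends on the screen size and eye height, so it can be built once at start-up. The ceiling row for a given floor row is the mirror image, `centre - (row - centre)`.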

Now, I think that maybe some aspects of these algorithms could stand to be improved, but not as much as I think the Python implementation could improve. I think I'm doing some expensive things, but I'm not too sure of what would give me a REAL speed boost. I've run a profiler, and not surprisingly, the raycasting itself and the drawing of screen pixels cost the most. The following are of concern:

  • I wrote an angle class which clamps the rotation angle (and all angles) between 0 and 360. I've used __add__ and __sub__ to handle this - obviously there are a lot of calls to this - should I avoid them and do this differently?
  • I am using a *lot* of list accesses. For example, my "castRays" function goes through all vertical columns and then stores results in a list... such as: wallDistances[i] = distance. Are such operations slow? If so, they will kill me because I do a ton of them. Any alternatives?
  • Regarding the lists again - I've heard that NumPy can improve the speed of such things. I do not really understand why or how, but is this true?
  • Calls to functions or objects are apparently quite slow in Python - but from a bit of tweaking I haven't really seen that much difference. Is calling a "castRay" function really so expensive in python? And, how much is this influenced by passing parameters and returning values?
  • Regarding pygame: I'm making use of surfaces to store all textures as well as what's displayed on screen. I got a significant performance boost by using PixelArray (and setting those values) in place of set_at(x,y). Is there an even faster way to do this? Surfarray maybe?
  • Regarding walls only: I make use of a surface, which I then crop using subsurface (to get only the vertical column), and THEN rescale using transform.scale. Then I convert to a PixelArray. Eep! Is there a better way? Scale seems a bit silly for my purposes since I need to scale in 2 dimensions when I only need to do it in 1 (it will not allow me to scale to width "1" either).
  • Is there much value in writing C bindings for some of the most common tasks? What about Psyco?
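
On the angle-class question in the first bullet, one way to measure rather than guess is to time the class against a plain modulo. The sketch below is a rough illustration only: the `Angle` class is a hypothetical stand-in for the one described in the post (the real one was never shown), and `wrap_degrees` is a made-up helper name.

```python
import timeit

def wrap_degrees(angle):
    """Wrap an angle in degrees back into the 0..360 range with one modulo."""
    return angle % 360.0

class Angle(object):
    """Hypothetical stand-in for the wrapping angle class from the post."""

    def __init__(self, degrees):
        self.degrees = degrees % 360.0

    def __add__(self, other):
        # Every addition allocates a new object and runs __init__ again.
        return Angle(self.degrees + other.degrees)

    def __sub__(self, other):
        return Angle(self.degrees - other.degrees)

if __name__ == "__main__":
    a, b = Angle(350.0), Angle(20.0)
    runs = 200000
    print("class : %.3f s" % timeit.timeit(lambda: a + b, number=runs))
    print("modulo: %.3f s" % timeit.timeit(lambda: wrap_degrees(350.0 + 20.0), number=runs))
```

Running it prints two timings for the same number of additions, which also gives a feel for the per-call overhead asked about in the function-call bullet. How much this matters overall depends on how often angles are combined per frame.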

Apologies for the long post, but thank you for reading - I'm currently averaging about 8fps, which is pretty much unplayable. The floor casting is especially pricey but walls could use some work too. I know I'm asking a lot here, but I feel a bit directionless and would love to get some good feedback before butchering my code - I've already tried to take small steps at optimization with little or minor success.

Any help would be greatly appreciated!!
I'd probably work on the floor and ceilings first, since I'd guess that's what's eating most of the time. Unlike walls, where you can use the built-in transform functionality, the floor and ceilings need each pixel calculated manually. If you can find a way to cheat on those calculations too, by all means do it.

And yeah, moving the drawing to C will most likely help, though it depends how easy you find doing that.
Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

NumPy is quite efficient when dealing with N-dimensional matrix/list operations; it may help you a lot. NumPy is fast because a good portion of the library is implemented in C.
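
To make that concrete, here is a rough sketch of the same per-column calculation done once with a Python loop and once as whole-array NumPy operations. The function names and parameters (`distances`, `ray_offsets`, `wall_size`, `plane_distance`) are assumptions invented for the example, not anything from the original engine.

```python
import math
import numpy as np

def slice_heights_loop(distances, ray_offsets, wall_size, plane_distance):
    """Projected wall-slice height per column, one Python iteration each."""
    heights = []
    for distance, offset in zip(distances, ray_offsets):
        corrected = distance * math.cos(offset)  # remove fisheye distortion
        heights.append(wall_size * plane_distance / corrected)
    return heights

def slice_heights_numpy(distances, ray_offsets, wall_size, plane_distance):
    """Same calculation, but each step runs over the whole array at once."""
    distances = np.asarray(distances, dtype=np.float64)
    ray_offsets = np.asarray(ray_offsets, dtype=np.float64)
    corrected = distances * np.cos(ray_offsets)
    return wall_size * plane_distance / corrected
```

The gain comes from replacing the per-element loop with a handful of array operations. Using NumPy arrays purely as containers and still indexing them one element at a time from Python does not get that benefit.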

Check the docs for more info: http://numpy.scipy.org/
Haven't tried it (but would like to at some point), but you could look at PyOpenCL.

Also, I parroted this in another thread recently, too:


import cProfile
cProfile.run('main()')
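
If the raw output of that call is too noisy, a slightly longer sketch sorts the results so the most expensive calls come first. The `main` and `cast_rays` functions below are hypothetical placeholders standing in for the real game loop, and `profile_call` is just a helper name chosen for the example.

```python
import cProfile
import pstats

def cast_rays():
    # Hypothetical stand-in for the real per-frame work.
    return sum(i * i for i in range(10000))

def main():
    # Hypothetical stand-in for the game loop: render a few "frames".
    for _ in range(20):
        cast_rays()

def profile_call(func, top=10):
    """Profile one call of func and print the `top` most expensive entries."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    # "cumulative" ranks functions by time spent in them plus their callees.
    stats.sort_stats("cumulative").print_stats(top)
    return stats

if __name__ == "__main__":
    profile_call(main)
```

Sorting by cumulative time tends to point at the high-level culprit (for instance the floor loop), while sorting by `"tottime"` instead highlights the individual functions where the time is actually spent.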


Even more also: http://www.python.org/doc/essays/list2str.html
You could look into the algorithm described here: http://lodev.org/cgt...raycasting.html

It seems a lot simpler than my version, although I can run mine with 800 vertical slices at around 5000 FPS >:). The calculations involved in the article linked to above seem to be a bit faster than what Permadi describes (the one I implemented).
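
For anyone curious what that style of algorithm looks like, below is a minimal sketch of a DDA-style grid traversal, which steps from one grid line to the next instead of computing horizontal and vertical intersections separately. It is not taken from either poster's code or copied from the linked article; the function name `cast_ray`, the list-of-rows `grid` layout and the `max_steps` cap are all assumptions for illustration.

```python
import math

def cast_ray(grid, pos_x, pos_y, angle, max_steps=64):
    """Step a ray through a square grid until it enters a wall cell.

    grid is a list of rows where non-zero cells are walls, and positions are
    in cell units. Returns (distance along the ray, side), where side 0 means
    a grid line of constant x was crossed and side 1 a line of constant y.
    Returns None if the ray leaves the grid or max_steps runs out.
    """
    dir_x, dir_y = math.cos(angle), math.sin(angle)
    map_x, map_y = int(pos_x), int(pos_y)

    # Ray length needed to cross one full cell along each axis.
    delta_x = abs(1.0 / dir_x) if dir_x != 0 else float("inf")
    delta_y = abs(1.0 / dir_y) if dir_y != 0 else float("inf")

    # Step direction, and ray length to the first grid line on each axis.
    if dir_x < 0:
        step_x, side_x = -1, (pos_x - map_x) * delta_x
    else:
        step_x, side_x = 1, (map_x + 1.0 - pos_x) * delta_x
    if dir_y < 0:
        step_y, side_y = -1, (pos_y - map_y) * delta_y
    else:
        step_y, side_y = 1, (map_y + 1.0 - pos_y) * delta_y

    for _ in range(max_steps):
        # Always advance to whichever grid line is nearer along the ray.
        if side_x < side_y:
            distance, side = side_x, 0
            side_x += delta_x
            map_x += step_x
        else:
            distance, side = side_y, 1
            side_y += delta_y
            map_y += step_y
        if not (0 <= map_y < len(grid) and 0 <= map_x < len(grid[0])):
            return None
        if grid[map_y][map_x]:
            return distance, side
    return None
```

The returned distance is measured along the ray, so it would still need multiplying by the cosine of the ray's offset from the view direction before being used for wall heights, otherwise the walls appear curved.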
That page says nothing about rendering textured floors and ceilings though, and I'm going to guess that's the main bottleneck here.
Hello again, been on holiday so bit of a delay there...

Thanks for all the replies! I've had some time to tinker around and try to force more performance out, and here's what I found...

The biggest bottleneck is certainly the floor and ceiling drawing, and not surprisingly so - after all, this is a per-pixel operation. Walls were much faster but also a bit too slow, and I managed to speed up the actual drawing by a great deal and to a very acceptable level with small adjustments in the Python code alone.

I tried to change the wall/floor algorithms somewhat but this was of little help. They were already trimmed down anyway and I couldn't change too much. Reducing function calls helped a little bit, but not by a great deal. Then I tried NumPy which (although I found it to be a very cool module) did not help speed - I wasn't doing operations on arrays so much as I was using them to store and transfer values.

I gained a solid performance increase to wall drawing by using the surface blit function for all the wall slices as opposed to drawing individual pixels. The blit actually draws more pixels (due to scale implementation) but it's still much faster - seems like the Pygame modules are pretty well optimised.

I guess in the end, after a lot of tweaking, I realized that I had to take steps towards compilation - no matter what I did, the performance gains (especially for floors) were pretty much negligible at best, and at other times the changes actually decreased performance. Psyco was one option I wanted to try but unfortunately it doesn't support Python 2.7 which I am using, and I don't think it justified the downgrade. I also looked into PyPy and Pyrex, but in the end I settled on Cython.

Cython allowed me to copy-paste much of the code (though I still needed a good deal of refactoring) without having to write a lot of C code. I have kept all my pixel drawing (which does a minimum of work) to the Python implementation, while all ray casting calculations happen in the Cython module. By doing this, I've boosted my average FPS to 20 - which still leaves a lot to be desired, but it's a start!

This Cython code is still highly unoptimised - it's close to 300 lines of what are essentially Python method calls. I'm going to fit in as much early binding and C-arrays as possible, maybe fit some PyOpenCL in too. But now I feel like I have a lot of leverage with which to attain speed! :)

[Screenshot: raycast.jpg]
... yeah I'm using the Wolfenstein 3D textures for now!
