• entries
    134
  • comments
    272
  • views
    173164

Optimization and Worlds of Fun

Sign in to follow this  
dbaumgart

210 views

Note that my new posting schedule is Tuesday-Friday.
As ever, I'll try to have something new and interesting to talk about for every post rather than ranting about how everything sucks.

And a hearty thanks to everyone who tried out Isostrat last week, such feedback is invaluable.

Issues brought up, Answers to them


  • Save/Load buttons don't work

  • I haven't implemented this yet. I've been working on getting the map into a savable state by moving all the rendering functions from the map class to the video class (where they should have been anyway). So: unimplemented.

  • The "????" button doesn't do anything

  • I'm removing it now in a re-work of the editor interface: fixed.

  • Far ends of the map sometimes don't work

  • It turns out that parts of the map were thought to be 'off map' by the game. I found a place where I was comparing an X coord to a Y coord and a Y coord to an X coord -- ridiculous! But now fixed.

  • Clicks on Minimap don't recenter GameView

  • I've been putting this one off because it requires me to code some unique behaviour into a GUI component. But why not? It's my game, after all, I can make something with unique functionality. I'll get around to it as I expand the functionality of the GUI (which is going to have to happen a lot more to get gameplay working properly): unimplemented.

  • Mouse-drag tile-painting in the editor is sporadic

  • As in, a few tiles show up painted, but there isn't a swath of, say, green grass between the start and end point of a mousedrag. As David O suggests, I'd do well to draw a line between points of mousedragging and paint every tile that collides with it -- and improve framerate generally: in-progress?

  • Needs better scrolling

  • Yeah, the map needs to scroll on right-click drag (rather than kinda painting random tiles, if in edit mode). This should be very easy to do: unimplemented.

  • Editor needs an eraser tool

  • I didn't even think of this. Or I did a long time ago, but forgot with all the other stuff to do. It took no time to implement because I had code to do most of this already done: fixed.

  • Major slowdown at startup

  • Turned out this was due to blitting Borderstyle pieces tens of thousands of times.

  • General slowness

  • ...was due to the previous, and a horribly inefficient FindVisibleTiles function that was called on every render.

These both deserve their own sections, so:

Blit less, Blit bigger



Now this was interesting.
I'll talk more about how I found out that Borderstyle drawing was taking so long in the next section, so suffice to say: Borderstyle drawing was what was taking so long. And it makes sense, really, because to make the border for the Gameview I'd be blitting a little 4x4 image across the top and bottom, and down the left and right to make all the sides. Worst of all was blitting a 4x4 image of pure colorkey across a 796x596 field: 29651 (or so) blit calls for nothing that couldn't be done with a single fill.

The problem here is not necessary blitting for nothing but, rather, a process that increases quadratically -- in very rough terms, as the dimensions of a panel grew by x, I'd have to do x^2 blits. I solved my blit calls growing quadratically by increasing my blitting image quadratically as well to fit the area to-be-blitted by calling this function:


def makeSurfQuad(surf):
''' Make a new surface from four of given surf, in a square'''
surfquad = pygame.Surface( (surf.get_width() * 2, surf.get_height() * 2 ) )
for x in range(2):
for y in range(2):
surfquad.blit( surf, (x * surf.get_width(), y * surf.get_height() ) )
return surfquad



... which takes a surface and returns that surface squared, and the Borderstyle does this until the surface could be blitted across the width of the panel in 2 blit calls. By doing this, about 30,000 blit calls are reduced to about 30 blit calls. I made one for linear blits as well with less dramatic but still valuable savings in blit calls.

The lesson: A blit call has much more overhead than a hell of a lot of pixels in a blit call , so: Blit less, Blit bigger. Now I need to do this with terrain rendering.

And if a problem increases exponentially, increase the answer exponentially.
(Not that you can for all situations, but it sure sounds clever.)

Finding bottlenecks and FindVisibleTiles



First the bottlenecks.
A million thanks to Oluseyi for pointing out Python Call Graph to me. By giving me a tool that can bring my attention to processing bottlenecks, he has, well, indirectly lead to me learning a hell of a lot about programming (or so it feels over the last few days).

So: Python Call Graph makes graphs of Python programs and tells the number of calls and time spent for each function. Now this is very interesting. When I first ran this three places stood out as taking the most time. In order they were Borderstyles, which I've discussed, FindVisibleTiles, which I will discuss, and a far third was David O's vector.add function, which I'm going to let slide because it gets called a few ten thousands of times more than any other function. Oh, and tile.update is up there too I suppose, but that too is called like a hundred thousand times in a short period of time, and it doesn't really do anything yet except make sure that animation is working. Maybe I could only call that update on tiles which are visible.

Here's a shot of FindVisibleTiles being slow:


At first I tried re-arranging the deck chairs on the Titantic and got this:


Which was even slower than before. What I was doing was calling updateDrawPos, which found the absolute pixel position the tile was to be drawn at for every tile, then filtering out tiles that didn't fall within the Gameview area (plus a little). While making up new functions to obfuscate the problem I found myself only pulling tiles from those rows in the map matrix that the Gameview could possibly see, then I thought -- why not clip columns too? Why not just take a slice out of the map matrix of the tiles I know I will be able to see based on transforming the Gameview offset from (0,0) to an offset in the map matrix, and the size of the Gameview to a size in tile rows/cols in the matrix, so then:


for n in range(viewtop,viewrows):
viewTiles.extend( self.isomatrix[n][viewleft:viewcols] )
return viewTiles





3500 times faster. And I only call updateDrawPos in the renderer where, you know, it's actually needed.

In conclusion:


Optimizing was fun. I get this thrill from making things faster and more efficient. And I was only doing it to avoid thinking to hard about what to do next.

Here's the full program chart of the more or less current build for those who want their brains to explode:

Sign in to follow this  


2 Comments


Recommended Comments

Quote:
Original post by dbaumgart
and a far third was David O's vector.add function...
[oh] Eep!

Quote:
Original post by dbaumgart
...which I'm going to let slide because it gets called a few ten thousands of times more than any other function.
[lol] Whew!

Here's the latest version of the vector code in case you want to upgrade, although the functions are lowercase in your version. Its not much different, but the TransformBinary function got slimmed down a bit:

import itertools # repeat

import math # sqrt
import operator # abs, add, div, inv, mod, mul, neg, pos, sub

from . import pow2 # Pow2{,Ceil,Floor}

# unary arithmetic operations
def Abs(a): return TransformUnary(a, operator.abs)
def Float(a): return TransformUnary(a, float)
def Int(a): return TransformUnary(a, int)
def Inv(a): return TransformUnary(a, operator.inv)
def Neg(a): return TransformUnary(a, operator.neg)
def Pos(a): return TransformUnary(a, operator.pos)

# binary arithmetic operations
def Add(a, b): return TransformBinary(a, b, operator.add)
def Div(a, b): return TransformBinary(a, b, operator.div)
def Mod(a, b): return TransformBinary(a, b, operator.mod)
def Mul(a, b): return TransformBinary(a, b, operator.mul)
def Sub(a, b): return TransformBinary(a, b, operator.sub)

# unary vector operations
def Content(a):
return reduce(operator.mul, a)
def Cross(a):
return [-a[1], a[0]]
def Len(a):
return math.sqrt(Dot(a, a))
def Norm(a):
return Div(a, Len(a))

# binary vector operations
def Dot(a, b):
return sum(map(operator.mul, a, b))
def PerpDot(a, b):
return Dot(a, Cross(b))

# swizzle
def Swizzle(a, i, j):
return [a[i], a[j]]

# power-of-two
def Pow2(a): return TransformUnary(a, pow2.Pow2)
def Pow2Ceil(a): return TransformUnary(a, pow2.Pow2Ceil)
def Pow2Floor(a): return TransformUnary(a, pow2.Pow2Floor)

# transformation algorithms
def TransformUnary(a, func):
if not hasattr(a, '__iter__'): return func(a)
return [func(a) for a in a]
def TransformBinary(a, b, func):
aIterable = hasattr(a, '__iter__')
bIterable = hasattr(b, '__iter__')
if not aIterable and not bIterable: return func(a, b)
if not aIterable: a = itertools.repeat(a)
elif not bIterable: b = itertools.repeat(b)
return [func(a, b) for a, b in zip(a, b)]

Good job with your optimization work! I'm going to have to try out that call-graph tool sometime too.

Share this comment


Link to comment

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now