performance hit when accessing array in large loop

11 comments, last by trs79 12 years, 6 months ago

Like alvaro said, if you comment out that "culprit line", everything else will be removed also by the optimizer, so the whole inner loop is the actual culprit.
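To illustrate the point about the optimizer, here is a minimal sketch of a timing helper that keeps the loop's result alive so the compiler cannot drop the work. The function `accumulate_grid` and its arithmetic are hypothetical stand-ins, not the code from this thread; only the pattern of consuming the result matters.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the inner loop: sums half of every cell in an n^3 grid.
float accumulate_grid(const std::vector<float>& field, int n) {
    float sum = 0.0f;
    for (int z = 0; z < n; ++z)
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x)
                sum += field[(static_cast<std::size_t>(z) * n + y) * n + x] * 0.5f;
    return sum;
}

// Times one call in milliseconds. The result is written to a volatile, which
// counts as an observable side effect, so the loop cannot be removed as dead code.
double time_ms(const std::vector<float>& field, int n, float& out) {
    const auto start = std::chrono::steady_clock::now();
    volatile float sink = accumulate_grid(field, n);
    const auto stop = std::chrono::steady_clock::now();
    out = sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

If the result were never read, an optimizing build would be free to delete the loop entirely, which is why commenting out a single line can make the whole function appear to cost nothing.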

There you have 375k 'floor' ops from converting to int (a cast truncates toward zero, which matches floor only for non-negative values), and each resulting int is then multiplied by a float. That is not a good thing. Try replacing the cast with an explicit 'floor' call and see if you get any difference. Floor isn't free in general though.
Try enabling SSE2 if you are compiling for 32-bit, as faster int/float conversion ops are used then. I'm not sure how many of those conversions the compiler is allowed to optimize away. Try not to mix integers and floats in the same arithmetic.

In addition you have 375k int * float operations from 'x/y/z * stride'.

Then you have like a million or two adds and subtracts, same for multiplies, and 250k branches.
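One way to cut the int * float work, sketched below under assumptions: precompute the float coordinate of each grid index once per axis, so the conversions happen n times per axis instead of once per cell. The names `axis_coords` and `sum_squared_distance`, and the squared-distance body, are hypothetical and only stand in for whatever the real inner loop computes. It assumes n is non-negative.

```cpp
#include <cstddef>
#include <vector>

// Precomputes origin + i * stride for i in [0, n).
// This is the only place an int is converted to float.
std::vector<float> axis_coords(float origin, float stride, int n) {
    std::vector<float> coords(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i)
        coords[static_cast<std::size_t>(i)] = origin + static_cast<float>(i) * stride;
    return coords;
}

// Hypothetical inner loop: sums x^2 + y^2 + z^2 over every grid point.
// Only float arithmetic runs inside the nested loops.
float sum_squared_distance(const std::vector<float>& xs,
                           const std::vector<float>& ys,
                           const std::vector<float>& zs) {
    float sum = 0.0f;
    for (float z : zs) {
        const float zz = z * z;          // hoisted out of the y and x loops
        for (float y : ys) {
            const float yz = y * y + zz; // hoisted out of the x loop
            for (float x : xs)
                sum += x * x + yz;
        }
    }
    return sum;
}
```

Whether this helps in practice depends on the compiler and the real loop body, so it is worth measuring before and after.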


Interesting, I hadn't realized that was such an expensive operation, but that makes sense. I'll try changing it so ints and floats aren't used together. Yeah, this whole function seems very expensive. Any other ideas on how to remove the O(N^3) complexity? I guess maybe that's just the nature of SPH fluid surface generation. I've tried to think of ways around that but am stumped. Currently it looks like the compiler is set to use SSE2 (I'm using MSVC with the /arch:sse2 compiler switch), and the disassembly shows use of the xmm0 register.
Couple of points:

  • Timing in FPS is pointless. It's counterintuitive because you're dealing with the reciprocal of the actual speed value, which means it follows a weird curve instead of a nice easy-to-understand linear pattern. Always talk about timings and performance in terms of how many milliseconds it takes to do some operation, not in terms of how many FPS you have. That was the point of the linked article from the beginning of the thread.


  • You should pick up a profiler (I like Very Sleepy and Luke Stackwalker, personally; both are free) and see what it says. Guessing about performance bottlenecks is risky business. Even the best assembly hackers can't always look at a program and tell you where its performance issues will be. So the conventional advice is, if you're talking about performance, you should have some profiler numbers to prove out your assumptions about what's actually slow. In this case, the float/int/float conversion paths are probably hurting you the most, but you might discover other interesting things about the code if you throw a good profiler at it.
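A quick sketch of why FPS is a misleading unit: converting frame rates to milliseconds per frame shows that the same-looking "halving" can mean very different costs. The helper names here are made up for illustration.

```cpp
// Converts a frame rate to the time spent on each frame, in milliseconds.
double frame_ms(double fps) {
    return 1000.0 / fps;
}

// Extra milliseconds per frame when the rate drops from fps_before to fps_after.
double added_cost_ms(double fps_before, double fps_after) {
    return frame_ms(fps_after) - frame_ms(fps_before);
}
```

For example, `added_cost_ms(900.0, 450.0)` is about 1.1 ms, while `added_cost_ms(60.0, 30.0)` is about 16.7 ms. Both are a 50% drop in FPS, but the second costs roughly fifteen times as much actual time per frame.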


Good luck!

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Thanks for the info and profiler links, I'll give those a shot. I can't believe it didn't dawn on me that the compiler was optimizing away the loops. Thanks everyone for showing me that!

