
performance hit when accessing array in large loop

trs79 · 2011-10-13T22:05:59

Hi all, I have 4 nested loops in a C++ function such that there are 125,000 total iterations. In the innermost loop I do a simple write to a float array[5], i.e.

    float array[5];
    for (uint a = 0; a < 1000; a++) {
        for (uint b = 0; b < 5; ++b) {
            for (uint c = 0; c < 5; ++c) {
                for (uint d = 0; d < 5; ++d) {
                    unsigned int ctr = 0;
                    float v = 23 * a * b;
                    array[ctr] = v;
                }
            }
        }
    }

This code is part of a larger 3D test. If I comment out the "array[ctr] = v;" line, I get 600 fps; with it in, I lose over 200 fps and drop to ~350 fps. Could this be due to CPU cache issues? I'm really stumped on this one, thanks for any help.


Started by trs79 October 13, 2011 07:27 PM

11 comments, last by trs79 12 years, 6 months ago

trs79

126

Author

October 13, 2011 08:55 PM

Like alvaro said, if you comment out that "culprit line", everything else will be removed also by the optimizer, so the whole inner loop is the actual culprit.
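To make that concrete, here's a minimal sketch of the effect. The function names `fillUnused` and `fillObserved` are made up for illustration and the loop body is a simplified stand-in for the one in the first post. In an optimized build the compiler is permitted to delete the loops in the first function outright because nothing ever reads the array; the second returns a value derived from the array, so the stores can't be treated as dead.

```cpp
#include <cstddef>

// Hypothetical stand-in for the loop from the first post.
// Nothing ever reads `out`, so an optimizing compiler may remove
// the stores and then the now-empty loops around them.
void fillUnused() {
    float out[5];
    for (unsigned a = 0; a < 1000; ++a)
        for (unsigned b = 0; b < 5; ++b)
            out[b] = 23.0f * a * b;
}

// Same stores, but the result escapes through the return value,
// so the compiler has to produce the final contents of the array.
float fillObserved() {
    float out[5] = {};
    for (unsigned a = 0; a < 1000; ++a)
        for (unsigned b = 0; b < 5; ++b)
            out[b] = 23.0f * a * b;
    float sum = 0.0f;
    for (std::size_t i = 0; i < 5; ++i)
        sum += out[i];
    return sum;
}
```

Timing the two in a release build and comparing against a debug build is a quick way to check whether a "free" version of the code is actually just being skipped.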

There you have 375k 'floor' ops by converting to int and then multiplying that int by a float. That is not a good thing. Try replacing it with 'floor' and see if you get any difference. Floor isn't free in general though.
Try enabling SSE2 if you are compiling for 32-bit, as faster int/float conversion ops are used then. I'm not sure how many of those the compiler is allowed to optimize away. Try to not do arithmetic with both integers and floats together.
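One thing to watch if you do swap the cast for 'floor': the two only agree for non-negative values. A small sketch, with helper names that are mine and not from the code being discussed:

```cpp
#include <cmath>

// A cast to int truncates toward zero.
inline int cellByCast(float x) { return static_cast<int>(x); }

// std::floor rounds toward negative infinity before the conversion.
inline int cellByFloor(float x) { return static_cast<int>(std::floor(x)); }
```

For 2.7f both return 2, but for -2.7f the cast gives -2 while the floor version gives -3, so the replacement is only a drop-in if the inputs are known to be non-negative.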

In addition you have 375k int * float operations from 'x/y/z * stride'.
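A rough sketch of one way to cut those down, assuming a cubic grid where the same stride applies on every axis (the function name, `n`, and the placeholder body are hypothetical, since the real function isn't shown here): do the int-to-float conversion and multiply once per index up front, then only read floats inside the loops.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical grid walk: `n` cells per axis, `stride` is the cell size.
// The int -> float conversion and multiply happen n times up front
// instead of 3 * n^3 times inside the innermost loop.
float sumGridCoords(std::size_t n, float stride) {
    std::vector<float> coord(n);
    for (std::size_t i = 0; i < n; ++i)
        coord[i] = static_cast<float>(i) * stride;

    float total = 0.0f;
    for (std::size_t x = 0; x < n; ++x) {
        const float px = coord[x];
        for (std::size_t y = 0; y < n; ++y) {
            const float py = coord[y];
            for (std::size_t z = 0; z < n; ++z) {
                const float pz = coord[z];
                total += px + py + pz;  // placeholder for the real per-cell work
            }
        }
    }
    return total;
}
```

Whether this helps in practice depends on what the compiler was already hoisting, so it's worth measuring before and after.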

Then you have like a million or two adds and subtracts, same for multiplies, and 250k branches.

Interesting, I hadn't realized that was such an expensive operation, but that makes sense. I'll try changing it so ints and floats aren't used together. Yeah, this whole function seems very expensive. Any other ideas on how to remove the O(N^3) complexity? I guess maybe that's just the nature of SPH fluid surface generation. I've tried to think of ways around that but am stumped. Currently it looks like the compiler is set to use SSE2 (I'm using MSVC with the /arch:SSE2 compiler switch), and the disassembly shows use of the xmm0 register.

ApochPiQ

23,138

October 13, 2011 09:46 PM

Couple of points:

Timing in FPS is pointless. It's counterintuitive because you're dealing with the reciprocal of the actual speed value, which means it follows a weird curve instead of a nice easy-to-understand linear pattern. Always talk about timings and performance in terms of how many milliseconds it takes to do some operation, not in terms of how many FPS you have. That was the point of the linked article from the beginning of the thread.
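To put numbers on that using the figures from the first post, here's a tiny sketch (the helper names are just for illustration):

```cpp
// Frame time in milliseconds for a given frame rate.
inline double frameMs(double fps) { return 1000.0 / fps; }

// Cost in milliseconds of dropping from `before` fps to `after` fps.
inline double costMs(double before, double after) {
    return frameMs(after) - frameMs(before);
}
```

Going from 600 fps to 350 fps is `costMs(600, 350)`, roughly 1.19 ms per frame. The same 250 fps drop from 300 to 50 would be `costMs(300, 50)`, roughly 16.7 ms. Same FPS delta, very different cost, which is why the millisecond figure is the one to track.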

You should pick up a profiler (I like Very Sleepy and Luke Stackwalker, personally; both are free) and see what it says. Guessing about performance bottlenecks is risky business. Even the best assembly hackers can't always look at a program and tell you where its performance issues will be. So the conventional advice is, if you're talking about performance, you should have some profiler numbers to prove out your assumptions about what's actually slow. In this case, the float/int/float conversion paths are probably hurting you the most, but you might discover other interesting things about the code if you throw a good profiler at it.
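If you want a quick number before setting up a profiler, a coarse wall-clock timer is enough for a sanity check. This is only a sketch using the standard `std::chrono` clock; `timeMs` is a name I made up, and it's no substitute for a real profiler since it only tells you how long a block took, not where the time went.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double timeMs(Fn&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrap the suspect function in a lambda, call it a few hundred times, and average the results, making sure its output is actually used so the optimizer doesn't throw the work away.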

Good luck!


trs79

126

Author

October 13, 2011 10:05 PM

Thanks for the info and profiler links, I'll give those a shot. I can't believe it didn't dawn on me that the compiler was optimizing away the loops. Thanks everyone for showing me that!

This topic is closed to new replies.
