Very strange code slowdown



#1 Narf the Mouse   Members   -  Reputation: 318


Posted 04 April 2012 - 11:17 AM

I have a very slow piece of code I'm working on. Well, to be more accurate, it does a lot of calculation and has been quite fast at it.

Anyway, it was running at ~0.4 seconds per loop in the test.

I switched to a .Select(Array, Index).Parallel.ForAll() to see if that would speed things up.

First results:

~0.22 seconds per loop.

Later results:

~0.33 seconds per loop.

I switched back to the old code, including a mass undo just to make sure it's exactly the same:

~0.69 seconds per loop.

And this wouldn't be the first time the C# compiler has dumped a sudden, inexplicable slowdown on me with this code. Just going to sleep and running it again in the morning resulted in a slowdown.

So, why would adding LINQ code to it result in such a slowdown? Especially after the LINQ code has been removed? How consistent is the compiler about speed optimizations?
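
The parallel version followed roughly this pattern - a minimal sketch with placeholder names, not the actual generator code:

using System;
using System.Linq;

class ParallelRowSketch
{
    // Placeholder standing in for the real per-row noise generation.
    static void GenerateRow(float[] row, int rowIndex)
    {
        for (int x = 0; x < row.Length; ++x)
            row[x] = (rowIndex + x) * 0.0001f;
    }

    static void Main()
    {
        var fillArray = new float[1024][];
        for (int y = 0; y < fillArray.Length; ++y)
            fillArray[y] = new float[1024];

        // Pair each row with its index, then let PLINQ spread the work across cores.
        fillArray
            .Select((row, index) => new { row, index })
            .AsParallel()
            .ForAll(item => GenerateRow(item.row, item.index));

        Console.WriteLine(fillArray[512][512]);
    }
}

Whether that wins anything depends on how much work each row does compared to the partitioning overhead.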

Thanks.


#2 Narf the Mouse   Members   -  Reputation: 318


Posted 04 April 2012 - 12:08 PM

In the stack trace, would this mean that the array is getting copied instead of passed by reference?
>    PerlinNoise.dll!PerlinNoise.ModulatedPerlinNoise.Generate(ref float[][] fillArray = {float[1024][]}) Line 94    C#
(Trying Perlin again as an exercise, now that I know more)
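
For what it's worth, a quick check along these lines (placeholder names only) should show that only a reference is passed - arrays in C# are reference types, so the float data itself shouldn't be copied, with or without the ref keyword:

using System;

class RefArrayCheck
{
    // 'ref' only lets the method point the caller's variable at a different
    // array; the float data is never copied either way.
    static void Generate(ref float[][] fillArray)
    {
        fillArray[0][0] = 42f; // writes into the caller's array in place
    }

    static void Main()
    {
        var data = new float[1024][];
        for (int y = 0; y < data.Length; ++y)
            data[y] = new float[1024];

        Generate(ref data);
        Console.WriteLine(data[0][0]); // prints 42
    }
}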

#3 edd   Members   -  Reputation: 2105


Posted 04 April 2012 - 12:54 PM

Try a profiler.

#4 Narf the Mouse   Members   -  Reputation: 318


Posted 04 April 2012 - 01:07 PM

edd said:
Try a profiler.

ANTS just finished installing. Can't afford it right now, but I can at least use the free trial for a couple weeks.

#5 Narf the Mouse   Members   -  Reputation: 318


Posted 04 April 2012 - 02:18 PM

1.957% here:
for (x = 0; x < arrayWidth; ++x)
It's probably that I need to set arrayWidth explicitly from the inner array's length, rather than assume it'll be the same - apparently C# can skip some bounds checks if it knows the index variable can't exceed the array (see the sketch at the end of this post).
2.793% is spent on each instance of this instruction, which appears with different +/- signs in four places:
int n = (x - 1) + (y + 1) * 57;
Both x and y are integers.
And the same amount of time here:
float corners = (corner1 + corner2 + corner3 + corner4) * 0.25F;
which occurs once.
4.469% each for these:
float xFade = fractional_X * fractional_X * fractional_X * (fractional_X * ((fractional_X * 6F) - 15F) + 10F);
float yFade = fractional_Y * fractional_Y * fractional_Y * (fractional_Y * ((fractional_Y * 6F) - 15F) + 10F);
And again, 2.793% each for these:
float i1 = (v1 * (1F - fractional_X)) + (v2 * xFade);
float i2 = (v3 * (1F - fractional_X)) + (v4 * xFade);
float total = (i1 * (1F - fractional_Y)) + (i2 * yFade);
Which are all floats. Aaand...I should really make them all fade values, not the linear fractional ones.
3.352% here:
array[x] += (total * amplitude) * oneOvertotalAmplitude;
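
A minimal sketch of the loop change I mean (illustrative names, not the real generator) - as far as I know the JIT only drops the per-element bounds check when the loop condition compares directly against the array's own Length:

using System;

class BoundsCheckSketch
{
    static void Fill(float[][] array)
    {
        for (int y = 0; y < array.Length; ++y)
        {
            float[] row = array[y];
            // Comparing against row.Length lets the JIT prove the index stays
            // in range, so it can skip the bounds check on each access.
            for (int x = 0; x < row.Length; ++x)
                row[x] = x * 0.001f;
        }
    }

    static void Main()
    {
        var array = new float[1024][];
        for (int y = 0; y < array.Length; ++y)
            array[y] = new float[1024];

        Fill(array);
        Console.WriteLine(array[512][512]);
    }
}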


#6 ApochPiQ   Moderators   -  Reputation: 15741


Posted 04 April 2012 - 03:40 PM

How are you doing the timing? Some mechanisms for counting elapsed time are not precise enough to measure this kind of thing.

There's also a lot of complication in the way .NET executes; JIT compilation and other factors might affect things. There's also any number of unrelated factors on your machine that can affect benchmarking.
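
For example, here's a rough sketch of a fairer measurement (the workload is just a placeholder): use Stopwatch, and run the work once before timing so JIT compilation isn't counted:

using System;
using System.Diagnostics;

class TimingSketch
{
    // Placeholder workload standing in for the real generation loop.
    static double DoWork()
    {
        double sum = 0;
        for (int i = 1; i < 10000000; ++i)
            sum += Math.Sqrt(i);
        return sum;
    }

    static void Main()
    {
        DoWork(); // warm-up pass: the JIT compiles the method here, outside the timed region

        var sw = Stopwatch.StartNew(); // high-resolution timer
        const int runs = 5;
        for (int i = 0; i < runs; ++i)
            DoWork();
        sw.Stop();

        Console.WriteLine("Average: {0:0.000} s per run", sw.Elapsed.TotalSeconds / runs);
    }
}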


Fun experiment: try running a time-sensitive computation with a duration of many seconds. Then do the same thing with a high-res YouTube video playing in the background. Voila! Instant time warp.

#7 Narf the Mouse   Members   -  Reputation: 318


Posted 04 April 2012 - 03:59 PM

ApochPiQ said:
How are you doing the timing? Some mechanisms for counting elapsed time are not precise enough to measure this kind of thing.

There's also a lot of complication in the way .NET executes; JIT compilation and other factors might affect things. There's also any number of unrelated factors on your machine that can affect benchmarking.


Fun experiment: try running a time-sensitive computation with a duration of many seconds. Then do the same thing with a high-res YouTube video playing in the background. Voila! Instant time warp.

The ANTS performance profiler trial I just installed.

I was thinking that might account for the slowdown - Task Manager has been showing various things being busy. I just wasn't sure how much it would matter, since I have a dual-core.

Still, I would like to know why those instructions in particular take up 33.798% of the run in that profile - if there's anything to it aside from "they happen a lot" (1024 x 1024 x 5, so roughly 5.2 million iterations).

#8 Narf the Mouse   Members   -  Reputation: 318


Posted 04 April 2012 - 04:47 PM

Consistently confirmed: After compiling the program, explorer.exe hits 50% CPU and stays there for about a minute.

#9 Adam_42   Crossbones+   -  Reputation: 2507


Posted 04 April 2012 - 06:53 PM

It's hard to know what is causing performance issues without seeing more code. Here are a few educated guesses:

- One thing that can make a big difference to performance is where the data you're dealing with lives in memory. A contiguous array of objects is usually much quicker than an array of pointers to objects. I think in C# that means the items in the array need to be structs instead of classes (see the sketch after this post). This type of issue can also make performance vary randomly depending on how lucky you are with where the allocator puts the data.

- Floating point maths can have a few hidden performance issues. Firstly, if the data ends up with denormalized / NaN / Inf values, the CPU will process them much more slowly than other values (IIRC over 10x slower in some cases). Secondly, since reordering floating point operations affects the result, the compiler will normally avoid doing it. As an example, try these alternate lines:

float corners = (corner1 + corner2 + corner3 + corner4) * 0.25F;
float corners = ((corner1 + corner2) + (corner3 + corner4)) * 0.25F;

The results should be very similar, but the performance may not be. The second version reduces the dependency chain by one.

- Also note that in C#, running the program via the debugger will normally disable all optimizations. You need to test a release build outside of the debugger.
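
A minimal sketch of the first point - with structs the array holds the float data in one contiguous block, while with classes it only holds references to objects scattered around the heap (names are illustrative):

using System;

class LayoutSketch
{
    struct SampleStruct { public float Value; } // array of these = one contiguous block of floats

    class SampleClass { public float Value; }   // array of these = an array of references

    static void Main()
    {
        var packed = new SampleStruct[100000];   // data stored inline in the array
        var scattered = new SampleClass[100000]; // only references stored in the array
        for (int i = 0; i < scattered.Length; ++i)
            scattered[i] = new SampleClass();    // each object allocated separately on the heap

        packed[0].Value = 1f;
        scattered[0].Value = 1f;
        Console.WriteLine(packed[0].Value + " " + scattered[0].Value);
    }
}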

#10 Nypyren   Crossbones+   -  Reputation: 4306


Posted 04 April 2012 - 06:53 PM

Narf the Mouse said:
Consistently confirmed: After compiling the program, explorer.exe hits 50% CPU and stays there for about a minute.


Just compiling and not actually running? Do you have a virus scanner that's being a bit hyper or something?

#11 Narf the Mouse   Members   -  Reputation: 318


Posted 04 April 2012 - 08:53 PM

Adam_42 said:
It's hard to know what is causing performance issues without seeing more code. Here are a few educated guesses:

- One thing that can make a big difference to performance is where the data you're dealing with lives in memory. A contiguous array of objects is usually much quicker than an array of pointers to objects. I think in C# that means the items in the array need to be structs instead of classes. This type of issue can also make performance vary randomly depending on how lucky you are with where the allocator puts the data.

- Floating point maths can have a few hidden performance issues. Firstly, if the data ends up with denormalized / NaN / Inf values, the CPU will process them much more slowly than other values (IIRC over 10x slower in some cases). Secondly, since reordering floating point operations affects the result, the compiler will normally avoid doing it. As an example, try these alternate lines:

float corners = (corner1 + corner2 + corner3 + corner4) * 0.25F;
float corners = ((corner1 + corner2) + (corner3 + corner4)) * 0.25F;

The results should be very similar, but the performance may not be. The second version reduces the dependency chain by one.

- Also note that in C#, running the program via the debugger will normally disable all optimizations. You need to test a release build outside of the debugger.

Need to fix up the code before I post it. I've been focusing on getting it as fast as possible as a coding exercise.

1) And I'm using an array of pointers to arrays. Thanks; worth testing. My naive "multi-dimensional" array speed tests may well not have shown that - they compared an (x + (y * width))-indexed single array, a "single initialization" multidimensional array [,], and an array of arrays.

2) It's an array of floats, so the allocation should be good, aside from the "array of pointers" thing.

3a) *Looks up denormalized float values* Hmm...Any tips on preventing that?
3b) Thanks, will try that. And there are several other places I could put brackets.

4) Learned that a while back, thanks. :)


Nypyren said (quoting my earlier post):
Consistently confirmed: After compiling the program, explorer.exe hits 50% CPU and stays there for about a minute.

Just compiling and not actually running? Do you have a virus scanner that's being a bit hyper or something?

Yeah, and it has shown up, but explorer.exe is the consistent one, not the virus-scan process.

#12 Adam_42   Crossbones+   -  Reputation: 2507


Posted 05 April 2012 - 06:46 AM

For 3a, the simple answer is to tweak the FPU settings: http://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz/ - you'll need P/Invoke to do that (a rough sketch is at the end of this post).

Unfortunately that's only really practical if you're using SSE instructions, and I'm not sure what C# uses in x86 mode (it is SSE in x64).

Sometimes it's also possible to adjust the algorithm you're using to avoid them.
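
One possible P/Invoke route - untested here, and the DLL name and constants are assumptions based on the C runtime's _controlfp and float.h, so treat it as a sketch rather than a recipe:

using System;
using System.Runtime.InteropServices;

class FlushToZeroSketch
{
    // Assumed constants from the C runtime's float.h.
    const uint _MCW_DN = 0x03000000;   // denormal control mask
    const uint _DN_FLUSH = 0x01000000; // flush denormals to zero

    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern uint _controlfp(uint newControl, uint mask);

    static void Main()
    {
        // Request flush-to-zero handling of denormals for the current thread.
        _controlfp(_DN_FLUSH, _MCW_DN);
        Console.WriteLine("Denormal flushing requested.");
    }
}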

#13 Narf the Mouse   Members   -  Reputation: 318


Posted 05 April 2012 - 05:08 PM

Adam_42 said:
For 3a the simple answer is to tweak the settings of the FPU: http://software.inte...s-are-zero-daz/ which you'll need P/Invoke to do.

Unfortunately that's only really practical if you're using SSE instructions, and I'm not sure what C# uses in x86 mode (it is SSE in x64).

Sometimes it's also possible to adjust the algorithm you're using to avoid them.

Gah! Internet! How many times must I write this post???

Anyway, C# can P/Invoke, so I can tweak that. I just need to know which .dll to call. Thanks.

I also changed it from a [][] array to a [] array indexed like a [][] array. It went from 3.5-4.0 seconds to 3.2-3.5 seconds. :)
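
Roughly the layout change, for reference (a sketch with placeholder values, not the actual noise code):

using System;

class FlatIndexSketch
{
    const int Width = 1024, Height = 1024;

    static void Main()
    {
        // One contiguous block, indexed as x + y * Width instead of [y][x].
        var noise = new float[Width * Height];

        for (int y = 0; y < Height; ++y)
        {
            int rowStart = y * Width; // hoist the multiply out of the inner loop
            for (int x = 0; x < Width; ++x)
                noise[rowStart + x] = (x + y) * 0.0001f;
        }

        Console.WriteLine(noise[512 * Width + 512]);
    }
}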



