#1 Members - Reputation: 305
Posted 04 April 2012 - 11:17 AM
Anyway, it was running at ~0.4 seconds per loop in the test.
I switched to a .Select(Array, Index).Parallel.ForAll() to see if that would speed things up.
First results:
~0.22 seconds per loop.
Later results:
~0.33 seconds per loop.
I switched back to the old code, including a mass undo just to make sure it was exactly the same:
~0.69 seconds per loop.
And this wouldn't be the first time the C# compiler has dumped a sudden, inexplicable slowdown on me with this code. Just going to sleep and running it again in the morning resulted in a slowdown.
So, why would adding LINQ code to it result in such a slowdown? Especially after the LINQ code has been removed? How consistent is the compiler about speed optimizations?
Thanks.
#2 Members - Reputation: 305
Posted 04 April 2012 - 12:08 PM
> PerlinNoise.dll!PerlinNoise.ModulatedPerlinNoise.Generate(ref float[][] fillArray = {float[1024][]}) Line 94 C#
(Trying Perlin again as an exercise, now that I know more)
#4 Members - Reputation: 305
Posted 04 April 2012 - 01:07 PM
> Try a profiler.
ANTS just finished installing. Can't afford it right now, but I can at least use the free trial for a couple weeks.
#5 Members - Reputation: 305
Posted 04 April 2012 - 02:18 PM
for (x = 0; x < arrayWidth; ++x)
It's probably that I need to set arrayWidth explicitly to the width of the inner array, rather than assume it'll be the same - C# can do some optimizations if it knows that the index variable will not exceed the array, apparently.
2.793% is spent on each instance of this instruction, which exists with different +/-'s in four places:
int n = (x - 1) + (y + 1) * 57;
Both x and y are integers.
And the same amount of time here:
float corners = (corner1 + corner2 + corner3 + corner4) * 0.25F;
Which occurs once.
4.469% each for these:
float xFade = fractional_X * fractional_X * fractional_X * (fractional_X * ((fractional_X * 6F) - 15F) + 10F);
float yFade = fractional_Y * fractional_Y * fractional_Y * (fractional_Y * ((fractional_Y * 6F) - 15F) + 10F);
And again, 2.793% each for these:
float i1 = (v1 * (1F - fractional_X)) + (v2 * xFade);
float i2 = (v3 * (1F - fractional_X)) + (v4 * xFade);
float total = (i1 * (1F - fractional_Y)) + (i2 * yFade);
Which are all floats. Aaand...I should really make them all fade values, not the linear fractional.
3.352% here:
array[x] += (total * amplitude) * oneOvertotalAmplitude;
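The "make them all fade values" idea can be sketched roughly as below. This is only an illustration, not the poster's actual code: the class name PerlinInterpolation and the helpers Fade, Lerp and Blend are hypothetical, and the sketch assumes both interpolation weights on each axis should come from the fade curve so that they sum to 1.

```csharp
using System;

// Hypothetical helper class; none of these names come from the original code.
public static class PerlinInterpolation
{
    // Quintic fade curve 6t^5 - 15t^4 + 10t^3, the same polynomial as xFade/yFade.
    public static float Fade(float t)
    {
        return t * t * t * (t * (t * 6F - 15F) + 10F);
    }

    // Linear interpolation between a and b by weight w in [0, 1].
    public static float Lerp(float a, float b, float w)
    {
        return a + (b - a) * w;
    }

    // Blends four corner values using faded weights on both axes,
    // instead of mixing (1 - fractional) with the faded value.
    public static float Blend(float v1, float v2, float v3, float v4,
                              float fractionalX, float fractionalY)
    {
        float xFade = Fade(fractionalX);
        float yFade = Fade(fractionalY);
        float i1 = Lerp(v1, v2, xFade);
        float i2 = Lerp(v3, v4, xFade);
        return Lerp(i1, i2, yFade);
    }
}
```

Writing the interpolation as a + (b - a) * w also uses one multiply per blend instead of two, though whether that is measurable here would need profiling.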
#6 Moderators - Reputation: 7557
Posted 04 April 2012 - 03:40 PM
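How are you doing the timing? Some mechanisms for counting elapsed time are not precise enough to measure this kind of thing.
There's also a lot of complication in the way .NET executes; JIT compilation and other factors might affect things. There's also any number of unrelated factors on your machine that can affect benchmarking.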
Fun experiment: try running a time-sensitive computation with a duration of many seconds. Then do the same thing with a high-res YouTube video playing in the background. Voila! Instant time warp.
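One way to reduce the JIT and background-noise effects described above is to warm the code up first and then keep the fastest of several timed runs. A minimal sketch, assuming System.Diagnostics.Stopwatch as the timer; the LoopTimer class and BestOfSeconds method are hypothetical names for illustration:

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper, not part of any library.
public static class LoopTimer
{
    // Runs 'work' a few times untimed so the JIT has already compiled it,
    // then returns the fastest of 'runs' timed executions, in seconds.
    public static double BestOfSeconds(Action work, int warmups, int runs)
    {
        for (int i = 0; i < warmups; ++i)
        {
            work();
        }

        double best = double.MaxValue;
        for (int i = 0; i < runs; ++i)
        {
            Stopwatch watch = Stopwatch.StartNew();
            work();
            watch.Stop();
            best = Math.Min(best, watch.Elapsed.TotalSeconds);
        }
        return best;
    }
}
```

Taking the minimum rather than the average tends to filter out runs that were interrupted by other processes, which is the kind of noise the YouTube experiment demonstrates.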
#7 Members - Reputation: 305
Posted 04 April 2012 - 03:59 PM
> How are you doing the timing? Some mechanisms for counting elapsed time are not precise enough to measure this kind of thing.
The ANTS performance profiler trial I just installed.
> There's also a lot of complication in the way .NET executes; JIT compilation and other factors might affect things. There's also any number of unrelated factors on your machine that can affect benchmarking.
> Fun experiment: try running a time-sensitive computation with a duration of many seconds. Then do the same thing with a high-res YouTube video playing in the background. Voila! Instant time warp.
I was thinking that might account for the slowdown - Task Manager has been showing various things being busy. Just wasn't sure how much it would affect things, since I have a dual-core.
Still, I would like to know why those instructions in particular are taking up 33.798% of the program in that profile, if there's any reason aside from "They happen a lot" (1024x1024x5).
#8 Members - Reputation: 305
Posted 04 April 2012 - 04:47 PM
#9 Members - Reputation: 1411
Posted 04 April 2012 - 06:53 PM
It's hard to know what is causing performance issues without seeing more code. Here's a few educated guesses:
- One thing that can make a big difference to performance is the location of the data you're dealing with in memory. A contiguous array of objects is usually much quicker than an array of pointers to objects. I think in C# to do that you need the items in the array to be structs instead of classes. This type of issue can also make performance vary randomly depending on how lucky you are with where the allocator puts the data.
- Floating point maths can have a few hidden performance issues. Firstly if the data ends up with denormalized / NaN / Inf values the CPU will process them much slower than other values (IIRC over 10x slower in some cases). Secondly as reordering floating point operations affects the result the compiler will normally avoid it. As an example try these alternate lines:
float corners = (corner1 + corner2 + corner3 + corner4) * 0.25F;
float corners = ((corner1 + corner2) + (corner3 + corner4)) * 0.25F;
The results should be very similar, but performance may not be. The second version reduces the dependency chain by one.
- Also note that in C# running the program via the debugger will normally disable all optimizations. You need to test a release build outside of the debugger.
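To illustrate the memory-layout point for a 2D grid of floats: a single flat array keeps every row in one contiguous block, whereas a float[][] is an array of references to separately allocated rows. A minimal sketch, with the FlatGrid class and its method names being hypothetical:

```csharp
// Hypothetical helper class showing a flat, row-major 2D grid.
public static class FlatGrid
{
    // Allocates width * height floats in one contiguous block.
    public static float[] Create(int width, int height)
    {
        return new float[width * height];
    }

    // Maps a 2D coordinate to its index in the flat array (row-major).
    public static int IndexOf(int x, int y, int width)
    {
        return x + (y * width);
    }

    // Adds 'value' to every cell, walking memory in address order.
    public static void AddToAll(float[] grid, int width, int height, float value)
    {
        for (int y = 0; y < height; ++y)
        {
            int rowStart = y * width; // computed once per row, not per cell
            for (int x = 0; x < width; ++x)
            {
                grid[rowStart + x] += value;
            }
        }
    }
}
```

Whether this beats an array of arrays in a given program depends on access patterns and on what the JIT does with bounds checks, so it is worth measuring rather than assuming.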
#11 Members - Reputation: 305
Posted 04 April 2012 - 08:53 PM
> It's hard to know what is causing performance issues without seeing more code. Here's a few educated guesses:
Need to fix up the code before I post it. I've been focusing on getting it as fast as possible as a coding exercise.
> - One thing that can make a big difference to performance is the location of the data you're dealing with in memory. A contiguous array of objects is usually much quicker than an array of pointers to objects. I think in C# to do that you need the items in the array to be structs instead of classes. This type of issue can also make performance vary randomly depending on how lucky you are with where the allocator puts the data.
> - Floating point maths can have a few hidden performance issues. Firstly if the data ends up with denormalized / NaN / Inf values the CPU will process them much slower than other values (IIRC over 10x slower in some cases). Secondly as reordering floating point operations affects the result the compiler will normally avoid it. As an example try these alternate lines:
> float corners = (corner1 + corner2 + corner3 + corner4) * 0.25F;
> float corners = ((corner1 + corner2) + (corner3 + corner4)) * 0.25F;
> The results should be very similar, but performance may not be. The second version reduces the dependency chain by one.
> - Also note that in C# running the program via the debugger will normally disable all optimizations. You need to test a release build outside of the debugger.
1) And I'm using an array of pointers to an array. Thanks; worth testing. My naive "multi-dimensional" array speed tests may well not have shown that - (x + (y * width)) indexed single array, "single initialization" multidimensional array [ , ] and array of arrays.
2) It's an array of floats, so the allocation should be good, aside from the "array of pointers" thing.
3a) *Looks up denormalized float values* Hmm...Any tips on preventing that?
3b) Thanks, will try that. And there's several other places I could put brackets.
4) Learned that a while back, thanks.
> Consistently confirmed: After compiling the program, explorer.exe hits 50% CPU and stays there for about a minute.
> Just compiling and not actually running? Do you have a virus scanner that's being a bit hyper or something?
Yeah and it has shown up, but explorer.exe is consistent, not the virus-scan process.
#12 Members - Reputation: 1411
Posted 05 April 2012 - 06:46 AM
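For 3a the simple answer is to tweak the settings of the FPU: http://software.inte...s-are-zero-daz/ which you'll need P/Invoke to do.
Unfortunately that's only really practical if you're using SSE instructions, and I'm not sure what C# uses in x86 mode (it is SSE in x64).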
Sometimes it's also possible to adjust the algorithm you're using to avoid them.
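As one example of adjusting the algorithm rather than the FPU settings, values that have decayed to a negligible size can be snapped to zero before they reach the denormal range. This is only a sketch under assumptions: the DenormalGuard class, the Flush method and the 1e-30 cutoff are all hypothetical choices, and the cutoff would need to suit the actual value range of the noise.

```csharp
using System;

// Hypothetical helper; the threshold is an assumed value, not a standard one.
public static class DenormalGuard
{
    // Assumed cutoff: well above the float denormal range (which begins
    // just below roughly 1.18e-38) and far below any useful noise value.
    public const float Threshold = 1e-30F;

    // Returns 0 for values too small to matter, otherwise the value unchanged.
    public static float Flush(float value)
    {
        return Math.Abs(value) < Threshold ? 0F : value;
    }
}
```

It would typically be applied where small values accumulate, for instance on an amplitude that is repeatedly multiplied by a factor below 1. The extra comparison has its own cost, so it only pays off if denormals are actually showing up.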
#13 Members - Reputation: 305
Posted 05 April 2012 - 05:08 PM
> For 3a the simple answer is to tweak the settings of the FPU: http://software.inte...s-are-zero-daz/ which you'll need P/Invoke to do.
> Unfortunately that's only really practical if you're using SSE instructions, and I'm not sure what C# uses in x86 mode (it is SSE in x64).
> Sometimes it's also possible to adjust the algorithm you're using to avoid them.
Gah! Internet! How many times must I write this post???
Anyway, C# can P/Invoke, so I can tweak that. Just need to know which .dll to call? Thanks.
I also changed it from a [][] array to a [] array indexed like a [][] array. It went from 3.5s-4.0s to 3.2s-3.5s.