Jump to content
  • Advertisement
Sign in to follow this  
TerrorFLOP

Optimization Puzzle

This topic is 4907 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Thx jjd...

You know, I actually thought of that...

I didn't run the tests though 'cos I thought it sounded a bit crazy (LOL.... No offence mate!)

Mmmm... Perhaps now that you have mentioned I'll give that idea a go...

However, I've a funny feeling that the ever so helpful complier will say;

'Great!!! Another code invariant that does not depend on the looping index'

and move the code according...

Sure, helpful in MOST situations, but not in code timing...

Last resort I guess is to somehow incorporate the looping index in the test code...

I might not get an accurate test... But at least it won't move the code so that it CAN be tested...

Gonna try some ideas now...

Thx!

Share this post


Link to post
Share on other sites
Advertisement
TerrorFlop:
One thing you could try is to take the assembly the compiler generates (to call normal sqrt) and put that in an __asm block so that the compiler will treat it as the same "black box" and not optimize it. You could also try __declspec(noinline) in case the compiler is just unrolling and inlining the whole deal...

However, the BEST (and I highly recommend this if you're into this sort of thing) is to go to AMD's website and download CodeAnalyst. It allows you to 'record' portions of your code, and then it simulates everything that is happening on the processor during that time and shows you a graph of where all the cycles are going and which stages of the pipeline are stalling for each instruction. It takes a lot of practice to really understand everything that's happening, but the "Green is Good, Red is Bad" thing will be a good start :)

Anyway, regarding the "Compiler is the God of Optimization"... That would be great, wouldn't it? It makes sense, right? Unfortunately, the compiler is *terrible* at optimizing, and especially intrinsics. I mean jokingly bad. The only thing the compiler actually does is inline and unroll loops, so as to distort the sort of tests TerrorFlop is trying to run. The real genius is the processor, especially those AMDs. The re-arrangement of crappy code that that thing can do on the fly, it's indescribable what sort of genius went into producing that on silicon at 3 GHz.

Anyway back to TF: The test you're running... it's not a very good way to test SSE. The one thing the processor can't do is magically eliminate latency stalls, like using xmm0 as a destination and source to almost every instruction... it'll fill those stalls with out-of-order instructions, but if all you're doing is a tight loop that keeps stalling, there's not much it can hope to do. Do you need to use doubles? This can be written much faster with floats.

[edit] Another thing. I'm not sure about doubles, but for floats there are 2 instructions: sqrtps and rsqrtps. One is the full IEEE float sqrt, and the second is a 12-bit accurate inverse square root that runs about 7 times faster (so to eliminate the divide that is necessary to do a normalize). So, it's likely that the "meat" of your function, the square root, is occupying 80% of your cycles in both functions. See, this is the sort of thing CodeAnalyst is for.[/edit]

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Just include the loop index in the arguments to the function. No more invariant.

Share this post


Link to post
Share on other sites
Thx Ajas.

You've certainly given me food for thought there and I'll check out CodeAnalyst.

Think I'll just give up anyway... (the other coding ideas I wrote still come up with ‘sqrt’ as fast as a bleeding NULL statement!!!)

You would have thought that accurately timing code would be pretty easy, but the complier can and does get in the way.

However testing 'sqrt's actual asm code directly sounds like a damn good idea...

The SSE2 code I wrote... And lets face it... It ain't complex (but yeah, I know a little bit about Intel’s CPU instruction re-ordering... Damn complex subject!!!)... IS fast...

I was simply bemused that 'sqrt' is 'faster' despite it having such a bad rep in most optimization books I've read.

And the fact that it’s as fast (apparently) as a NULL function sure as hell doesn't make sense.

But thx man... Gonna go rip out 'sqrt's innards and learn from its dark secrets LOL!

Share this post


Link to post
Share on other sites
Anon said:

Just include the loop index in the arguments to the function. No more invariant.



Thx for the tip.... But even that does not appear to work. The complier still performs its magic... Somehow... :-(

(Which is damn useful 99.999% of the time... But not for accurately testing code)

__________________________________________________________
Optimizing Code Really F's U Up!!! DON'T DO IT!!! (LOL)

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
It could be that it's detecting that you're not storing the result anywhere and it knows the sqrt function doesn't do anything so it's just optimizing out the whole thing? Maybe try storing the results in a preallocated array. Then the time it takes for the memory writes becomes part of the test...but you can view it as added realism. Who knows, it could be that under realistic usage one method is faster than the other sometimes and sometimes the other is faster, due to inlining + out of order execution + memory usage patterns.

Share this post


Link to post
Share on other sites
An extract from MSDN Online:

_______________________________________________________________________________
Often, some code inside a loop, which is repeated many times, does not change values during the loop's execution. These loop invariants can be calculated once before the loop runs. Removing or hoisting loop invariants generally improves speed without affecting size very much. For example, invariant hoisting can remove the arithmetic from the inside of this loop:

for (i=0;i<1000;i++) {
A = a + b; }

The product is

t = a+b;
for (i=0;i<1000;i++) {
A = t; }

This saves 999 addition operations at the cost of one temporary variable, which is likely to go into a register.
________________________________________________________________________________


So even trying to force code to be NON-invariant is pointless... The complier does what the hell it wants...

Most times thats helpful.... Most times...

Programmer 0 - Complier 1 :-(

Share this post


Link to post
Share on other sites
Quote:
Original post by TerrorFLOP
Thx Sherlock! (j/k)

Ah... Win32 API / ANSI function calls... All the same to me really when you've programmed for a while...

for a while but not for long enough.

It is standard library call, heck, not even a call (i can bet it is inlined). In release mode with optimizations on is going to have equal or greater speed than whatever-you-can-ever-write.

anyway, if you need this sqrt to compute distance, chaces are you can do your work with squared radius and do not need sqrt at all.

p.s. benchmarks is worth precisely nothing when compiler is "more smart" than writer of benchmark, and it is certanly the case. Compiler will not compile things that is not necessary, that is. You need to do something useful with results(like, add them and print the sum), make all inputs different, etc. It's not just invariant, if result is going nowhere, no computation is done. Better yet benchmark your actual code. Chances are, compiler will optimize a lot of unnecessary code out there too.

Anyway, just forget about assembly-level optimizations. If you can not write benchmark, you surely can not make optimizations better than compiler can, as compiler cleanly "knows" more optimizations than you.

Share this post


Link to post
Share on other sites
1. Putting hand-coded asm into your code makes it hard for the compiler to optimise the code around it, so unless it's a substantial chunk of asm any speed improvement may well be lost.

2. You might be better off using intrinsics in the source code (not asm) - e.g. here.

3. Your compiler probably supports generating SSE instructions anyway (even if it doesn't manage the vector forms) - why not just enable this flag in the compiler? Of course, you need to know that the target machine will support them.

4. If you really care about speed, use floats not doubles, which may require reworking your algorithms to handle the loss in precision.

5. doing sqrt(x*x + y*y) can generate precision problems when x and y are large - that's the point of the hypot function. It may be that the increased accuracy with hypot means you can use floats, not doubles.

I don't know whether the above suggestions will help - my own experience has been that trying to speed things up at this low level (SSE intrinsics, prefetching etc) invariably slows things down!

Share this post


Link to post
Share on other sites
Dmytry...

I find your lack of IQ... Disturbing to say the least.

No wonder there has been posts from the moderators about peeps like you.

If I wish to time my code... Then that is entirely my business.

I've timed code before... And I shall do so... AGAIN.

There’s even code in the online MSDN (and several other books) which shows you how to implement code timing logic and I've implemented some of that code into my own timing routines. If you have a problem with other coders trying to time and experiment with their code... Then you have a serious problem dude.

With regards to the 'radius squared' issue... You have NO idea why I would choose to norm vectors do you?

Go check out Newton's Law Of Gravitation... The VECTOR version (since force IS a vector)... And you'll realize that the famous law utilizes norm cubes.

Anyway, WHY I'm I trying to explain myself to you.

If you've nothing constructive to say man... THEN DON'T SAY!!!

I'm sure the rest of the civilized world will get by just fine without your 'insights' (if indeed you can call it that).

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!