

float unlimited increasing rotation or use a if


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
53 replies to this topic

#21 BGB   Crossbones+   -  Reputation: 1554


Posted 30 December 2013 - 12:21 PM

 


Can an i7 be faster with sin() instead of a lookup table, while maybe a Celeron (which is my current game development PC, with onboard graphics) can't?

 

Lookup tables are so 1990's. Think of the cache: processors have become lightning fast since then, while RAM speed has not kept up.

Also, the line "i,m telling you : games are not playable with functions like sin() and cos() and sqrtf() ( i still need to get some fast sqrtf function by the way )." had me retro-chuckling.

 

 

 

actually, IME, lookup tables *can* be pretty fast, provided they are all kept small enough to mostly fit in the L1 or (at least) L2 cache.

 

for example, a 256-entry table of 16-bit items: probably pretty fast.

OTOH, a 16k/32k/64k entry table of 32 or 64 bit items... errm... not so fast.

 

 

as for sin/cos/sqrt/...

probably not worth worrying about, unless there is good reason.

 

the performance issues with these, however, are not so much with the CPU as with how certain compilers handle the C library math functions.

but, in most cases, this should not matter (yes, including in the game logic and renderer).

 

I would not personally recommend sin or cos tables as an attempt at a "general purpose" solution, as this is unlikely to gain much (and if done naively will most likely be slower, more so if int<->float conversions and similar are involved).

 

 

for special-purpose use cases, they can make sense, but generally in the same sort of contexts where one will not typically be using floats either.
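To make BGB's size point concrete, here is a minimal sketch of the kind of small table he describes: 256 entries of 16-bit items, indexed by an 8-bit angle. The function names and the Q1.14 fixed-point scale are my own assumptions for illustration, not anything from the thread.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical 256-entry sine table, 16-bit fixed point (Q1.14 here,
// so 16384 represents 1.0). 256 * 2 bytes = 512 bytes: fits in L1 easily.
static int16_t g_sinTable[256];

// The angle is an 8-bit "binary angle": 256 steps per full turn,
// so wraparound comes for free from the uint8_t parameter.
int16_t fastSin(uint8_t angle) {
    static bool ready = false;
    if (!ready) {
        const double kTwoPi = 6.283185307179586;
        for (int i = 0; i < 256; ++i)
            g_sinTable[i] =
                (int16_t)std::lround(std::sin(i * kTwoPi / 256.0) * 16384.0);
        ready = true;
    }
    return g_sinTable[angle];
}
```

The trade-off is exactly the "special-purpose" context mentioned above: you get 256 angle steps and roughly 1/16384 amplitude precision, which is fine for fixed-point sprite rotation but not a general replacement for sin().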




#22 Pink Horror   Members   -  Reputation: 1228


Posted 30 December 2013 - 01:35 PM

Hello Matias, thanks for the reply.

I was looking for some info specific about what is called branching, it is still not clear to me :

 

if i use only the "if", and not the brackets after, is it still branching ?

and what if i only use the brackets like this, without the if :

 

{

// code here

}

 

does that also count as branching ?

 

greetings

 

If you have to ask questions like this, you're not really ready to do any low-level optimizations.

 

Also, going branchless isn't always a win. I've worked on optimization for some platforms where I actually got speed improvements by changing from heavily-optimized branchless floating-point math into the most basic, beginner-friendly if/else code possible. The previous optimizations had turned out to be very platform specific, and on some slower, simpler processors, branching wasn't relatively as bad as caching the extra instructions and performing redundant math.

 

Of course I only even tried this because the code I modified had showed up in a profile as something I should look at. Now, there's usually some platform-specific thing you can do to speed up your math, but I always prefer to start from the simplest possible reference implementation, and that implementation should be kept around as a compile option. You can also use a reference implementation to test whatever faster math you create.
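As a concrete illustration of Pink Horror's point, here is a sketch (my own toy example, not the code from any project he mentions) of the two styles: the "beginner-friendly" branchy clamp and a branchless-looking one. Which is faster depends entirely on the platform; only a profile can tell.

```cpp
#include <algorithm>

// Plain if/else clamp: trivially readable, and on CPUs with good branch
// prediction the branches are often effectively free.
float clampBranchy(float x, float lo, float hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

// "Branchless" clamp via min/max: on x86 this can compile down to
// minss/maxss, but on simpler processors it may be no faster at all.
float clampBranchless(float x, float lo, float hi) {
    return std::max(lo, std::min(x, hi));
}
```

Keeping the simple version around as the reference implementation, as suggested above, also gives you something to test the "fast" version against.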



#23 jbadams   Senior Staff   -  Reputation: 19324


Posted 30 December 2013 - 03:56 PM

Let me tell like this : i have tested all this, get the time, repeat 1000 times, then get the time again.
Test showed me the simplest if was faster then functions, it was a while ago, i should test it again on my new pc maybe ?

Meaningless benchmarks will get you meaningless results.

You can't just test if statements vs. function calls and then apply those results everywhere in your code; you need to test each particular if statement against its equivalent function, as sometimes one will be better, but in other cases that won't be true.

You also need to do your tests in release mode with optimization enabled, in which case the compiler may inline your function call or even leave code out entirely if it detects that it isn't needed or used. You need to test real code samples, not artificial things like functions vs. if.

1,000 items on screen isn't a big number; you should stop touting it like you have some crazy, unusual performance needs.

VS Express 2005 is almost 10 years old, it's probably time to update. That being said, it's still smart enough to optimize many of the situations being discussed.

(Posted from mobile.)
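A sketch of the optimizer pitfall jbadams warns about (my own minimal example): in a release build, a benchmark loop whose result is never used can be deleted entirely, so the timing measures nothing. A volatile sink is one common way to keep the work observable.

```cpp
#include <chrono>
#include <cmath>

volatile float g_sink;  // prevents the optimizer from deleting the loop

// Times a million sin() calls. Without the write to g_sink, an
// optimizing compiler may remove the whole loop and report ~0 ms,
// "proving" that sin() is free.
double timeMillionSins() {
    auto t0 = std::chrono::steady_clock::now();
    float acc = 0.0f;
    for (int i = 0; i < 1000000; ++i)
        acc += std::sin(i * 0.001f);
    g_sink = acc;  // force the result to be observed
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```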

#24 the incredible smoker   Members   -  Reputation: 403


Posted 31 December 2013 - 06:06 AM

I have this software as the original complete package, so i have to use this.

I don't think i can use the newest version with my key code.

 

My lookup tables are usually 16-bit, 512 or at most 1024 entries; i don't know if this is an issue.

And i will do a test for every function, not just test 1 function and say it's faster or slower, of course.

btw: I don't aim for i7 PCs, i like my game playable for everyone, also those without the best system,

i still like old games too; if i reach something like a Dreamcast game i will be happy enough,

i bet there are enough people without an expensive game PC.

 

+ this topic costs me lots of points, time to play screenshot showdown before reaching zero (will i be banned then, lol?).

Anyways: Happy new year all!


S T O P   C R I M E !

Visual Pro 2005 C++ DX9 Cubase VST 3.70  Working on : LevelContainer class & LevelEditor


#25 Vortez   Crossbones+   -  Reputation: 2704


Posted 31 December 2013 - 09:32 AM

The reason you got downvoted is because you worry too much about meaningless micro-optimization. Those kinds of optimizations might have had their use in the 80's, maybe even the 90's to a much lesser extent, but are all but useless nowadays. Your game won't run slower because you chose to use an if or a math function, i can guarantee you.

 


I have learned programming not on school, i also dont know how to use a debugger.

 

Using a debugger is not hard, and as i always say, it's the programmer's best friend. I couldn't do much without a debugger, to be honest; all i would do is guess what's wrong, until i ragequit and punch my computer :). Seriously though, this is really something you should learn to use, fast.


Edited by Vortez, 31 December 2013 - 09:38 AM.


#26 jHaskell   Members   -  Reputation: 1086


Posted 31 December 2013 - 12:44 PM


And i will do for every function a test, not test just 1 function and say its faster or slower, ofcourse.

 

Testing functions is, to put it bluntly, pointless. The only performance testing you should be doing is on the entire application. Develop your 1000 objects running around on the screen and benchmark that. If it isn't running fast enough, profile the code to see where it's spending most of its time. This will tell you what areas of code your application spends most of its time in, which tells you what areas of code you should focus on optimizing to get REAL performance increases.

 

If an application spends 1% of its time in a particular function, and you rewrite that function to execute 50% faster, you've gained no real increase in performance. If the application spends 50% of its time in a particular function and you rewrite it to execute 50% faster (not a very realistic scenario), you've gained a significant increase in performance.

 

Seriously, it's entirely likely you're spending most of your time worrying about "optimizations" that are completely irrelevant on modern computers, even low end ones.  The only way to really know what optimizations are truly worthwhile for a given application is by profiling.  Without that information, you're largely shooting in the dark.
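jHaskell's 1% / 50% arithmetic is just Amdahl's law; a one-line helper (my own illustration, not from the thread) makes the numbers explicit.

```cpp
// Overall speedup when 'fraction' of the runtime is made 'localSpeedup'
// times faster and the rest is untouched (Amdahl's law).
double overallSpeedup(double fraction, double localSpeedup) {
    return 1.0 / ((1.0 - fraction) + fraction / localSpeedup);
}
```

Doubling the speed of a function that takes 1% of the runtime yields about a 0.5% overall gain; doubling a function that takes 50% yields about 1.33x, which is why the profile has to come first.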



#27 Matias Goldberg   Crossbones+   -  Reputation: 3695


Posted 01 January 2014 - 12:25 AM

I suggest that if you're THAT interested in LEARNING optimization, start coding in assembly.
 
I'm talking about implementing matrix 4x4 concatenation using assembly, a transform & lighting pipeline in assembly, a DCT (Discrete Cosine Transform).
You may or may not write something faster than a well written C code, but the learning experience is rich.
 
For example, when I implemented my own matrix 4x4 concatenation in assembly, I learned about subtleties of the architecture and the language. My function was failing when I was doing concat4x4( &result, &matrixA, &matrixA ); but worked correctly when doing concat4x4( &result, &matrixA, &matrixB );
 
After lots of debugging, I realized my math was correct, but my code assumed that matrixA & matrixB weren't pointing to the same memory, and I was overwriting the values as I wrote them. I had to clone the matrix before doing the concat, unless I could guarantee the matrixA & matrixB arguments wouldn't overlap. That was far more valuable than any micro-optimization, and stuff like this can really hurt your application's performance. I was learning, by myself (and by accident), the concept behind the restrict keyword.
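The aliasing bug Matias describes can be reproduced in plain C++. This sketch uses 2x2 row-major matrices (my own toy example) to keep it short, but the failure mode is exactly the same as with his 4x4 concat.

```cpp
#include <cstring>

// Writes into 'result' while still reading from a/b, so calling it as
// concat(m, m, m) reads values that have already been overwritten --
// the same aliasing bug described above.
void concatAliased(float* result, const float* a, const float* b) {
    result[0] = a[0]*b[0] + a[1]*b[2];
    result[1] = a[0]*b[1] + a[1]*b[3];  // a[0] may already be clobbered here
    result[2] = a[2]*b[0] + a[3]*b[2];
    result[3] = a[2]*b[1] + a[3]*b[3];
}

// Safe version: compute into a local copy first, then write out.
void concatSafe(float* result, const float* a, const float* b) {
    float tmp[4];
    concatAliased(tmp, a, b);  // tmp never aliases a or b
    std::memcpy(result, tmp, sizeof tmp);
}
```

Declaring the pointers `restrict` (in C) instead promises the compiler, and the caller, that they never overlap, which is the keyword Matias rediscovered.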
 
Like I said, modern architectures are very complex, and your "lab" experiments of benchmarking code without context are useless (because neither branch predictors nor caches are stateless).
To picture how complex modern CPUs are, here's an example of an Intel i7 anomaly, where adding a call instruction actually made the program run faster.
One would think adding an instruction should make the program slower. Most likely the reason behind this is loads blocked due to store forwarding. Considering you didn't know what a branch was, I won't explain what load-store forwarding is, as it is extremely low level (and I'm actually guessing; the reason behind the anomaly could be something else).
   

Let me tell like this : i have tested all this, get the time, repeat 1000 times, then get the time again.
Test showed me the simplest if was faster then functions, it was a while ago, i should test it again on my new pc maybe ?

My tests also show me that the sun probably moves around the earth, that doesn't mean I'm right.
  

Is that a problem ?, i thought questions are never dumb, i skip learning everything that is not needed to get result, if i need something i can Always ask it.


We value smart, well-asked questions. But that attitude won't get you anywhere. We value MUCH more the people who can solve problems by themselves and ask others as a last resort, after they've exhausted all their other alternatives (trial and error, books, manuals, papers, other people's code i.e. open source, Google and Wikipedia).
  

But if you defending your own business, ofcourse you dont wanto tell the competition how to get your games optimized,

This industry wouldn't be anywhere if people hadn't shared their experiences and explained to others what they did in detail. Many companies and individuals share their latest next-gen techniques at GDC and SIGGRAPH for everyone to reproduce (that includes big names such as CryTek, Unreal, Ubisoft, Naughty Dog, Eidos, Square Enix, Valve, Microsoft, Sony, AMD, NVIDIA, Intel), and that helped move the industry forward.
You're thinking about it backwards.


Edited by Matias Goldberg, 01 January 2014 - 12:27 AM.


#28 BGB   Crossbones+   -  Reputation: 1554


Posted 01 January 2014 - 10:49 AM

I suggest that if you're THAT interested in LEARNING optimization, start coding in assembly.
 
I'm talking about implementing matrix 4x4 concatenation using assembly, a transform & lighting pipeline in assembly, a DCT (Discrete Cosine Transform).
You may or may not write something faster than a well written C code, but the learning experience is rich.

 

interestingly, there is a fast way to implement a DCT, a slow way, and a dead slow way.

I remember once reading a paper which talked about ways to efficiently implement sin and cos; one of the cases they gave for needing a fast cos operator was for "high speed DCT transforms used in video compression...".

seeing this line was a "WTF?! FFS!! LOLZ!" moment.

basically, this statement made it pretty obvious that they had probably never written a video codec.

(ADD: basically, fast DCT implementations tend to sidestep the use of cosines altogether).

 

never mind that the rest of the paper was just an overly long way of saying "use a lookup table".

 

wasn't particularly all that impressed...


Edited by BGB, 02 January 2014 - 12:58 AM.


#29 Washu   Senior Moderators   -  Reputation: 5416


Posted 01 January 2014 - 01:58 PM

 

Let me tell like this : i have tested all this, get the time, repeat 1000 times, then get the time again.
Test showed me the simplest if was faster then functions, it was a while ago, i should test it again on my new pc maybe ?

Meaningless benchmarks will get you meaningless results.

You can't just test if statements vs. function calls and then apply those results everywhere in your code; you need to test each particular if statement against its equivalent function, as sometimes one will be better, but in other cases that won't be true.

You also need to do your tests in release mode with optimization enabled, in which case the compiler may inline your function call or even leave code out entirely if it detects that it isn't needed or used. You need to test real code samples, not artificial things like functions vs. if.

1,000 items on screen isn't a big number, you should stop touting it like you have some crazy unusual performance needs.

VS Express 2005 is almost 10 years old, it's probably time to update. That being said, it's still smart enough to optimize many of the situations being discussed.

(Posted from mobile.)

 

It is also important to test your benchmarks in a real world application...

 

Benchmarking a loop, function call, or anything similar in isolation gives you meaningless results. It doesn't tell you which is ACTUALLY faster, just which happens to be faster in a random timing segment.

 

Furthermore, as far as optimization goes: it is better to optimize at a high level than at a low level the vast majority of the time. That is, you will see more significant performance gains by changing a data structure or algorithm than you will from micro-optimizations, most of the time.

 

As a final note, doing less work often does not mean doing work FASTER. There are many, many cases where doing more work is faster than a similar case doing less work. This is especially true on the PS/Xbox platforms. A good example of this is the Battlefield 4/Frostbite culling changes from BF3. In BF3 they were using a hierarchical culling system, which does less work (i.e. touches fewer objects); by switching to a brute-force culling system that checks visibility for all objects, they observed a significant speed boost. Or, in other words, an algorithmic change resulted in a roughly threefold increase in performance. By then applying some basic data restructuring (to better localize the information necessary for culling), along with reworking some basic math operations to eliminate branches and apply SIMD, they were able to get another 35% increase in performance out of it.

 

Now, think about those numbers: they went from roughly 3.9 ms to 1.14 ms for culling, and then reduced that by another 35% (roughly). Which gain was greater? The answer is quite obvious when you do the math (take the difference between the values): going from 3.9 ms to 1.14 ms is a gain of 2.76 ms, while going from 1.14 ms to 0.74 ms is only 0.4 ms. So clearly the algorithmic change won. Does that mean they shouldn't have applied SIMD and taken on the complexity of reordering their data? No, not at all. They determined after profiling the initial results that they could do better, and that extra half a millisecond is a significant amount of time. But it is quite clear which of the changes resulted in the greatest performance gain, and the micro-optimization of using SoA and SIMD was not it.
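A sketch of the brute-force style Washu describes (entirely my own toy version, with the frustum reduced to a single plane for brevity): one flat pass over tightly packed sphere data, no hierarchy, and a loop body simple enough for a compiler to vectorize.

```cpp
#include <cstddef>
#include <vector>

// Plane in the form nx*x + ny*y + nz*z + d >= 0 on the visible side.
struct Plane { float nx, ny, nz, d; };

// SoA layout: only the data needed for culling, packed together,
// in the spirit of the data restructuring described above.
struct Spheres {
    std::vector<float> x, y, z, r;
};

// Brute-force cull: test every sphere, every frame. Returns the number
// of visible spheres and fills 'visible' with 0/1 flags.
std::size_t cullBruteForce(const Spheres& s, const Plane& p,
                           std::vector<unsigned char>& visible) {
    std::size_t count = 0;
    visible.resize(s.x.size());
    for (std::size_t i = 0; i < s.x.size(); ++i) {
        float dist = p.nx * s.x[i] + p.ny * s.y[i] + p.nz * s.z[i] + p.d;
        visible[i] = (dist >= -s.r[i]);  // inside or touching the plane
        count += visible[i];
    }
    return count;
}
```

The loop touches every object, but it is branch-light and streams through contiguous memory, which is exactly why a "dumber" algorithm can beat a pointer-chasing hierarchy on real hardware.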


In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.
ScapeCode - Blog | SlimDX


#30 the incredible smoker   Members   -  Reputation: 403


Posted 02 January 2014 - 06:01 AM

Why would the test not be correct?

 

I test a few times, so it's not that random, just a bit random; not exactly the same every time, but close enough to see it's faster.


Edited by the incredible smoker, 02 January 2014 - 06:03 AM.



#31 fir   Members   -  Reputation: -456


Posted 02 January 2014 - 06:53 AM

 

+ this topic costs me lots of points,  time to play screenshot showdown before reaching zero ( will i be banned then lol ? ).

Anyways : Happy newyear all!

 

this is not about you but more about the voters' mindset - i will upvote you instead, because i do like to consider such optimizations too


Edited by fir, 02 January 2014 - 08:27 AM.


#32 SimonForsman   Crossbones+   -  Reputation: 6293


Posted 02 January 2014 - 08:41 AM

Why would the test be not correct ?

 

I test a few times, so its not that random, just a bit random, not every time exacts the same, but close enough to see its faster.

 

Because of things like the cache. Using a lookup table, for example, is a lot faster if you only test accessing the lookup table (in a loop, say), since it will most likely be cached after the first access. If you instead test real production code that does more varied tasks, the lookup table can get pushed out of the cache frequently and performance will drop dramatically. You almost never get useful results by testing a single function in isolation these days.


I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

#33 the incredible smoker   Members   -  Reputation: 403


Posted 02 January 2014 - 09:39 AM

Ok Simon, how about this one :

 

x = min( x , 100 );

 

or

 

if( x > 100 )x = 100;

 

And thank you Fir.

greetings


Edited by the incredible smoker, 02 January 2014 - 09:39 AM.



#34 Álvaro   Crossbones+   -  Reputation: 13888


Posted 02 January 2014 - 10:03 AM



Ok Simon, how about this one :

 

x = min( x , 100 );

 

or

 

if( x > 100 )x = 100;

 

And thank you Fir.

greetings

 

What about that one? What's the question?

 

Code clarity is almost the only thing that matters here, so use the one that most closely represents your intent.

 

For instance, if x is something that is computed in some way but we want to cap it from above for some reason, I would write

int capped_x = min(x, 100);

 

This is because it's easier to reason about the code if the meaning of a variable doesn't change through its lifetime, and if it has a descriptive name.

 

Do yourself a favor and stop thinking of micro-optimizations that won't make a difference.



#35 the incredible smoker   Members   -  Reputation: 403


Posted 02 January 2014 - 10:13 AM

Look, Alvaro: if i say i have 1000 objects on screen, they tell me not to brag.

 

I'm sorry: it's about whether the if is faster than the min(), not about how fast you can understand the code;

i would comment out the original code above it, so don't worry please;

it's just an example, not something i have from my programming PC.

 

And about assembly programming: no thanks, i planned not to start with that, except for the float-to-int code that i copied, lol;

not that i don't need it, i would like to understand it, but i'd rather make a finished game.


Edited by the incredible smoker, 02 January 2014 - 10:24 AM.



#36 Madhed   Crossbones+   -  Reputation: 3129


Posted 02 January 2014 - 10:18 AM

Without seeing your code I honestly believe you can gain much more performance by doing high level optimization. What kind of algorithms are you using within your main loop?



#37 SimonForsman   Crossbones+   -  Reputation: 6293


Posted 02 January 2014 - 10:23 AM

Ok Simon, how about this one :

 

x = min( x , 100 );

 

or

 

if( x > 100 )x = 100;

 

And thank you Fir.

greetings

 

a good compiler will most likely inline functions such as min, and should be able to insert constants for you in inlined code, so your two examples result in pretty much the same machine code. the min function might even be faster once inlined, since it can be implemented without branches (and many CPUs are far better at basic arithmetic than they are at dealing with branches).

 

i.e. the function int min(int a, int b) { return a*(a<b) + b*(a>=b); }, if called as x = min(x,100), could probably be inlined by the compiler to x = x * (x<100) + 100*(x>=100);, which then outperforms your optimization attempt on any architecture that handles arithmetic better than branching, while the compiler is still free to use a branching implementation if the exact same code is compiled for an architecture that handles branching well (compared to the extra arithmetic).
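Simon's arithmetic min, written out as compilable code (a sketch for comparison; real compilers often generate a conditional move from the plain ternary anyway):

```cpp
// Arithmetic "branchless" min: (a < b) and (a >= b) evaluate to 1 or 0,
// so exactly one of the two terms survives.
int minArith(int a, int b) { return a * (a < b) + b * (a >= b); }

// The straightforward version, for comparison; an optimizer is free to
// compile either form to a branch or to a conditional move.
int minBranchy(int a, int b) { return a < b ? a : b; }
```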

 

There is very little to gain and very much to lose by blindly trying to optimize things by manually inlining them.

if your standard library has functions that perform badly on your target platform, provide your own functions, but leave the inlining to the compiler (that way you can still replace the implementation easily if you have to port the code to a different architecture).

 

also, make sure you have all optimizations enabled in your build settings before you try to measure performance. (if you try to measure things with a debug build you will get very misleading results)


Edited by SimonForsman, 02 January 2014 - 10:37 AM.


#38 the incredible smoker   Members   -  Reputation: 403


Posted 02 January 2014 - 11:21 AM

Ok Simon, thanks for the explanation.

I will make a test soon, compiled in Release mode, and see what happens on both the Celeron and i7 PCs.

 

@ Madhed : What is meant by high-level optimization?

something that is repeated 1000 times is what i am worrying about; if that is called low-level optimization, then i am very confused.

 

I won't send you my engine, but i have it like this (is this high-level optimization?):

every gamestate has its own class, and within the demomovie & game gamestates there are function pointers for each level.

 

greetings




#39 fir   Members   -  Reputation: -456


Posted 02 January 2014 - 11:39 AM

Ok Simon, thanks for the explanation.

I will make a test soon, compiled in Release mode, and see what happens on both the Celeron and i7 PCs.

 

@ Madhed : What is meant by high-level optimization?

something that is repeated 1000 times is what i am worrying about; if that is called low-level optimization, then i am very confused.

 

I won't send you my engine, but i have it like this (is this high-level optimization?):

every gamestate has its own class, and within the demomovie & game gamestates there are function pointers for each level.

 

greetings

 

you should only worry when it gets called 1M or 1G times, not 1000 ;/



#40 Madhed   Crossbones+   -  Reputation: 3129


Posted 02 January 2014 - 11:45 AM


@ Madhed : What is meaned with high level optimalisation ?

 

Low level: optimizing small, self-contained functions like sin(), cos(), min(), max() that are called 1000s of times.

High level: finding better algorithms so you don't even have to call those things 1000s of times.

 

I don't know the code, so I'm just guessing.

 

If you find out that you spend 50% of the time in a function that is called millions of times, your first idea might be to make the function faster, while it might be even better and easier to make sure the function isn't called that often in the first place.
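A toy example of Madhed's point (my own illustration, not from the thread): rather than making sin() itself faster, notice when its argument doesn't change inside the loop and call it once.

```cpp
#include <cmath>
#include <vector>

// Low-level instinct: make sin() faster.
// High-level fix: every object in this (hypothetical) wave effect uses
// the same phase, so sin() doesn't need to be called per object at all.
float waveNaive(const std::vector<float>& amps, float phase) {
    float sum = 0.0f;
    for (float a : amps) sum += a * std::sin(phase);  // sin called N times
    return sum;
}

float waveHoisted(const std::vector<float>& amps, float phase) {
    float s = std::sin(phase);                        // sin called once
    float sum = 0.0f;
    for (float a : amps) sum += a * s;
    return sum;
}
```

Both functions compute the same value; the second simply does less work per object, which is the kind of win no micro-optimized sin() table can match.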

 

Can you try Very Sleepy (http://www.codersnotes.com/sleepy)? It's a simple stochastic profiler that you can use to measure your program's performance. It shows you where your program is spending most of its time.

 

If you like you can post screenshots of the results, so maybe we can discuss what's going on there.





