RoyCHill

my concept - the importance of using floating point

18 posts in this topic

A computer game should take advantage of floating point numbers, because over-use of integers may actually overload the CPU's integer math processing while the FPU portion could be used for math processing instead. Feedback please. What do you guys think? Let me start on why. A small computer game will never really see a performance hit, but all the new mathematics and physics used now are probably expected to go to a special module for processing. Maybe those modules are not really needed, though, because processors already have a portion of that ability inside their MMX, SSE, and FPU units.

thanks.
You are too concerned about trivial things like this. If you are pushing a machine that hard, there are more likely other ways to improve performance. Besides, any sort of gain would be negligible.
[quote name='GoofProg.F' timestamp='1316015791' post='4861599']
Feedback please.
[/quote]


you should rethink your whole strategy dog


MMX and SSE registers can be used for integer processing too, btw, not just floating point. But your primary concern should be using the right data format for the right job, as opposed to doing premature optimisations.
[quote name='RobTheBloke' timestamp='1316023826' post='4861672']
A final word of warning: There is so much mis-information and downright lies in internet forums about optimisation techniques, that imho it is best to assume that [b]every[/b] forum post is full of mis-information and lies. The only way to know if something is true or not is to verify it in a profiler.....
[/quote]
Profiled it. You're right, the final method you proposed gave me a 200x performance boost. Thanks!
This is true inasmuch as it's obviously a perf gain if you can keep all the various execution resources as busy as possible. One of the reasons that Quake, and later Unreal, were able to do things that no one else was doing in software rendering was that they kept the Pentium's U and V pipes both pretty busy, along with MMX where available.

Conceptually, this is easy, but in practice it is very hard -- it requires a knowledge of instructions and compilers that most people don't have, and with out-of-order execution, wider super-scalar architectures, operation fusion techniques, and different issue/resolve latencies for each instruction, it's basically impossible for a human to accomplish at anything more finely-grained than, say, the major execution units: load/store, integer, FPU (simple and complex), and SIMD.

Yes, highly performant code should be aware of these things, but no, this is no epiphany that you've bestowed upon the world.
After reading RobTheBloke's post I'm a bit confused. I think he is joking, but is Ravyne joking? Is there any truth in what GoofProg.F says? It seems like no one takes him seriously.
After reading Cornstalks' reply I decided to test it, and it turns out, quite unsurprisingly, that the last version of add is two orders of magnitude slower than the naive method on my computer (release mode, with all optimizations and SSE turned on).
[quote name='Wooh' timestamp='1316030875' post='4861716']
After reading RobTheBloke's post I'm a bit confused. I think he is joking, but is Ravyne joking? Is there any truth in what GoofProg.F says? It seems like no one takes him seriously.
[/quote]
Rob's reply was a joke. Ravyne's is not, however. Your program will run faster if it is able to fully utilize the CPU. But simply using floats like Goof suggests is hardly the answer.

[quote name='apatriarca' timestamp='1316034348' post='4861733']
After reading Cornstalks' reply I decided to test it, and it turns out, quite unsurprisingly, that the last version of add is two orders of magnitude slower than the naive method on my computer (release mode, with all optimizations and SSE turned on).
[/quote]
Hahaha got ya! :)
[quote name='Wooh' timestamp='1316030875' post='4861716']
After reading RobTheBloke's post I'm a bit confused. I think he is joking, but is Ravyne joking? Is there any truth in what GoofProg.F says? It seems like no one takes him seriously.
[/quote]

There is a (naive) truth to it, precisely as I laid out in my post. Inside modern CPUs there are several execution units which can operate independently of one another -- if you interleave instructions which target each of those units correctly, then more of them will be busy more often, yielding *potentially* higher performance -- I say "potentially" because your problem and your algorithmic approach must be amenable to this type of optimization to have any hope of benefiting from it -- in other words, if your algorithm is load/store bound, then there's only so much that re-ordering its integer math can do. However, if the algorithm is rather "balanced" across functional units, then the order of CPU instructions can be of great benefit -- it's common to hide memory latency by pre-fetching data a couple of instructions ahead of where it's needed, for example (though there are trade-offs to consider even in this simple case, such as register pressure).

Then there are lots of modern processor features which make the benefit unpredictable, if not unnecessary -- instruction re-ordering can already alleviate many such order-dependent bottlenecks, hyperthreading can make sure more units are busy running another thread when possible, and dual-thread in-order cores (like on Xbox 360 or PS3) can execute an alternate thread when the other stalls awaiting memory.

It's one of those ideas where the theory is simple and sound, but which takes a great deal of expertise and a measure of luck to exercise favorably.

Now, if tomorrow Intel decides that the way forward is to throw away our fancy, super-scalar, out-of-order 4-core Sandy Bridge CPUs and in their place give us 32 4GHz Atom processors (er, basically Knight's Ferry), or that we all need to migrate to Itanium, then the effect of instruction ordering becomes more predictable and more important. Otherwise, modern CPUs are pretty darn smart as it is, and will do a decent job all on their own, provided you don't go out of your way to sabotage their efforts.
[quote name='Wooh' timestamp='1316030875' post='4861716']
After reading RobTheBloke's post I'm a bit confused. I think he is joking, but is Ravyne joking? Is there any truth in what GoofProg.F says? It seems like no one takes him seriously.
[/quote]

There is a partial, vague-ish truth to what GoofProg.F says (in so much as interleaving instructions can keep the CPU busier), but he is entirely wrong in the suggested approach to exploit this optimisation. To start with, let's just state an absolute truth from which you cannot escape:

[b]Optimisations that work for one series of CPU (e.g. Core i7) don't always translate to improvements on other CPUs (e.g. Atom) - and sometimes they can actually hurt performance! So it is impossible to optimise code to be perfect on all platforms; the best you can do is make sure it doesn't run BADLY on any architecture.[/b]

Most CPUs have separate execution units which can (if used correctly) execute instructions in parallel (something that requires out-of-order execution). A simple CPU (such as the Atom) has two (IIRC), whereas some of the newer CPUs have 5 or more. These execution units are not all the same, however. Some may do integer/floating-point addition, some may do integer/floating-point multiplication, etc. - but they aren't always split by type (e.g. some may be float only, some may be an int/float mix). Taking something like the Atom, one execution unit can do mult/div/sqrt, and the other can only do addition/subtraction. This makes sense if you consider this code:

[code]
float a[NUM]; // assume NUM is defined and a[] has been filled in
float f = 1.0f;

for (int i = 0; i < NUM; i++)
{
    f *= a[i];
}[/code]

The *= is a floating-point multiply op. The dereferencing of 'a' is an integer addition operation (since a[i] can be re-written as *(a+i)). So what you have is an int and a float op next to each other. In theory the CPU [i]could[/i] execute both at the same time - but in this example that won't actually happen. The problem is that there is a dependency between the result of a[i] and the multiplication. However..... there is also another integer op happening, and that is i++. There is a good chance that the CPU may be able to compute i++ and the *= within the same CPU cycle. Win!

At this point you can unroll the loop slightly (and break a couple of dependencies whilst we are at it). i.e.

[code]
float a[NUM]; // assume NUM is defined and a[] has been filled in
float f0 = 1.0f, f1 = 1.0f, f2 = 1.0f, f3 = 1.0f;

for (int i = 0; i < NUM; i += 4)
{
    f0 *= a[i];
    f1 *= a[i+1];
    f2 *= a[i+2];
    f3 *= a[i+3];
}

// if NUM is not a multiple of 4, don't forget to handle the remaining elements here!

float f = f0 * f1 * f2 * f3;
[/code]

At this point you have a healthy mix of different op types (without dependencies between them) that could, in theory, be executed in parallel by the CPU. So... the temptation is to go and write all of your loops like that. Well, don't. The compiler is there to do that stuff for you, and may decide to insert multiple versions of that code into the exe (e.g. one for Atom, one for Core 2, one for i7). By manually unrolling the loop you may make the code run slower on some platforms (for example, it's possible that a given CPU benefits more from good instruction cache usage, and therefore less code in the loop will be beneficial). If you start optimising code this way and it brings noticeable benefits, bear in mind you've only seen benefits on YOUR CPU, so you should test on other architectures too (because you may have accidentally made things worse!)

Ultimately it's all a bit of a black art really, led mainly by experience, a good knowledge of the CPU architecture, and a lot of profiler-driven guesstimation.....
[quote name='Ravyne' timestamp='1316029428' post='4861708']
This is true inasmuch as it's obviously a perf gain if you can keep all the various execution resources as busy as possible. One of the reasons that Quake, and later Unreal, were able to do things that no one else was doing in software rendering was that they kept the Pentium's U and V pipes both pretty busy, along with MMX where available.

Conceptually, this is easy, but in practice it is very hard -- it requires a knowledge of instructions and compilers that most people don't have, and with out-of-order execution, wider super-scalar architectures, operation fusion techniques, and different issue/resolve latencies for each instruction, it's basically impossible for a human to accomplish at anything more finely-grained than, say, the major execution units: load/store, integer, FPU (simple and complex), and SIMD.

Yes, highly performant code should be aware of these things, but no, this is no epiphany that you've bestowed upon the world.
[/quote]

Not sure who marked this post negatively - It's spot on....
Some of the optimisation hacks look interesting and completely crazy. They make sense, but I'd have to question whether doing something like this is worth the effort and the maintainability cost. Just imagine the poor sod seeing such code for the first time - I wouldn't blame him for thinking the original author had lost his marbles.
[quote name='Tachikoma' timestamp='1316101764' post='4862108']
Some of the optimisation hacks look interesting and completely crazy. They make sense, but I'd have to question whether doing something like this is worth the effort and the maintainability cost. Just imagine the poor sod seeing such code for the first time - I wouldn't blame him for thinking the original author had lost his marbles.
[/quote]

It's very rarely worth the effort.

If you took a medium-sized coding project and applied loop unrolling and instruction interleaving everywhere, chances are your compiled code size would increase noticeably. At that point there is an extremely good chance that the overall performance of your app would drop a LOT! The problem is that memory is normally the biggest bottleneck on modern systems. Increased code size leads to more memory reads, more page faults, worse instruction cache usage, and so on and so forth.

This is why it is utterly pointless to worry about optimisation until a profiler has told you that you have a problem! If applied sparingly, in the correct place, these optimisations *can* bring about big performance increases. However, applying them globally is always a recipe for disaster.

There are some things that are almost always worth optimising though....
* Replacing one algorithm with one better suited to your problem space.
* Minimising memory reads via better data structures (eg Oct/Quad/Kd Trees)
* Minimising memory reads via data compression (decompression cost is usually negligible compared to the time saved by not reading from memory)
* Minimising data dependencies (although the compiler usually does a good job, it rarely hurts to give it hints)
* Minimising FileIO times (compression again - fileIO being orders of magnitude slower than memory IO)
* SIMD / Multi-core optimisations (imho, this is not an optimisation, it is "building in efficiency". There are however limits!)

All of the above allow for huge performance increases, without ending up with an unmaintainable codebase.

After that, the law of diminishing returns applies. For example, interleaving instructions across the codebase may take a few weeks, introduce some bugs along the way, and end up providing only a single extra frame per second for your efforts (on a game already running at 70fps). I don't completely agree with the "don't optimise early" adage; my personal take is "Build in efficiency, and only optimise when someone can demonstrate you actually have a problem that needs to be fixed! (And that problem is actually affecting the release builds!)"
[quote name='RobTheBloke' timestamp='1316099364' post='4862094']
[quote name='Ravyne' timestamp='1316029428' post='4861708']
This is true inasmuch as it's obviously a perf gain if you can keep all the various execution resources as busy as possible. One of the reasons that Quake, and later Unreal, were able to do things that no one else was doing in software rendering was that they kept the Pentium's U and V pipes both pretty busy, along with MMX where available.

Conceptually, this is easy, but in practice it is very hard -- it requires a knowledge of instructions and compilers that most people don't have, and with out-of-order execution, wider super-scalar architectures, operation fusion techniques, and different issue/resolve latencies for each instruction, it's basically impossible for a human to accomplish at anything more finely-grained than, say, the major execution units: load/store, integer, FPU (simple and complex), and SIMD.

Yes, highly performant code should be aware of these things, but no, this is no epiphany that you've bestowed upon the world.
[/quote]

Not sure who marked this post negatively - It's spot on....
[/quote]

Off topic, but yeah, I kinda wondered too. It's actually pretty interesting that I never find myself downvoted on the basis of my argument (mostly, I suppose, because I refrain from making strong statements about things I'm less versed in), but I do seem to anger someone now and again (if I know myself not to be factually incorrect, then I can only assume someone is taking revenge for feeling slighted in some way) -- or maybe someone's finger slipped, in which case the forum software fails for not letting people change their votes.

It's really too bad when that happens, though, because someone will come along 6 months from now and assume a post was down-rated for good reason, and so if it was done intentionally, then someone has done some small amount of damage to the community in order to avenge their own bruised ego. My rating is high enough that the effect on me personally is negligible; it's the implicit misinformation that upsets me, and the fact that I took time out of my day to provide a reasoned answer, and some nitwit just called it into question with the click of a button. Such is the internet, I suppose.


I assume it was you who helped fix it, so thanks. I've also taken on a Robin Hood-like role where I will help right obvious misuse of the rating system.
