ahmadierfan99

C++ Benchmark for Search Algorithms in Common Games Development Data


Quote

 When the time comes that you need to make changes

I hear you, and I couldn't agree more.

I completely ignored the fact that you have to profile your app constantly and make changes, because deep down I felt I had to write everything in its best form from the start, which is pure nonsense and impossible.

I have some other questions for you about your experiences on the projects you've worked on.

Since you might have had to ship your application on multiple hardware platforms and operating systems:

1. Which hardware did you run your app on, and how important was that?

I ask because from your previous replies I understood that an algorithm's performance, and its superiority over other algorithms, depends heavily on the hardware.

2. How did you budget your "subsystem's updates"?

3. Was there a time when you had to look at your code's compiled assembly, or refer to CPU instruction cycles on your specific hardware, to see the actual or estimated time needed for a specific job?

10 hours ago, ahmadierfan99 said:

2. How did you budget your "subsystem's updates"?

If you code it yourself, the value is easy to guess.     

For example, if the physics engine's update usually takes 2-3 ms in the current version of my software, it is reasonable to create an assert failure / warning if it starts to take 6 ms in a future version.

Another example: in my situation, it is reasonable to require that the graphics engine take no more than 40% of the overall frame time.

Ultimately (but not unfortunately), it is an art.  I may want to be on the safe side for an expensive system and set a low threshold that warns early, or I may feel lucky and use a CPU% budget twice that of a version where everything looks perfectly fine.  The numbers can also be changed later.
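A minimal sketch of what such an assert/warning could look like in C++ (the subsystem name and the 2-3 ms / 6 ms figures are just the example numbers above, not values from any real engine):

```cpp
#include <chrono>
#include <cstdio>

// Scoped timer that warns when a subsystem exceeds its time budget.
class ScopedBudgetTimer {
public:
    ScopedBudgetTimer(const char* name, double budgetMs)
        : name_(name), budgetMs_(budgetMs),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedBudgetTimer() {
        using namespace std::chrono;
        const double elapsedMs =
            duration<double, std::milli>(steady_clock::now() - start_).count();
        if (elapsedMs > budgetMs_) {
            std::fprintf(stderr, "WARNING: %s took %.2f ms (budget %.2f ms)\n",
                         name_, elapsedMs, budgetMs_);
            // or assert(false) to fail hard in debug builds
        }
    }

private:
    const char* name_;
    double budgetMs_;
    std::chrono::steady_clock::time_point start_;
};

void UpdatePhysics() { /* ... */ }

void GameFrame() {
    // Physics usually takes 2-3 ms today, so warn if it ever reaches 6 ms.
    ScopedBudgetTimer physicsBudget("physics", 6.0);
    UpdatePhysics();
}
```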

My values are based on feeling, approximation, experiment, and experience ... like the ingredient ratios I use when cooking food.

Taste it ... ahh, I mean test it, and you will find a figure you are satisfied with ... I'm starting to get hungry.

 

Edited by hyyou

On 5/5/2019 at 11:10 AM, ahmadierfan99 said:

Which hardware did you run your app on, and how important was that?

I ask because from your previous replies I understood that an algorithm's performance, and its superiority over other algorithms, depends heavily on the hardware.

The final hardware is usually minimally important.  Most of the things people fear turn out to be unimportant when it comes to performance.  People spend a lot of effort (and worry) on the fear that an algorithm won't work well on some specific system, but in practice it usually isn't an issue at all.

But sometimes the hardware is critically important. Sometimes there are things on the critical performance path and one system happens to have radically different performance than the others.  You can ONLY determine that with careful measurements. It usually cannot be figured out in advance.

Many general properties of algorithms do matter, though.  First, not doing work is faster than doing work.  Doing less work is generally faster than doing more work.  Doing a small amount of work up front to prune out an enormous amount of work later is generally a good idea.  Said differently: don't be stupid.  If you have a choice between using quicksort from a library and rolling your own bubblesort, use the quicksort.  Don't accept known pessimizations or follow known-bad patterns.
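One common illustration of "a small amount of work up front to prune out an enormous amount of work later" is a cheap broad-phase check before an expensive precise test. A minimal C++ sketch; the types and the PreciseOverlap stub are hypothetical stand-ins, not any particular engine's API:

```cpp
#include <cstddef>
#include <vector>

struct AABB { float minX, minY, maxX, maxY; };
struct Object { AABB bounds; /* mesh data, etc. */ };

// Hypothetical expensive narrow-phase test; a stand-in here, but in a real
// engine this might be an exact polygon-vs-polygon intersection.
bool PreciseOverlap(const Object&, const Object&) { return true; }

// Cheap broad-phase check: a handful of comparisons that prune most
// pairs before the expensive test ever runs.
bool BoundsOverlap(const AABB& a, const AABB& b) {
    return a.minX <= b.maxX && b.minX <= a.maxX &&
           a.minY <= b.maxY && b.minY <= a.maxY;
}

void FindCollisions(const std::vector<Object>& objects) {
    for (std::size_t i = 0; i < objects.size(); ++i)
        for (std::size_t j = i + 1; j < objects.size(); ++j)
            if (BoundsOverlap(objects[i].bounds, objects[j].bounds) &&
                PreciseOverlap(objects[i], objects[j])) {
                // handle the collision ...
            }
}
```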

On 5/5/2019 at 11:10 AM, ahmadierfan99 said:

How did you budget your "subsystem's updates"?

You already know how long you've got in total.  If your target is 60 frames per second from the old television standards (which are long outdated), you have roughly 16 milliseconds per frame; in practice many teams budget about 15 to leave some headroom. If your target is 72 or 75 or 120 or 144, or even the 288 Hz that some high-end gamer displays are pushing, you can calculate how many microseconds you've got for your target.

From there, start with a measurement of where you are now.  If you run all your updates in a single thread, you know that simulation + rendering + networking + audio + everything else must come in under the 15 millisecond target. If you measure today and see that simulation takes 4 milliseconds, rendering peaks at 6 milliseconds, audio peaks at 1 millisecond, and networking peaks at 3 milliseconds, then you have 4+6+1+3 = 14 milliseconds, so you know you're at risk of blowing a 15 millisecond target if every system hits its peak on the same frame.  But if you're looking at 7 or 6 or 10 milliseconds when you combine all your times, make those your limits.
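A minimal C++ sketch of that arithmetic, using the example figures from this post (the per-subsystem numbers are hypothetical measurements, not real data):

```cpp
#include <cstdio>
#include <initializer_list>

// Per-frame time budget in milliseconds for a given refresh rate.
double FrameBudgetMs(double hz) { return 1000.0 / hz; }

int main() {
    // 60 Hz -> ~16.7 ms, 144 Hz -> ~6.9 ms, 288 Hz -> ~3.5 ms.
    for (double hz : {60.0, 72.0, 75.0, 120.0, 144.0, 288.0})
        std::printf("%6.1f Hz -> %5.2f ms per frame\n", hz, FrameBudgetMs(hz));

    // Peak measurements from the example above, in milliseconds.
    const double simulate = 4.0, render = 6.0, audio = 1.0, network = 3.0;
    const double worstCase = simulate + render + audio + network; // 14 ms
    std::printf("worst-case frame: %.1f ms of a ~15 ms single-thread budget\n",
                worstCase);
}
```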

Over time as you gain experience, you learn that one system or another will become slow.  Pathfinding has the potential to be slow. Many simulation steps like physics updates have the potential to be slow. Measure and monitor, then make decisions based on the details of that specific game.

On 5/5/2019 at 11:10 AM, ahmadierfan99 said:

Was there a time when you had to look at your code's compiled assembly, or refer to CPU instruction cycles on your specific hardware, to see the actual or estimated time needed for a specific job?

Per-instruction CPU timings haven't been meaningful on modern hardware for about two decades.  There are too many interdependencies with the surrounding code.  Multiprocessing context switches, hardware parallelism, hyperthreading, out-of-order cores, and long CPU pipelines all combine to make them nearly useless.

Yes, there are times when people need to look at the raw assembly.  It is fairly rare, and getting rarer every day, but it still happens.  It usually isn't necessary until you're on a very large professional game trying to fine-tune a specific segment of code. It is best to swap in better algorithm choices first, but sometimes you hit the point where there are no better algorithms and you need to hand-tune the details even further.

Normally there are better options, different algorithm choices, and different broad design choices that are more effective at finding performance gains.

8 hours ago, frob said:

 

Per-instruction CPU timings haven't been meaningful on modern hardware for about two decades. [...] Normally there are better options, different algorithm choices, and different broad design choices that are more effective at finding performance gains.

The trend over the past 20 years has been to try and move the biggest performance bottlenecks to the GPU, as GPUs are better at pushing lots of data. It is very easy to make games that are entirely GPU bound.

When it comes to CPU instruction timings, you are more likely to pay attention to them when using SIMD to optimize a bottleneck. With SIMD comes an appreciation of alignment and cache misses. But SIMD isn't a magic bullet; usually the biggest determinant of speed is memory access. SIMD might speed up an ideal situation by 2-8x, but better data access might improve speeds by 100-1000x.

Memory access is why, for example, vectors work much faster in the real world, in most situations, than linked lists, even though 'in theory' a linked list might do less work on an insertion / removal. Processors work much faster on linear data that is compact in memory, hence the move towards data-oriented design: things like structure of arrays versus array of structures, ECS, etc.
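To make the structure-of-arrays point concrete, a minimal C++ sketch of the two layouts (the field names are purely illustrative):

```cpp
#include <cstddef>
#include <vector>

// Array of structures (AoS): each particle's fields are interleaved, so a
// loop that only needs positions and velocities drags the unused colour
// field through the cache as well.
struct ParticleAoS { float x, y, z; float vx, vy, vz; unsigned colour; };
std::vector<ParticleAoS> particlesAoS;

// Structure of arrays (SoA): each field is contiguous in memory, so a loop
// touching only positions and velocities reads tightly packed data.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<unsigned> colour;
};

void Integrate(ParticlesSoA& p, float dt) {
    // Linear walks over compact arrays are cache-friendly and easy for the
    // compiler to auto-vectorise with SIMD.
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
        p.z[i] += p.vz[i] * dt;
    }
}
```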

