3Dgonewild

CPU cache chunks and gcc function align flag


I haven't asked anything for a while, but here goes :)... I've been doing a little research on gcc optimization options, and I've been wondering what exactly happens when I set a custom function alignment level. For example, if we had a CPU with a 256 KB cache divided into 32-byte cache lines, would it be efficient to set the function align flag to 32? Also, are there any other CPU cache-specific optimization flags that I should know about? Thanks for your time :).

I guess it won't make any difference, but if you want to know for sure you'll have to profile.

Furthermore, I doubt that you'll find any magic optimization flags, besides the basic ones, that will give you noticeable performance gains. It's much more efficient to find the hotspots through profiling and then optimise the code there.

Some (most?) CPUs have a set of cache control instructions which may or may not be exposed as compiler intrinsics. So if you have a piece of code that would benefit from better use of the caches, use those instructions. It's most likely not worth it though...
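As a rough illustration of such an intrinsic, GCC provides `__builtin_prefetch`, which emits a prefetch hint where the target supports one. The sketch below is only an assumption about how it might be used: the helper name `sum_with_prefetch` and the distance of 16 elements are made up for the example, and whether it helps at all can only be settled by profiling.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative guess, not a tuned value: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Hypothetical helper: sums an array while hinting the cache about
   elements that will be read a few iterations later. */
long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = prefetch for reading,
               3 = high temporal locality (keep in cache if possible). */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

For a simple linear walk like this one the hardware prefetcher usually does the job already, so the hint may well change nothing, which fits the "most likely not worth it" caveat.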


Well, you're right about profiling the code, but I was just asking for more information on the subject. It seems that no one has actually researched this before.

I haven't found a document that explains how a function is aligned.
E.g., do the variable members get aligned to N bytes?
If that's the case, then obviously on some platforms it's possible to get interesting results (especially when dealing with large objects).
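For what it's worth, gcc's `-falign-functions=n` option concerns where each function's machine code starts, not the layout of variables or struct members; data alignment is controlled separately. The sketch below, assuming GCC or Clang on a target such as x86-64, uses the per-function `aligned` attribute to get a similar effect for one function and then inspects its entry address. The names `add_one` and `entry_offset` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical function. The attribute asks the compiler to place its
   first instruction on a 32-byte boundary, similar in spirit to building
   the whole file with: gcc -O2 -falign-functions=32 file.c */
__attribute__((aligned(32), noinline))
int add_one(int x)
{
    return x + 1;
}

/* Returns the function's entry address modulo `boundary`.
   Converting a function pointer to an integer is a compiler extension
   (accepted by GCC and Clang), not strictly portable C. */
unsigned entry_offset(int (*fn)(int), unsigned boundary)
{
    assert(boundary != 0);
    return (unsigned)((uintptr_t)fn % boundary);
}
```

Calling `entry_offset(add_one, 32)` should give 0 on such a target, meaning the function begins at the start of a 32-byte cache line. Whether that makes any measurable difference is a separate question that only profiling can answer.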

In my experience, any playing around with alignments and cache prefetching makes your code run equally fast in the best case and slower in the average case.
There may be cases where your code will perform better, but as Rattenhirn already said, you won't find those without profiling.
