Haytil

Optimizing Read/Write of a Variable


I'm reading a programming book, in which the following bit of code is written:
if (zi < z_ptr[xi])
{
   // write texel
   screen_ptr[xi] = color;

   // update z-buffer
   z_ptr[xi] = zi;
}

(The code is a basic software implementation of a Z-buffer for a software rasterizer. This bit of code is called quite a lot, but other than that, the purpose of the code is, I think, irrelevant.)

Afterwards, in a side-box, is written: "Notice that even after I determine that the Z-buffer should be updated, I do not update it with the value of zi. This is an optimization trick. It's usually a bad idea to immediately read and then write the same value - better to put something in between and then write the value."

I presume the author is referring to the fact that the "if" statement reads the value of z_ptr[xi] and then writes to screen_ptr before writing back to z_ptr (rather than writing to z_ptr first, before writing to screen_ptr). In other words, he is suggesting that the above code is marginally faster (more optimized) than the following, similar code:
if (zi < z_ptr[xi])
{
   // update z-buffer
   z_ptr[xi] = zi;

   // write texel
   screen_ptr[xi] = color;
}

My questions are: Is this true? If so, why?

I don't care if it's "marginally" true, or only true "some of the time." And I realize that little optimizations like this are generally irrelevant and should be put off till the end (more readable code is better than marginally faster code, and most optimizations that will make a difference are algorithmic in nature anyway). If this optimization IS real or possible, then please tell me - and explain WHY.

Also, if this optimization IS real or possible, is this the kind of thing that the compiler will automatically handle during internal optimization?

I'm very interested in people's explanations and thoughts. Thank you.

-Gauvir_Mucca

1) Write-after-read, as that pattern is called, is more of a problem on in-order processors. On out-of-order processors it's not nearly as big a penalty.

2) Yes, a compiler can (and many will) do such optimizations for you. It's basic instruction scheduling.

Technically that is not quite write-after-read, since there is a comparison in between the two, but the above still applies.
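To make the two orderings concrete, here is a minimal C++ sketch of both. The function names `plot_screen_first` and `plot_z_first` and the element types (`int` depth, `std::uint32_t` color) are assumptions for illustration only; the book excerpt does not show them. As long as the two buffers are distinct, both functions leave memory in exactly the same state, which is what gives a compiler's instruction scheduler the freedom to pick either order for the two stores.

```cpp
#include <cstdint>

// Hypothetical helper: the book's ordering (write the texel, then the depth).
inline void plot_screen_first(std::uint32_t* screen_ptr, int* z_ptr,
                              int xi, int zi, std::uint32_t color)
{
    if (zi < z_ptr[xi])          // read the stored depth and compare
    {
        screen_ptr[xi] = color;  // write texel
        z_ptr[xi] = zi;          // update z-buffer
    }
}

// Hypothetical helper: the alternative ordering (write the depth, then the texel).
inline void plot_z_first(std::uint32_t* screen_ptr, int* z_ptr,
                         int xi, int zi, std::uint32_t color)
{
    if (zi < z_ptr[xi])          // read the stored depth and compare
    {
        z_ptr[xi] = zi;          // update z-buffer
        screen_ptr[xi] = color;  // write texel
    }
}
```

Whether either version is actually faster depends on the target processor and on what the compiler emits, so the only reliable check is to inspect the generated assembly or to measure.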

Quote:
Original post by Gauvir_Mucca
My questions are: Is this true? If so, why?


It can be true, on processors created in the last X years, that this makes a difference. It's not something that can be easily explained in much detail, because modern processors are very complex. But the general idea is that the processor has a limited ability to work on multiple things at a time, and it can't start work on "write address A" while work on "read address A" is still in progress, because the read has to see the old value before it is overwritten. That means it has to stall, wait for the read to finish, and only then start the write (and thereafter resume operating in full swing). Putting an unrelated instruction between the two gives the read time to complete without a stall.

However, while this has been true for the last X years, it has also been true for at least X years that the order in which things are written in your C++ code is NOT guaranteed to be preserved in the generated assembly. In fact, I would bet that it's been true for at least 2X years. And since compiler writers can generally be assumed to be operating in good faith to make your code faster ;) (at least in release mode :) ), you can be fairly sure that if the code "suggested" by the first version is faster, then the compiler will produce that faster code whichever version of the source you supply - certainly for examples this trivial, anyway.

Caveat: The compiler might not be able to prove that this is safe, and thus err on the side of caution, if it can't prove (for example) that screen_ptr and z_ptr refer to non-overlapping areas of memory.
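A small C++ sketch of why overlap matters. The function names here are hypothetical, and both buffers are given the same element type (`int`) purely as an assumption so that they are allowed to refer to the same memory. When the two pointers alias, the two store orders produce different results, so a compiler that cannot rule out overlap has to keep the order written in the source.

```cpp
// Hypothetical helper: store the color first, then the depth.
inline void store_color_then_depth(int* screen_ptr, int* z_ptr,
                                   int xi, int zi, int color)
{
    if (zi < z_ptr[xi])
    {
        screen_ptr[xi] = color;
        z_ptr[xi] = zi;
    }
}

// Hypothetical helper: store the depth first, then the color.
inline void store_depth_then_color(int* screen_ptr, int* z_ptr,
                                   int xi, int zi, int color)
{
    if (zi < z_ptr[xi])
    {
        z_ptr[xi] = zi;
        screen_ptr[xi] = color;
    }
}

// Passes the SAME one-element array as both buffers and returns what is
// left in it. The last store wins, so the result depends on the order.
inline int aliased_result(bool color_first)
{
    int cell[1] = {100};
    if (color_first)
        store_color_then_depth(cell, cell, 0, 5, 42);  // leaves the depth, 5
    else
        store_depth_then_color(cell, cell, 0, 5, 42);  // leaves the color, 42
    return cell[0];
}
```

In a real rasterizer the two buffers never overlap, but the compiler only knows that if it can see it. Many compilers offer a non-standard `__restrict` qualifier for making that promise explicitly; whether it changes the generated code here would need to be checked on the compiler in question.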

