About dst

  1. The problem is that I still need to do the swap before calculating indices into the fill buffer, since I need to multiply the correct y indices by the width of the buffer. The swap seems to stall the CPU. If I switch to a branchless version, it blocks speculation. If I put a branch there, speculation is better, but I get mispredictions...
  2. Yeah, it was an abbreviation of my actual code, so I should have checked that I wrote it down correctly. I've managed to cut the runtime of an inner loop by 20% with lots of bit fiddling. My current problem is port congestion in this function, caused by emitting tons of bit-fiddling instructions. Some other obvious issues remain, like the ordering of the code. For example, if I want to do:

     if (x1 < x0) std::swap(x0, x1);
     int b = x0 < 0;

     Technically, I could get rid of the delay from the swap by doing the equivalent:

     int b = (x0 < 0) | (x1 < 0);
     if (x1 < x0) std::swap(x0, x1);

     However, this adds unnecessary instructions. It would be cool to somehow compute the boolean with the lower instruction count of the first version, without paying for the extra instructions of the second.
  3. I have profiled it extensively and it's spending most of its time right here. The rest is just a set of tight loops that walk memory in increasing order, so the CPU runs it quite well. This is why I'm asking. I've also played around with the same code on hardware that is a few years old and it shows the same issues (I wanted to see what VTune has to say).
  4. I'm reviving some of my graphics programming abilities by playing around with some old hardware of mine and trying to do graphics programming without any external libraries. I'm stuck on implementing a fast enough fill routine to draw overlaps of objects. I'm letting the routine figure out which corner is on which side, so the corner coordinates aren't sorted, may lie outside the image, and need to be clipped. The result is therefore:

     void fill(int x0, int y0, int x1, int y1, int color)
     {
         if (x1 < x0) std::swap(x0, x1);
         if (y1 < y0) std::swap(y0, y1);
         // Reject rectangles that lie entirely outside the 1024x768 buffer.
         if (x1 < 0 || y1 < 0 || x0 >= 1024 || y0 >= 768) return;
         // Clamp the sorted corners to the buffer.
         x0 = x0 < 0 ? 0 : x0;
         x1 = x1 >= 1024 ? 1023 : x1;
         y0 = y0 < 0 ? 0 : y0;
         y1 = y1 >= 768 ? 767 : y1;
         // Rest of code...
     }

     The problem is that the inputs are pretty randomly ordered, so the branches aren't well predicted. I assume that there must be some neat tricks for optimizing this chunk of code. I've tried various tricks that compile to SAR for the clamping, but if I convert the code line by line to bit fiddling, the code gets significantly longer and actually executes worse. I assume this is a pretty well-known routine to implement efficiently and there must be some old tricks for speeding it up significantly?
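
For reference, here is a minimal sketch of one way the sort, clamp, and reject steps discussed in the posts above could be written with min/max instead of compare-and-swap. It is not the poster's code: the names `Rect`, `clip_rect`, `any_negative`, `kWidth`, and `kHeight` are hypothetical and chosen only for illustration. Compilers commonly lower `std::min`/`std::max` on integers to conditional moves, but that is not guaranteed, and whether it beats a branchy version on a given CPU has to be measured. The sign test relies on two's complement integers, which C++20 guarantees and which earlier compilers use in practice.

```cpp
#include <algorithm>

// Hypothetical buffer size, taken from the constants used in the post.
constexpr int kWidth = 1024;
constexpr int kHeight = 768;

// Hypothetical result type: inclusive corners plus an "empty" flag.
struct Rect {
    int x0, y0, x1, y1;
    bool empty;  // true if nothing of the rectangle lies inside the buffer
};

// True if either value is negative. The sign bit of (a | b) is set exactly
// when at least one operand has its sign bit set, so this needs one OR and
// one sign test, and it does not depend on a and b being sorted first.
constexpr bool any_negative(int a, int b) {
    return (a | b) < 0;
}

// Sort and clamp the corners without an explicit swap.
constexpr Rect clip_rect(int x0, int y0, int x1, int y1) {
    // Sorting two values is just a min and a max; the two results are
    // independent of each other, unlike the two halves of a swap.
    const int lo_x = std::min(x0, x1);
    const int hi_x = std::max(x0, x1);
    const int lo_y = std::min(y0, y1);
    const int hi_y = std::max(y0, y1);

    // Clamp each edge to the buffer.
    const int cx0 = std::max(lo_x, 0);
    const int cx1 = std::min(hi_x, kWidth - 1);
    const int cy0 = std::max(lo_y, 0);
    const int cy1 = std::min(hi_y, kHeight - 1);

    // A rectangle entirely outside the buffer ends up inverted after
    // clamping, so a single combined test replaces the four rejection
    // compares. Bitwise | avoids a short-circuit branch.
    const bool empty = (cx0 > cx1) | (cy0 > cy1);
    return {cx0, cy0, cx1, cy1, empty};
}
```

A caller would check `empty` once and then run the fill loops over `y0..y1` and `x0..x1`, so only one hard-to-predict branch remains per rectangle. If the flag `b` from the second post is still needed, `any_negative(x0, x1)` can be evaluated on the unsorted inputs, which removes its dependency on the swap.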