Wavarian

Any MMX gurus?


Hey guys, I'll admit that I'm being lazy here, but sometimes it's easier just to ask someone that already knows the answer than to wade through pages and pages of documentation. From what I've already read about MMX, I think I may have found a use for it in my own code. I'm hoping that someone out there would be able to come up with a quick solution to solve the problem. I have a 32-bit pixel, and I want to multiply each component with a 32-bit integer, and then divide the result by a 32-bit integer:
int x = ...;
int y = ...;

unsigned char new_r = (unsigned char)((old_r * x) / y);
unsigned char new_g = (unsigned char)((old_g * x) / y);
unsigned char new_b = (unsigned char)((old_b * x) / y);
unsigned char new_a = (unsigned char)((old_a * x) / y);
If I could somehow load two pixels into an __m64 variable, is there some way that I can perform these operations all at the same time? Cheers

I am working on something very much like this.

What I am trying to do is speed up my alpha blitting code by using SSE.
BTW, why are you using MMX and not SSE?

One of the problems is how to arrange the data so that it is easy to do the needed operations.

What I did (using SSE) is to load each colour into a separate register:
XMM0 = red part of pixels 1 to 4
XMM1 = blue part of pixels 1 to 4
XMM2 = green part of pixels 1 to 4
XMM3 = alpha part of pixels 1 to 4

Then do the math on that.

But there are probably better ways. I'll let you know if I find something clever, as I am still working on my code.
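One way the per-channel layout above could be produced from packed pixels is sketched below. This is only an illustration, not the poster's actual code: `split_channels` is a made-up name, the byte order (red in the lowest byte, alpha in the highest) is an assumption, and each channel is widened to a 32-bit lane so four pixels fill one XMM register per channel.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stdint.h>

/* Hypothetical helper: splits four packed 32-bit pixels into one register
   per channel, with each channel value widened to a 32-bit lane.
   Assumption: red is the lowest byte of each pixel, alpha the highest. */
static void split_channels(const uint32_t pixels[4],
                           __m128i *r, __m128i *g, __m128i *b, __m128i *a)
{
    assert(pixels && r && g && b && a);

    const __m128i p    = _mm_loadu_si128((const __m128i *)pixels);
    const __m128i mask = _mm_set1_epi32(0xFF);

    *r = _mm_and_si128(p, mask);
    *g = _mm_and_si128(_mm_srli_epi32(p, 8), mask);
    *b = _mm_and_si128(_mm_srli_epi32(p, 16), mask);
    *a = _mm_srli_epi32(p, 24);  /* logical shift, so no mask is needed */
}
```

After the math, the channels would still have to be shifted and OR-ed back together, which is part of the cost of this layout.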

Another way would be to "load1" the integer to multiply with into another register for the multiply. That way you don't need to convert your data to SoA. Most programmers are much more comfortable with AoS, and many algorithms and most hardware are much more comfortable with it, or implemented that way, too. For example, you probably won't find a video card that stores the red, green, blue, and alpha channels separately, and you probably won't have much luck trying to modify a texture that way. That's one reason why MMX/SSE sucks so much except for maybe 0.01% of all applications. </rant>

Anyway, you'll have a hard time doing that integer division.
To my knowledge, there is no such thing as integer division in either MMX or SSE(2, 3,...). You might use shifts for power-of-two, or do some multiplicative inverse trickery. Or, you might convert to float/double and back, but either solution pretty much makes using MMX/SSE in the first place absurd performance-wise. :-(
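For reference, the convert-to-float route could look roughly like the sketch below. The function name `scale_div_float` is made up for illustration, and the stated range limits are assumptions chosen so that the products stay exactly representable in a float; whether this is fast enough to be worthwhile is a separate question.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2; also provides the SSE float intrinsics */
#include <stdint.h>

/* Hypothetical helper: computes (v[i] * x) / y for four 32-bit lanes by
   going through single-precision floats.
   Assumptions: 0 <= v[i] * x < 2^24 and 0 < y < 2^24, so every value
   involved is exactly representable as a float. */
static void scale_div_float(const int32_t v[4], int32_t x, int32_t y,
                            int32_t out[4])
{
    assert(y > 0);

    const __m128 fv = _mm_cvtepi32_ps(_mm_loadu_si128((const __m128i *)v));
    const __m128 fx = _mm_set1_ps((float)x);
    const __m128 fy = _mm_set1_ps((float)y);
    const __m128 q  = _mm_div_ps(_mm_mul_ps(fv, fx), fy);

    /* cvtt truncates toward zero, like C integer division on
       non-negative operands. */
    _mm_storeu_si128((__m128i *)out, _mm_cvttps_epi32(q));
}
```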

SoA = Structure of Arrays
AoS = Array of Structures

So SoA would look like this

struct Pixels
{
    unsigned char red[128];
    unsigned char green[128];
    unsigned char blue[128];
};

Whereas AoS would be

struct Pixel
{
    unsigned char red;
    unsigned char green;
    unsigned char blue;
};

Pixel pixels[128];

Hope this makes sense.

The first optimization I'd look at with that code is not MMX - it's converting that multiply and divide into a multiply and shift. You probably also only need 16 bits of accuracy since you're working with 8-bit values.

Something like:


// Calculate this once. Assumes x * 256 doesn't overflow.
const unsigned int mul = (x * 256) / y;

// Do this per pixel.
unsigned char new_r = (unsigned char)((old_r * mul) >> 8);


After that optimization the conversion to MMX / SSE2 instructions should be much simpler. However, you'll have to do the calculation with 16-bit values, so a 64-bit MMX register holds only one pixel (four 16-bit components) rather than two.
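A rough SSE2 version of the multiply-and-shift could look like the sketch below. It is an illustration under stated assumptions rather than a tuned implementation: `scale_pixels_sse2` is a made-up name, and `mul` is assumed to be at most 256 (that is, x <= y) so that each byte times `mul` still fits in an unsigned 16-bit lane.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stdint.h>

/* Hypothetical helper: scales 16 bytes (four 32-bit pixels) by mul / 256.
   Assumption: mul <= 256, so byte * mul fits in an unsigned 16-bit lane. */
static void scale_pixels_sse2(const uint8_t in[16], uint8_t out[16],
                              unsigned int mul)
{
    assert(mul <= 256);

    const __m128i zero = _mm_setzero_si128();
    const __m128i m    = _mm_set1_epi16((short)mul);  /* broadcast mul */
    const __m128i p    = _mm_loadu_si128((const __m128i *)in);

    /* Widen the 16 bytes into two registers of eight 16-bit values. */
    __m128i lo = _mm_unpacklo_epi8(p, zero);
    __m128i hi = _mm_unpackhi_epi8(p, zero);

    /* (byte * mul) >> 8 in every 16-bit lane. */
    lo = _mm_srli_epi16(_mm_mullo_epi16(lo, m), 8);
    hi = _mm_srli_epi16(_mm_mullo_epi16(hi, m), 8);

    /* Narrow back to bytes; results are <= 255, so the saturation in
       packus never changes a value. */
    _mm_storeu_si128((__m128i *)out, _mm_packus_epi16(lo, hi));
}
```

Because the intermediate values are 16 bits wide, one XMM register holds two pixels during the multiply, which is why the input is split into a low and a high half.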

Also, SoA vs AoS won't make any difference here: you're doing the same calculation on every byte of the data.
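To make that concrete, a plain loop over the raw bytes is enough, as in the sketch below. The name `scale_bytes` is made up, and `mul` is assumed to be at most 256 so the result always fits back into a byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scales every byte of a pixel buffer by mul / 256.
   The channel layout is irrelevant because every byte gets the same
   treatment. Assumption: mul <= 256, so the result fits in a byte. */
static void scale_bytes(unsigned char *data, size_t count, unsigned int mul)
{
    assert(mul <= 256);

    for (size_t i = 0; i < count; ++i)
        data[i] = (unsigned char)((data[i] * mul) >> 8);
}
```

The same function works unchanged on an array of packed pixels or on separate per-channel arrays.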

Quote:
Original post by Adam_42
However you'll have to do the calculation with 16-bit values so you won't fit 2 pixels into one MMX register.


though if you use an XMM register you can process 16 uchar's (4 pixels) in one go.....

Quote:
though if you use an XMM register you can process 16 uchar's (4 pixels) in one go.....

From my understanding the SSE commands work in blocks of 4 or 2, meaning that if you want to use, say, _mm_mul_(), you would have to have your data arranged in this way:
2 x 64-bit values
4 x 32-bit values

Anything else needs to be padded so it fits that system. Again, I might be wrong as I'm still trying to learn this myself. If someone could verify this, that would be great (until someone does, assume it's wrong...)

BTW:
RobTheBloke from cgtalk.com?

Quote:
Original post by AndyPandyV2
It can also work with 8 shorts or 16 bytes, there are SSE2 instructions for doing so.


Can you please give a few examples of commands that do that, or a link? I have been looking for something like that for a long time, but have not found anything.
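Two SSE2 intrinsics of that kind, both declared in `<emmintrin.h>`, are `_mm_adds_epu8` (16 unsigned bytes, saturating add) and `_mm_mullo_epi16` (8 shorts, low half of the product). The sketch below only shows how they are called; the wrapper names `add_16_bytes_saturated` and `mul_8_shorts` are made up for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stdint.h>

/* Hypothetical wrapper: adds 16 unsigned bytes at once, clamping each
   sum to 255 (the PADDUSB instruction). */
static void add_16_bytes_saturated(const uint8_t a[16], const uint8_t b[16],
                                   uint8_t out[16])
{
    assert(a && b && out);
    const __m128i va = _mm_loadu_si128((const __m128i *)a);
    const __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(va, vb));
}

/* Hypothetical wrapper: multiplies 8 shorts at once, keeping the low
   16 bits of each product (the PMULLW instruction). */
static void mul_8_shorts(const int16_t a[8], const int16_t b[8],
                         int16_t out[8])
{
    assert(a && b && out);
    const __m128i va = _mm_loadu_si128((const __m128i *)a);
    const __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_mullo_epi16(va, vb));
}
```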

