
## Recommended Posts

Hello,

I am coding a DirectDraw alpha blend function using MMX. The blend itself is working great and handles 16 and 32 bit at the moment. Nevertheless, I get a really strange problem that I have never seen before. In my blend function I use a switch/case on the color depth to choose the correct blend function, as follows:

    switch (bpp) {
    case 16:
        Blend16(....);
        break;
    case 32:
        Blend32(.....);
        break;
    }

The strange thing is that with the above code my 16-bit function slows down a lot, for example from 30 fps to 18 fps, but if I just comment out the Blend32 line it works at full speed again. I see no relation between the two functions, so I really don't know what could be causing this behavior.

Any help will be appreciated,
Oscar

##### Share on other sites
If that's all the code (i.e. no other cases or code in the existing cases) then I can only think it's branch overhead that is slowing your code down. I assume this is executed for each group of pixels in the image (you said you were using MMX). In that case, when you comment out the Blend32 function call, the compiler optimises the code to this:

    if (bpp == 16) {
        Blend16 (...);
    }

and the CPU never takes the branch. Branching on modern CPUs is very expensive because the instruction pipeline has to be restarted when a branch is taken (OK, with prediction it's a bit more complex as to when the pipeline is restarted); the longer the instruction pipeline, the higher the cost of a branch. With the Blend32 call present, the CPU is always branching for each execution of the switch statement, and thus slowing down.

Solutions:
1) If the arguments are the same for the two functions, you can use a function pointer:
    // before image processing starts
    switch (bpp)
    {
    case 16:
        blend_function = Blend16;
        break;
    case 32:
        blend_function = Blend32;
        break;
    }
    // in your image processing loop (where your switch is now)
    blend_function (...);

If you're using C++, you could use a polymorphic object instead. This can overcome the problem of different arguments to the blend functions.
2) Do the switch at a higher level and duplicate the image processing code:
    switch (bpp)
    {
    case 16:
        // blend two 16 bit images together
        break;
    case 32:
        // blend two 32 bit images together
        break;
    }

This isn't as good as (1) since you're duplicating the main processing loop (twice the maintenance).
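As a rough sketch of solution (1), here is how the selection could be hoisted out of the per-row loop. Everything in it is an assumption made for illustration: the `BlendFn` signature, the plain scalar `Blend16` (assuming RGB565) and `Blend32` bodies standing in for the real MMX routines, and the helper names `Mix`, `SelectBlend` and `BlendSurface`. Both surfaces are assumed to share the same pitch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical common signature: blend `count` pixels of src over dst, alpha in 0..255.
using BlendFn = void (*)(void* dst, const void* src, std::size_t count, std::uint8_t alpha);

// Per-channel mix: alpha 255 yields the source, alpha 0 keeps the destination.
inline unsigned Mix(unsigned s, unsigned d, unsigned alpha) {
    return (s * alpha + d * (255u - alpha)) / 255u;
}

// Scalar stand-in for the MMX 16-bit routine (assumes RGB565 pixels).
void Blend16(void* dst, const void* src, std::size_t count, std::uint8_t alpha) {
    std::uint16_t* d = static_cast<std::uint16_t*>(dst);
    const std::uint16_t* s = static_cast<const std::uint16_t*>(src);
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned sp = s[i], dp = d[i];
        const unsigned r = Mix(sp >> 11, dp >> 11, alpha);
        const unsigned g = Mix((sp >> 5) & 0x3Fu, (dp >> 5) & 0x3Fu, alpha);
        const unsigned b = Mix(sp & 0x1Fu, dp & 0x1Fu, alpha);
        d[i] = static_cast<std::uint16_t>((r << 11) | (g << 5) | b);
    }
}

// Scalar stand-in for the MMX 32-bit routine (four 8-bit channels per pixel).
void Blend32(void* dst, const void* src, std::size_t count, std::uint8_t alpha) {
    std::uint8_t* d = static_cast<std::uint8_t*>(dst);
    const std::uint8_t* s = static_cast<const std::uint8_t*>(src);
    for (std::size_t i = 0; i < count * 4; ++i)
        d[i] = static_cast<std::uint8_t>(Mix(s[i], d[i], alpha));
}

// The only switch: executed once, before any pixels are touched.
BlendFn SelectBlend(int bpp) {
    switch (bpp) {
    case 16: return Blend16;
    case 32: return Blend32;
    default: return nullptr;
    }
}

// Blends `height` rows of `width` pixels; returns false for unsupported depths.
bool BlendSurface(int bpp, void* dst, const void* src, std::size_t pitchBytes,
                  std::size_t width, std::size_t height, std::uint8_t alpha) {
    const BlendFn blend = SelectBlend(bpp);
    if (!blend) return false;
    std::uint8_t* d = static_cast<std::uint8_t*>(dst);
    const std::uint8_t* s = static_cast<const std::uint8_t*>(src);
    for (std::size_t y = 0; y < height; ++y)
        blend(d + y * pitchBytes, s + y * pitchBytes, width, alpha);  // no switch in here
    return true;
}
```

The loop pays for one indirect call per row rather than one switch per group of pixels, which is the trade-off this solution is aiming for.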

Skizz

##### Share on other sites
Hi,

Just saw that Skizz gave a good explanation of the problem.

If you care about performance, you'd better do the switch at a higher level, even if this actually increases the code size and may increase the bug count. It will allow you to avoid a lot of conditionals. You can also try to factor your inner loops into a template class that uses functors. It is not really easy to design, but it may allow you to write the loop code only once. For example:

    template < class BlendFunctor, class AnotherFunctor >
    class InnerLoop
    {
    public:
        void operator()()
        {
            BlendFunctor bf;
            AnotherFunctor af;
            while (IMustRunTheLoop)
            {
                bf(SomeLoopBasedParams);
                af(SomeLoopBasedParams);
            }
        }
    };
    // later in the code...
    InnerLoop<Blend16MMXFunctor, FooFunctor> Blend16InnerLoop;
    InnerLoop<Blend32MMXFunctor, FooFunctor> Blend32InnerLoop;
    if (bpp == 16) {
        Blend16InnerLoop();
    } else if (bpp == 32) {
        Blend32InnerLoop();
    }

Not sure I'm clear in my description of this particular design trick. The goal is to write only one version of the outer loop and then to use compile-time polymorphism to get any outer loop you want (some kind of compile-time software shaders, in fact). It will allow you to get the best of all worlds: code maintenance is still (relatively) easy, and you may still be able to inline your code.

Another solution would be to create a software shader on the fly (using Nicolas Capens' SoftWire library (the last version is... wow...)).

Using a function pointer will not allow the compiler to inline the function calls. This is basically the same as using a virtual function (VC++ adds a two-instruction asm overhead on a virtual function call).

Regards,

##### Share on other sites
Hello

Thanks for your replies and the deep explanation. It helped me a lot and gave me a lot of things to think about.

As for my problem, using function pointers solved it! I had really never guessed at this kind of trouble before.

Lastly, could you expand a little more on doing the switch at a higher level? Do you mean putting all the code inside the switch (one block for the 16-bit blend, another one for the 32-bit blend)? I tried that and the result was the same as calling Blend16 and Blend32.

Thank you very much,
Oscar

##### Share on other sites
You can do something like this:

    template<int bitdepth> void BlendImages(your parameters){
        your code that uses case for bitdepth.
    }

Then you can, perhaps, create aliases for BlendImages<16>(your parameters), BlendImages<32>(your parameters), etc.

The compiler, unless it's a very bad compiler (I guess it's hard to find one that bad), will remove the branching. When the template is "expanded" by the compiler, bitdepth works as a constant known at compile time.
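A minimal sketch of that idea follows. The parameter list, the 50% average used as the blend, the RGB565 layout assumed for 16 bit, and the alias names `BlendImages16` and `BlendImages32` are all assumptions for illustration, not the original poster's code. The switch stays inside the loop in the source, but since `BitDepth` is a template argument each instantiation sees a constant and only one case can ever be reached.

```cpp
#include <cstddef>
#include <cstdint>

// BitDepth is a compile-time constant inside each instantiation, so the
// optimizer can fold the switch below and keep only the matching case.
template <int BitDepth>
void BlendImages(void* dst, const void* src, std::size_t pixelCount)
{
    static_assert(BitDepth == 16 || BitDepth == 32, "unsupported bit depth");
    for (std::size_t i = 0; i < pixelCount; ++i) {
        switch (BitDepth) {
        case 16: {
            // 50% average of two RGB565 pixels: clear the low bit of each
            // channel, halve, then add, so channels cannot spill into each other.
            std::uint16_t* d = static_cast<std::uint16_t*>(dst);
            const std::uint16_t* s = static_cast<const std::uint16_t*>(src);
            d[i] = static_cast<std::uint16_t>(((d[i] & 0xF7DEu) >> 1) + ((s[i] & 0xF7DEu) >> 1));
            break;
        }
        case 32: {
            // Same trick for four 8-bit channels.
            std::uint32_t* d = static_cast<std::uint32_t*>(dst);
            const std::uint32_t* s = static_cast<const std::uint32_t*>(src);
            d[i] = ((d[i] & 0xFEFEFEFEu) >> 1) + ((s[i] & 0xFEFEFEFEu) >> 1);
            break;
        }
        default:
            break;
        }
    }
}

// Hypothetical aliases, one per supported depth.
inline void BlendImages16(void* dst, const void* src, std::size_t n) { BlendImages<16>(dst, src, n); }
inline void BlendImages32(void* dst, const void* src, std::size_t n) { BlendImages<32>(dst, src, n); }
```

The loop body is written once, yet each alias should compile down to a loop with no run-time test on the depth, assuming the optimizer folds the constant switch as described.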

As for the idea of pointers to functions, they do not perform very well (worse if they are "functors"). I actually used that in my own code, in Pascal, where I had no templates.

Doing the branching at a higher level means that, instead of

    someloop{
        ..
        switch(bitdepth){
        case 16:
            one thing
            break;
        case 32:
            another thing
            break;
        }
        ..
    }

you do

    switch(bitdepth){
    case 16:
        someloop{
            ...
            one thing
            ...
        }
        break;
    case 32:
        someloop{
            ...
            another thing
            ...
        }
        break;
    }

##### Share on other sites
Hi, thanks for your reply. I will definitely give the template implementation you suggest a try.

PS: As for the compiler, I am using VC++ .NET 2003.

Regards,
Oscar

##### Share on other sites
You can use both templates and function pointers: if you have a function pointer variable BlendImages, you can set it to &templ_BlendImages<16> or to &templ_BlendImages<32>, depending on the mode.
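Roughly, that combination could look like the sketch below. The names `BlendImages` and `templ_BlendImages` come from the post; the parameter list, the `SetBlendMode` helper and the function body are assumptions, and the body is only a placeholder copy standing in for a real blend.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// One template, instantiated per bit depth. PLACEHOLDER BODY: a plain copy
// stands in for the real blend so the sketch stays short.
template <int BitDepth>
void templ_BlendImages(void* dst, const void* src, std::size_t pixelCount)
{
    std::memcpy(dst, src, pixelCount * (BitDepth / 8));
}

// Function pointer variable that the rest of the code calls through.
using BlendImagesFn = void (*)(void* dst, const void* src, std::size_t pixelCount);
BlendImagesFn BlendImages = nullptr;

// Hypothetical helper: run once when the display mode is set.
bool SetBlendMode(int bitdepth)
{
    switch (bitdepth) {
    case 16: BlendImages = &templ_BlendImages<16>; return true;
    case 32: BlendImages = &templ_BlendImages<32>; return true;
    default: BlendImages = nullptr; return false;
    }
}
```

After a successful `SetBlendMode(16)`, a call such as `BlendImages(dst, src, count)` reaches the 16-bit instantiation directly. The call itself is indirect, but everything inside the instantiation is resolved at compile time.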

You can also make a class Graphics with virtual functions for the different operations, the related data, and some functions that use the virtual ones (for instance, DrawFilledPolygon, DrawFilledRectangle and DrawFilledCircle can be done via calls to DrawScanline), plus derived classes Graphics15, Graphics16 and Graphics32 where you can have different functions for the different modes.

Then make a function GetGraphicsContent(bitdepth, resolution, etc.) that returns a pointer to the Graphics class you want to use, and a function DestroyGraphicsContent() that destroys it.

That way you'll have a nice OOP graphics class, and you'll be able to adapt it to work with HW acceleration if you need to.
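One possible shape for such a class, as a sketch only: the class and function names follow the post, while the member signatures, the `ReadPixel` accessor (added just so the result can be inspected), the RGB565 packing and the lack of clipping are assumptions. Graphics15 is left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Base class: mode-independent drawing built on one virtual primitive.
class Graphics {
public:
    Graphics(int width, int height) : width_(width), height_(height) {}
    virtual ~Graphics() {}

    // Fill pixels [x0, x1) on row y with a 0xRRGGBB color. No clipping is done.
    virtual void DrawScanline(int y, int x0, int x1, std::uint32_t rgb) = 0;
    // Raw stored pixel value (hypothetical accessor for inspection).
    virtual std::uint32_t ReadPixel(int x, int y) const = 0;

    // Written once; works in every mode because it only calls DrawScanline.
    void DrawFilledRectangle(int left, int top, int right, int bottom, std::uint32_t rgb) {
        for (int y = top; y < bottom; ++y) DrawScanline(y, left, right, rgb);
    }

protected:
    std::size_t Index(int x, int y) const {
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(width_)
             + static_cast<std::size_t>(x);
    }
    int width_;
    int height_;
};

class Graphics16 : public Graphics {
public:
    Graphics16(int w, int h)
        : Graphics(w, h), pixels_(static_cast<std::size_t>(w) * static_cast<std::size_t>(h)) {}
    void DrawScanline(int y, int x0, int x1, std::uint32_t rgb) override {
        // Pack 0xRRGGBB into RGB565 once per scanline.
        const std::uint16_t p = static_cast<std::uint16_t>(
            (((rgb >> 19) & 0x1Fu) << 11) | (((rgb >> 10) & 0x3Fu) << 5) | ((rgb >> 3) & 0x1Fu));
        for (int x = x0; x < x1; ++x) pixels_[Index(x, y)] = p;
    }
    std::uint32_t ReadPixel(int x, int y) const override { return pixels_[Index(x, y)]; }
private:
    std::vector<std::uint16_t> pixels_;
};

class Graphics32 : public Graphics {
public:
    Graphics32(int w, int h)
        : Graphics(w, h), pixels_(static_cast<std::size_t>(w) * static_cast<std::size_t>(h)) {}
    void DrawScanline(int y, int x0, int x1, std::uint32_t rgb) override {
        for (int x = x0; x < x1; ++x) pixels_[Index(x, y)] = rgb & 0x00FFFFFFu;
    }
    std::uint32_t ReadPixel(int x, int y) const override { return pixels_[Index(x, y)]; }
private:
    std::vector<std::uint32_t> pixels_;
};

// Factory pair named in the post; returns nullptr for unsupported depths.
Graphics* GetGraphicsContent(int bitdepth, int width, int height) {
    switch (bitdepth) {
    case 16: return new Graphics16(width, height);
    case 32: return new Graphics32(width, height);
    default: return nullptr;
    }
}

void DestroyGraphicsContent(Graphics* g) { delete g; }
```

The virtual call happens once per scanline rather than once per pixel, so its overhead stays small compared with the fill loop inside each derived class.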

##### Share on other sites
Hi, thank you all for your help. I finally got it working great using a mix of templates and function pointers.

Best Regards,
Oscar
