Stowelly

Editing SDL's pixel data directly (pixel / colour format)

Hi, I'm currently writing a software rasterizer and using SDL literally as a framebuffer. All has been going really well, and I've got some complex flat-shaded geometry running smoothly. Now I'm trying to implement Gouraud shading and per-vertex lighting, meaning I need to be able to adjust the colour at run time, whereas before I'd been choosing my colour in RGB at startup and converting it to a uint using SDL_MapRGB. Doing this for every pixel at run time is obviously very costly, so does anyone have any ideas how I can manipulate the RGB values without this overhead? Many thanks.
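
For readers unfamiliar with the problem: Gouraud shading means the colour changes at every pixel of a span, so the per-pixel work is "interpolate three channels, then pack them". Below is a minimal sketch of that inner loop, not the poster's code. It assumes a 32-bit 0x00RRGGBB target and 16.16 fixed-point interpolation; `shade_span` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Fill `count` pixels, blending linearly from (r0,g0,b0) to (r1,g1,b1).
   Assumes a 0x00RRGGBB pixel layout and count < 32768. */
static void shade_span(uint32_t *dst, int count,
                       uint8_t r0, uint8_t g0, uint8_t b0,
                       uint8_t r1, uint8_t g1, uint8_t b1)
{
    assert(dst);
    if (count <= 0)
        return;

    int32_t steps = count > 1 ? count - 1 : 1;

    /* 16.16 fixed point; the 0x8000 bias absorbs the truncation error of
       the per-pixel step so the last pixel lands on the end colour. */
    int32_t r = ((int32_t)r0 << 16) + 0x8000;
    int32_t g = ((int32_t)g0 << 16) + 0x8000;
    int32_t b = ((int32_t)b0 << 16) + 0x8000;
    int32_t dr = (((int32_t)r1 - r0) * 65536) / steps;
    int32_t dg = (((int32_t)g1 - g0) * 65536) / steps;
    int32_t db = (((int32_t)b1 - b0) * 65536) / steps;

    for (int i = 0; i < count; ++i) {
        /* Pack with fixed shifts instead of calling a mapping function. */
        dst[i] = ((uint32_t)(r >> 16) << 16) |
                 ((uint32_t)(g >> 16) << 8)  |
                  (uint32_t)(b >> 16);
        r += dr;
        g += dg;
        b += db;
    }
}
```

The only per-pixel cost left is three additions, three shifts and two ORs; whether that beats SDL_MapRGB in practice is something to measure rather than assume.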

Have kind of solved this now, so will post the result in case it aids anyone else.

Would appreciate some feedback from people if they know any way of optimising this, as obviously the slightest performance overhead on a per-pixel basis will be ridiculously costly.

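Uint32 colour = get_colour();

SDL_PixelFormat *fmt;
fmt = m_screen->format;
int bpp = fmt->BitsPerPixel;
/* Mask out each channel, then shift it down to the low bits
   (assumes 8 bits per channel, so no Rloss/Gloss/Bloss step). */
Uint32 temp = colour & fmt->Rmask;
Uint8 r = (Uint8)(temp >> fmt->Rshift);
temp = colour & fmt->Gmask;
Uint8 g = (Uint8)(temp >> fmt->Gshift);
temp = colour & fmt->Bmask;
Uint8 b = (Uint8)(temp >> fmt->Bshift);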


And the reverse process, to turn the RGB Uint8s back into a 32-bit pixel value:

Uint32 col = 0;
/* Cast before shifting so the shift is done on an unsigned 32-bit value. */
col |= (Uint32)r << fmt->Rshift;
col |= (Uint32)g << fmt->Gshift;
col |= (Uint32)b << fmt->Bshift;
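
The two snippets above can be checked against each other with a round trip. This is a minimal sketch that compiles without SDL: `ChannelLayout` is a hypothetical stand-in holding only the SDL_PixelFormat fields the post uses (Rmask/Rshift and so on), and 8 bits per channel are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the SDL_PixelFormat fields used in the post. */
typedef struct {
    uint32_t Rmask, Gmask, Bmask;
    uint8_t  Rshift, Gshift, Bshift;
} ChannelLayout;

/* Split a packed pixel into 8-bit channels (8 bits per channel assumed). */
static void unpack_rgb(uint32_t pixel, const ChannelLayout *fmt,
                       uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(fmt && r && g && b);
    *r = (uint8_t)((pixel & fmt->Rmask) >> fmt->Rshift);
    *g = (uint8_t)((pixel & fmt->Gmask) >> fmt->Gshift);
    *b = (uint8_t)((pixel & fmt->Bmask) >> fmt->Bshift);
}

/* Reverse process: widen each channel to 32 bits, then shift into place. */
static uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b,
                         const ChannelLayout *fmt)
{
    assert(fmt);
    return ((uint32_t)r << fmt->Rshift) |
           ((uint32_t)g << fmt->Gshift) |
           ((uint32_t)b << fmt->Bshift);
}
```

With a real surface the layout values would come from `m_screen->format` rather than being filled in by hand.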

[Edited by - Stowelly on September 13, 2009 10:20:41 AM]

You should be able to optimize a little bit by avoiding extra variables. I don't know how much of an effect this will have if you have the compiler's optimizations turned on.

This is supposed to be the way to do it, according to the SDL_PixelFormat documentation:
Uint32 temp=color&fmt->Rmask; /* Isolate red component */
temp=temp>>fmt->Rshift;/* Shift it down to 8-bit */
temp=temp<<fmt->Rloss; /* Expand to a full 8-bit number */
Uint8 r=(Uint8)temp;


I'd change that to this:
Uint8 r = (Uint8)(((color & fmt->Rmask) >> fmt->Rshift) << fmt->Rloss);
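
The Rloss step only matters when a channel has fewer than 8 bits. A minimal sketch for a 16-bit RGB565 pixel follows; the constants are assumptions that mirror what SDL_PixelFormat would be expected to report for such a surface (loss = 8 minus the channel's bit count), and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed RGB565 layout: 5 bits red, 6 bits green, 5 bits blue. */
enum {
    R565_MASK = 0xF800, R565_SHIFT = 11, R565_LOSS = 3,
    G565_MASK = 0x07E0, G565_SHIFT = 5,  G565_LOSS = 2,
    B565_MASK = 0x001F, B565_SHIFT = 0,  B565_LOSS = 3
};

/* Mask, shift down, then shift up by the loss to fill 8 bits. */
static uint8_t red_from_565(uint16_t p)
{
    return (uint8_t)(((p & R565_MASK) >> R565_SHIFT) << R565_LOSS);
}

static uint8_t green_from_565(uint16_t p)
{
    return (uint8_t)(((p & G565_MASK) >> G565_SHIFT) << G565_LOSS);
}

static uint8_t blue_from_565(uint16_t p)
{
    return (uint8_t)(((p & B565_MASK) >> B565_SHIFT) << B565_LOSS);
}

/* Going the other way drops the low bits of each 8-bit channel. */
static uint16_t pack_565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> R565_LOSS) << R565_SHIFT) |
                      ((g >> G565_LOSS) << G565_SHIFT) |
                      ((b >> B565_LOSS) << B565_SHIFT));
}
```

Note that the plain shift does not restore full brightness: white (0xFFFF) comes back as 248/252/248, not 255/255/255.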


Going the other way around, I'd do this:
Uint32 color = ((Uint32)r << fmt->Rshift) | ((Uint32)g << fmt->Gshift) | ((Uint32)b << fmt->Bshift) | ((Uint32)a << fmt->Ashift);


Of course, SDL leaves the pixel formats up to the user. If you're designing something that could use speed at the cost of flexibility, then I would suggest forcing one particular format. For example, require that your colors are in either 32-bit AARRGGBB format or 24-bit RRGGBB format. Then you can optimize a bit more by knowing the numbers already. Something like this:
Uint32 color = ((((((Uint32)a << 8) | r) << 8) | g) << 8) | b;
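
Once the format is pinned to 32-bit AARRGGBB, both directions reduce to constant shifts. A minimal sketch under that assumption (the function names are hypothetical, and nothing here checks that the surface really uses this layout):

```c
#include <stdint.h>

/* Pack four 8-bit channels into 0xAARRGGBB. Each channel is widened to
   32 bits first so that shifting alpha by 24 cannot overflow a signed int. */
static uint32_t pack_argb8888(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Unpacking needs no mask: the cast to uint8_t keeps the low 8 bits. */
static uint8_t argb_alpha(uint32_t c) { return (uint8_t)(c >> 24); }
static uint8_t argb_red(uint32_t c)   { return (uint8_t)(c >> 16); }
static uint8_t argb_green(uint32_t c) { return (uint8_t)(c >> 8); }
static uint8_t argb_blue(uint32_t c)  { return (uint8_t)c; }
```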


That would mean running everything through something similar to SDL_DisplayFormat (or SDL_DisplayFormatAlpha) so that you're sure that the format is right... but you should be using SDL_DisplayFormat for everything if you're using SDL's renderer anyhow. You do have to make sure that the display format matches your ideal format somehow, or else you're just converting in vain.

I suggest you look at SDL's source code for SDL_MapRGB and SDL_GetRGB so that you get a feel for how they do it and see if it's even worth the effort. Be sure to run some speed tests!
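
As a starting point for such a speed test, here is a rough sketch that times a format-driven pack against a fixed-format pack using the standard `clock()`. Everything in it is an assumption for illustration: `Shifts` is a hypothetical cut-down format struct, and an optimising compiler may fold much of the loop away, so timings from the real rasterizer are what count.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical cut-down format: just the three shift amounts. */
typedef struct { uint8_t Rshift, Gshift, Bshift; } Shifts;

static uint32_t pack_generic(uint8_t r, uint8_t g, uint8_t b, const Shifts *s)
{
    return ((uint32_t)r << s->Rshift) | ((uint32_t)g << s->Gshift) |
           ((uint32_t)b << s->Bshift);
}

static uint32_t pack_fixed(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Times n packs with each routine and stores the elapsed CPU seconds.
   Returns 1 if both routines produced the same checksum, else 0. */
static int time_packs(uint32_t n, double *generic_s, double *fixed_s)
{
    assert(generic_s && fixed_s);
    Shifts s = {16, 8, 0};
    uint32_t sum_generic = 0, sum_fixed = 0;

    clock_t t0 = clock();
    for (uint32_t i = 0; i < n; ++i)
        sum_generic += pack_generic((uint8_t)i, (uint8_t)(i >> 8),
                                    (uint8_t)(i >> 16), &s);
    clock_t t1 = clock();
    for (uint32_t i = 0; i < n; ++i)
        sum_fixed += pack_fixed((uint8_t)i, (uint8_t)(i >> 8),
                                (uint8_t)(i >> 16));
    clock_t t2 = clock();

    *generic_s = (double)(t1 - t0) / CLOCKS_PER_SEC;
    *fixed_s   = (double)(t2 - t1) / CLOCKS_PER_SEC;
    return sum_generic == sum_fixed;
}
```

The checksum comparison doubles as a sanity check that the two routines agree before their timings are compared.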

I don't guarantee that any of this code works. In fact, it might even be counterproductive to take this approach, but I hope it makes you think.
Jonny D

Quote:
Original post by Stowelly
Hi, I'm currently writing a software rasterizer and using SDL literally as a framebuffer. All has been going really well, and I've got some complex flat-shaded geometry running smoothly.
If you're using SDL_Surface (my experience with SDL is somewhat spread across many years), you may be interested to know that on some very low-end machines, with both recent and not-so-recent drivers, this render path seems to somehow trigger a low-performance driver behaviour. How low? The overhead of framebuffer locking alone is several times the cost of the logic (say about 20x) and even more than the rendering itself (say 5x) for a 640x480 window...
This makes the application use, for example, just about 30% CPU on an Atom270 netbook and around 80% on a 2400 MHz Athlon 64... go figure what's happening.

Since you're writing a software renderer you're never going to get high performance anyway, and you probably don't really care, but I have the feeling SDL doesn't take framebuffer locking seriously (come on, they don't even have lock flags! In 2009!).
Quote:
Original post by Stowelly
Doing this for every pixel at run time is obviously very costly, so does anyone have any ideas how I can manipulate the RGB values without this overhead?
Just lock the whole framebuffer?

Using temporary variables is essentially free in my experience and makes for a far friendlier debugging environment. Shuffling instructions is your compiler's job, not yours. Removing temporary variables is also the compiler's job.

When in doubt, run benchmarks BEFORE spending your time. I have been there. It wasn't fun.

Quote:
Original post by grimfang4
Of course, SDL leaves the pixel formats up to the user. If you're designing something that could use speed at the cost of flexibility, then I would suggest forcing one particular format.
I agree.

Also make sure your per-pixel function calls are inlined. Modern processors (I mean anything running at 200 MHz or more, really) are hammered by function calls.
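
A minimal sketch of what an inlinable per-pixel write can look like. `Framebuffer32` is a hypothetical struct carrying the same facts a locked 32-bit SDL_Surface exposes (pixels, pitch in bytes, w, h); the idea is to lock once per frame, fill this in, and do all pixel writes through it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a locked 32-bit surface. */
typedef struct {
    uint8_t *pixels;  /* first byte of the first row */
    int      pitch;   /* bytes per row, may exceed w * 4 */
    int      w, h;
} Framebuffer32;

/* `static inline` keeps the definition visible at every call site so the
   compiler is free to inline it. No bounds check: the caller must clip. */
static inline void put_pixel(const Framebuffer32 *fb, int x, int y,
                             uint32_t colour)
{
    uint32_t *row = (uint32_t *)(fb->pixels + (size_t)y * (size_t)fb->pitch);
    row[x] = colour;
}

/* Example caller: a horizontal run from x0 to x1 inclusive. */
static void hline(const Framebuffer32 *fb, int x0, int x1, int y,
                  uint32_t colour)
{
    assert(fb && fb->pixels);
    for (int x = x0; x <= x1; ++x)
        put_pixel(fb, x, y, colour);
}
```

Rows are addressed through `pitch` rather than `w * 4` because a surface's rows may be padded.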

Quote:
Original post by grimfang4
You should be able to optimize a little bit by avoiding extra variables. I don't know how much of an effect this will have if you have the compiler's optimizations turned on.
[...]

Thank you, this makes a lot of sense

Quote:
Original post by Krohm
The overhead of framebuffer locking alone is several times the cost of the logic (say about 20x) and even more than the rendering itself (say 5x) for a 640x480 window...

With regard to SDL's locking being rather slow, can you think of a Windows method of using the framebuffer directly, similar to how you would on Linux without an X server running? That would have been more of an ideal solution for me. I would like to use as few third-party apps as possible, and hopefully be able to port it to PS3 Linux using just the framebuffer.

On those problematic systems, everything else has turned out to be faster. Even GL with WritePixels on the standard framebuffer (!). The good news is that this seems to happen only on quite old machines (like single cores, or DDR1-based, or AGP-based), which hopefully will be phased out soon - too bad some of those systems can run Doom3 with no sweat. All the tests I've run to pinpoint the problem failed to suggest a solution; I don't even know if it's the driver, or some chipset family, or what...
I just hope you never meet those users, but if it does happen, just remember this.
For machines without this issue, the standard locking mechanism does just fine - netbooks are safe and I suppose you don't want anything less. Just benchmark/profile and if the results are pleasing, you're done with it!

Thank you for making me aware of this. It seems that after profiling my app, 20% of the time is spent in SDL_FillRect (to clear the buffer), and another 30% is spent in the Windows function BitBlt... significantly more than most of my other functions (although my rasterize function takes up about 40%, which is to be expected).

I'm running a Core 2 Duo, by the way. I've not profiled it on my netbook yet.

Maybe I should look into techniques for doing the lock - fill - flip process myself?
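
For the "fill" part of that idea, a hand-rolled clear of a 32-bit buffer might look like the sketch below. It is an assumption-laden illustration, not a known improvement over SDL_FillRect: `clear32` is a hypothetical helper, the buffer is assumed to be 32 bits per pixel and already locked, and only profiling can say whether it helps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write `colour` to every visible pixel of a 32-bit buffer.
   `pitch` is the row stride in bytes; padding past `w` is left untouched. */
static void clear32(uint8_t *pixels, int pitch, int w, int h, uint32_t colour)
{
    assert(pixels && pitch >= w * 4);
    for (int y = 0; y < h; ++y) {
        uint32_t *row = (uint32_t *)(pixels + (size_t)y * (size_t)pitch);
        for (int x = 0; x < w; ++x)
            row[x] = colour;
    }
}
```

If the rasterizer overwrites every pixel each frame anyway, skipping the clear entirely would save more than any faster clear could.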
