spade

Blitting / Locking bits done fast !?

This topic is 5599 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hi everybody.

I need a very fast way of accessing the pixels/bits on a surface/render target. For now I use something like:

DWORD dw = *(reinterpret_cast<DWORD*>(lrec.pBits) + y * displaymode.right + x);

But the typecast increases the rendering time by about 200%... (I have to do it in a loop :-/ )

Is there a faster way? Maybe just getting the values of the bits instead of copying?

Regards,
Tim

Don't lock the render target! It's usually in true video memory, which is the slowest place for the CPU to access, and it isn't cached (especially bad for ANY read) apart from a little bit of AGP fast write/burst for sequential access.

Use hardware accelerated blits wherever possible (i.e. those provided by DirectDraw), since you get super fast performance going from video to video and usually DMA support going from system to video. Video to system is almost always slow, but the "proper" blit will usually be the most optimal you'll find!

The fastest surface type to lock and fiddle with pixels on is a SYSTEM memory surface, since its memory is local to the CPU and therefore cached, which makes a HUGE difference.

If I were making an engine which needed to fiddle with pixels (for alpha effects or something, I presume) I'd use video to video hardware blits for as much as possible AND keep a copy of all surface data in system memory.
Then, for the parts that need per-pixel fiddling, I'd recreate the portion of the render target I would normally lock in a SYSTEM memory surface, using the copy of the graphics in system memory, do the lock & fiddle on that surface, and finally blit from that surface to the desired location in video memory. The idea is a) no video memory access with the CPU, b) no slooow video->system transfers, c) DMA assisted system->video blit.

Also, only ever lock the area of the surface you need to fiddle with - no more.
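To make the "system memory copy, smallest possible region" advice concrete, here is a minimal CPU-only sketch. It assumes 32-bit 0x00RRGGBB pixels and a row pitch given in bytes; `SysMemSurface`, `setPixel` and `averageRegion` are hypothetical names for illustration, not DirectDraw or Direct3D types. Rows are stepped by the pitch rather than by the visible width, and the row address is computed once per row instead of once per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a locked system-memory surface (not a DirectX type).
struct SysMemSurface {
    std::vector<std::uint8_t> bits; // raw pixel storage, pitch * height bytes
    std::size_t pitch;              // bytes per row, may exceed width * 4
    std::size_t width;              // visible width in pixels
    std::size_t height;             // visible height in pixels
};

struct Rgb { std::uint32_t r, g, b; };

// Write one 32-bit pixel (assumed layout 0x00RRGGBB) at (x, y).
void setPixel(SysMemSurface& s, std::size_t x, std::size_t y, std::uint32_t value) {
    std::memcpy(s.bits.data() + y * s.pitch + x * 4, &value, sizeof value);
}

// Average the colour channels over the rectangle [x0, x1) x [y0, y1).
// Only the requested rectangle is touched, never the whole surface.
Rgb averageRegion(const SysMemSurface& s,
                  std::size_t x0, std::size_t y0,
                  std::size_t x1, std::size_t y1) {
    assert(x0 < x1 && x1 <= s.width && y0 < y1 && y1 <= s.height);
    std::uint64_t r = 0, g = 0, b = 0;
    for (std::size_t y = y0; y < y1; ++y) {
        // Row start is derived from the pitch once per row.
        const std::uint8_t* row = s.bits.data() + y * s.pitch;
        for (std::size_t x = x0; x < x1; ++x) {
            std::uint32_t p;
            std::memcpy(&p, row + x * 4, sizeof p);
            r += (p >> 16) & 0xFF;
            g += (p >> 8) & 0xFF;
            b += p & 0xFF;
        }
    }
    const std::uint64_t n = static_cast<std::uint64_t>(x1 - x0) * (y1 - y0);
    return Rgb{static_cast<std::uint32_t>(r / n),
               static_cast<std::uint32_t>(g / n),
               static_cast<std::uint32_t>(b / n)};
}
```

In a real application the buffer would come from locking a system-memory surface rather than from a `std::vector`; the addressing logic is the part this sketch is meant to show.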


BTW: I'm assuming DirectDraw and a totally 2D app here - if you're using Direct3D and want to lock a texture "in frame" or a render target, then I'd say there is a flaw in your design somewhere, unless you're doing something highly original. If it's to render a crosshair or target dot in D3D, use a textured quad - much faster.

--
Simon O'Connor
Creative Asylum Ltd
www.creative-asylum.com

Hi,
thanks for your reply.
My app is 3D, and there is nothing I can do about that,
and I just have to _read_ from the surface. Is it therefore still faster to keep a copy in SYSTEM RAM?

What about CopyRects'ing the interesting pixels to another surface (which is not the render target) and reading from there?

I need the average over a region of pixels.


Regards,
Tim

Reading from video memory is the SLOWEST thing you can do - READS are the actual reason for the system memory copy!

Copying from video to system memory using an official API call is probably better, since at least you take advantage of the special knowledge the driver has about that memory. But it's still the slowest thing to do.

You can do some very interesting things with render targets, which can often remove the need to read back at all (blur effects, object/pixel occlusion etc).

--
Simon O'Connor
Creative Asylum Ltd
www.creative-asylum.com

Hi,

maybe I'll do something like writing the sum to another texture. But then I still have to read it out... :-(

Thanks for your help so far!

Regards,
Tim

A couple of things to keep in mind for best performance:

1) You want to do as many operations as possible on the GPU. If you attempt to read back data from the GPU, it must finish all of its rendering operations first. While you're reading data, the GPU could be sitting idle.

2) If you *must* read data back from the GPU, you want to read as little as possible. If you must have the average pixel value from a rendered scene on the CPU, you should do the averaging using the GPU and render target textures. Sample a 2x2 region from the RT texture, average the four values, and write the result out to another RT texture of half the width and half the height. There are some tricks where you can average more than a 2x2 region. Run this process several times until you're down to a small RT texture, 16x16 or less. At this point, averaging to 1x1 and reading the data may cost more than reading the 16x16 data. You'll have to measure the performance to find out where to stop. You can then use CopyRects or GetRenderTargetData to read the surface back into system memory.

3) Even if the computation is inefficient when run on the GPU, it may still be faster than reading the data into the CPU. It could be faster to reduce the RT texture to 1x1, set it in a texture stage, and then sample it when you need the average pixel value.
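The repeated 2x2 reduction from point 2 can be sketched on the CPU to show the data flow. This is only an illustration under assumptions: the `Image` struct and the `halve`, `reduceTo` and `mean` functions are hypothetical, the image is a single float channel, and its sides are assumed to be equal powers of two. On real hardware each `halve` call would be one render pass into a half-size render target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A greyscale "render target" held in CPU memory, for illustration only.
struct Image {
    std::size_t width, height;
    std::vector<float> texels; // row-major, width * height values
};

// One reduction pass: each output texel is the mean of a 2x2 input block.
Image halve(const Image& src) {
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    Image dst{src.width / 2, src.height / 2,
              std::vector<float>((src.width / 2) * (src.height / 2))};
    for (std::size_t y = 0; y < dst.height; ++y) {
        for (std::size_t x = 0; x < dst.width; ++x) {
            const float* top = &src.texels[(2 * y) * src.width + 2 * x];
            const float* bottom = top + src.width;
            dst.texels[y * dst.width + x] =
                0.25f * (top[0] + top[1] + bottom[0] + bottom[1]);
        }
    }
    return dst;
}

// Repeat passes until neither side exceeds stopSize. stopSize models the
// measured point (e.g. 16) where reading back beats further passes.
Image reduceTo(Image img, std::size_t stopSize) {
    assert(stopSize >= 1);
    while (img.width > stopSize || img.height > stopSize) {
        img = halve(img);
    }
    return img;
}

// Final step done on the CPU after the small image has been read back.
float mean(const Image& img) {
    float sum = 0.0f;
    for (float t : img.texels) sum += t;
    return sum / static_cast<float>(img.texels.size());
}
```

Because every pass averages equal-sized blocks, `mean(reduceTo(img, 16))` equals the mean of the full image for power-of-two sizes, so only the small image has to cross the bus.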

Hi,

that's what I will try next, thank you.

Do you have any ideas how a GeForce4 interpolates textures when they are zoomed (z-based)?

I thought about scaling my original texture, which I want to average, down until it's only 1x1. Isn't the interpolation algorithm exactly what I am looking for?!

Regards,
Tim

Hi,

yes I am. ;-)

No, I'm sorry, that's not my reason. My actual problem is that I have a texture (unfortunately the render target) and desperately need some rects of this texture averaged, and I need the color value of the result. I don't see a way of using mipmapping for this, or did I get it completely wrong?

So my last idea was to scale those parts of the texture down to one pixel. Does anyone know how this is interpolated?!
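How a particular card filters is not established anywhere in this thread, but standard bilinear filtering blends only the four texels nearest the sample point. Under that assumption, a CPU sketch (the `Image` struct and both functions are hypothetical names) suggests why a single filtered tap over a large region is not its average:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// A greyscale texture held in CPU memory, for illustration only.
struct Image {
    std::size_t width, height;
    std::vector<float> texels; // row-major
};

float texel(const Image& img, std::size_t x, std::size_t y) {
    return img.texels[y * img.width + x];
}

// Textbook bilinear lookup at normalised coordinates (u, v) in [0, 1],
// with texel centres at integer + 0.5 and clamping at the edges.
float bilinearSample(const Image& img, float u, float v) {
    assert(img.width > 0 && img.height > 0);
    const float maxX = static_cast<float>(img.width - 1);
    const float maxY = static_cast<float>(img.height - 1);
    const float x = std::min(std::max(u * static_cast<float>(img.width) - 0.5f, 0.0f), maxX);
    const float y = std::min(std::max(v * static_cast<float>(img.height) - 0.5f, 0.0f), maxY);
    const std::size_t x0 = static_cast<std::size_t>(std::floor(x));
    const std::size_t y0 = static_cast<std::size_t>(std::floor(y));
    const std::size_t x1 = std::min(x0 + 1, img.width - 1);
    const std::size_t y1 = std::min(y0 + 1, img.height - 1);
    const float fx = x - static_cast<float>(x0);
    const float fy = y - static_cast<float>(y0);
    // Only these four texels contribute, however large the image is.
    const float top = texel(img, x0, y0) * (1.0f - fx) + texel(img, x1, y0) * fx;
    const float bottom = texel(img, x0, y1) * (1.0f - fx) + texel(img, x1, y1) * fx;
    return top * (1.0f - fy) + bottom * fy;
}

// Reference: the plain arithmetic mean over every texel.
float trueMean(const Image& img) {
    return std::accumulate(img.texels.begin(), img.texels.end(), 0.0f) /
           static_cast<float>(img.texels.size());
}
```

For a 4x4 image, a sample at (0.5, 0.5) sees only the four centre texels, so a bright corner pixel changes `trueMean` but not `bilinearSample`. Halving the size in several steps avoids this, because each step's four taps cover the whole block being reduced.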

Another idea was to use different textures, so that I can add the values and divide them by 2. I know this isn't exactly an average, but it would be good enough for me. Since I would like to do this in a pixel shader version 1.2, there is no way of dividing by the real number of pixels in the requested region (or maybe there is, by setting the shader constants to the reciprocal value).
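The difference between the two ideas in this paragraph can be sketched in plain C++. This is not shader code: both function names are hypothetical, and the reciprocal is assumed to be computed on the CPU and passed in the way a shader constant would be.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average by multiplying each sample with a precomputed 1/N, so no
// division is needed inside the per-sample loop.
float averageWithReciprocal(const std::vector<float>& samples) {
    assert(!samples.empty());
    const float reciprocal = 1.0f / static_cast<float>(samples.size());
    float acc = 0.0f;
    for (float s : samples) {
        acc += s * reciprocal; // multiply-add only
    }
    return acc;
}

// Repeated "add and divide by 2": later samples get more weight,
// so this is only an approximation of the mean.
float chainedHalving(const std::vector<float>& samples) {
    assert(!samples.empty());
    float acc = samples.front();
    for (std::size_t i = 1; i < samples.size(); ++i) {
        acc = (acc + samples[i]) * 0.5f;
    }
    return acc;
}
```

With the samples {1, 0, 0, 0} the reciprocal version returns the true mean 0.25, while chained halving returns 0.125, since the first sample has been halved three times.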

Regards,
Tim
