Michael Tanczos

Fast Direct3d MJPEG compression

Michael Tanczos    5681
Does anybody know of a fast (fast!) way to encode Direct3D surfaces into JPEGs? With SlimDX and C# I tried Texture.ToStream and that seemed pretty slow. The approach below was also pretty slow, and I haven't even gotten to the JPEG compression yet. GetRenderTargetData clocked in at 15ms on my computer:
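// Copy the render target into a system-memory surface.
_device.GetRenderTargetData(_prvsurface, _prvsurface_tmp);

DataRectangle g = _prvsurface_tmp.LockRectangle(LockFlags.None);

// Use the pitch reported by the lock as the stride; rows may be padded beyond 4 * Width.
// The Bitmap wraps the locked memory without copying it, so it is only valid until the unlock.
System.Drawing.Bitmap output = new Bitmap(_prvsurface.Description.Width, _prvsurface.Description.Height, g.Pitch, System.Drawing.Imaging.PixelFormat.Format32bppArgb, g.Data.DataPointer);

_prvsurface_tmp.UnlockRectangle();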


Adding this brings the clock up to 40ms, so the GDI+ encoder is pretty slow:
System.IO.MemoryStream stm = new System.IO.MemoryStream();
output.Save(stm, System.Drawing.Imaging.ImageFormat.Jpeg);

Any other ideas for fast(er) JPEG compression of a surface?

- Michael Tanczos

Nik02    4348
If you rendered the data in the same frame in which you are retrieving it, the GPU needs to flush all rendering operations on that target immediately in order to lock it for you to read. This will break the parallelism between CPU and GPU. 15 ms is close to 1/60th of a second (about 16.7 ms), which in turn sounds like a common refresh rate.

One way to rectify this is to use several buffers: while you're rendering to one of them, you can lock another one for which the rendering has already finished. Modern hardware can render from 1 to 4 frames in advance.

In any case, you probably need some kind of work queue approach.
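The several-buffers idea above can be sketched as plain index bookkeeping. This is only an illustration, not a tested capture path: `ReadbackRing` is a hypothetical helper that holds slot numbers, and the actual render targets and system-memory surfaces are assumed to live in a parallel array owned by the renderer. Whether a delay of two or three frames really avoids the stall depends on the driver and how far ahead it queues frames.

```csharp
using System;

// Hypothetical helper: decides which of N buffers to render into and which
// one is old enough to read back. It tracks indices only, not surfaces.
public sealed class ReadbackRing
{
    private readonly int _count;
    private long _frame; // number of frames started so far

    public ReadbackRing(int count)
    {
        if (count < 2) throw new ArgumentOutOfRangeException(nameof(count));
        _count = count;
    }

    // Slot to render into for the frame that is about to start.
    public int BeginFrame()
    {
        int slot = (int)(_frame % _count);
        _frame++;
        return slot;
    }

    // Slot holding the oldest frame still in the ring (the one that will be
    // overwritten next), or -1 while the ring has not filled up yet.
    public int OldestReadySlot()
    {
        if (_frame < _count) return -1;
        return (int)(_frame % _count);
    }
}

public static class Program
{
    public static void Main()
    {
        var ring = new ReadbackRing(3);
        for (int i = 0; i < 6; i++)
        {
            int write = ring.BeginFrame();      // render this frame here
            int read = ring.OldestReadySlot();  // lock and read this one, if any
            Console.WriteLine($"frame {i}: render to {write}, read back {read}");
        }
    }
}
```

With three slots, the first readback happens on the third frame and always targets the frame issued two frames earlier, so the slot is read just before it is reused.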

Also, a high-tech solution would be to implement the compression directly on the GPU by using a compute shader, and read the compressed data back to the CPU :) However, this will limit the target audience somewhat, and represents a considerable amount of work.

ET3D    810
I think that the fastest way to encode D3D into MJPEG would be on the GPU. Getting that implemented may not be trivial, but it looks like some people have done it. For example, the University of Oslo apparently runs a competition to compress MJPEG as quickly as possible (see here and here). Googling "gpu jpeg compression" also gets some results. I couldn't find any working code, although NVIDIA's site has code for DCT in CUDA.

Adam_42    3629
The obvious way to me to speed up the 15ms read + 25ms compress time is to multithread the compression. Two compression threads should easily keep up with reading one frame every 15ms.
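A minimal sketch of that idea, assuming each frame has already been copied into a managed `byte[]`: two worker threads drain a bounded queue, so a 25ms encode per frame averages out to roughly 12.5ms per frame across both workers. `CompressionPool` and the `compress` callback are hypothetical names, and `Thread.Sleep(25)` only stands in for a real JPEG encoder.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class CompressionPool
{
    // Starts `workerCount` threads that drain `frames` and hand each frame to
    // `compress`. The caller joins the threads after calling CompleteAdding().
    public static Thread[] Start(BlockingCollection<byte[]> frames, int workerCount, Action<byte[]> compress)
    {
        var workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = new Thread(() =>
            {
                // Blocks while the queue is empty; ends once adding is completed.
                foreach (byte[] frame in frames.GetConsumingEnumerable())
                    compress(frame);
            });
            workers[i].IsBackground = true;
            workers[i].Start();
        }
        return workers;
    }
}

public static class Program
{
    public static void Main()
    {
        int compressed = 0;

        // Bounded queue: Add() blocks once 4 frames are pending, which caps
        // memory use if the encoders fall behind the renderer.
        using (var frames = new BlockingCollection<byte[]>(boundedCapacity: 4))
        {
            Thread[] workers = CompressionPool.Start(frames, 2, frame =>
            {
                Thread.Sleep(25); // stand-in for the ~25ms JPEG encode
                Interlocked.Increment(ref compressed);
            });

            for (int i = 0; i < 10; i++)
                frames.Add(new byte[800 * 600 * 4]); // stand-in for a copied frame

            frames.CompleteAdding();
            foreach (Thread t in workers) t.Join();
        }

        Console.WriteLine($"compressed {compressed} frames");
    }
}
```

The bounded capacity is a design choice: it makes the render thread wait rather than letting uncompressed frames pile up. Dropping frames instead (for example with `TryAdd`) would be the alternative if rendering must never block.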

You may also find http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-jpeg-sample-and-performance-faqs/ handy - it's Intel's free optimized jpeg code.

Tom KQT    1704
Quote:
Original post by Adam_42
The obvious way to me to speed up the 15ms read + 25ms compress time is to multithread the compression. Two compression threads should easily keep up with reading one frame every 15ms.

You may also find http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-jpeg-sample-and-performance-faqs/ handy - it's Intel's free optimized jpeg code.


Intel's IPP isn't free, as far as I know.

feal87    238
Imho the best idea (if you don't want to mess with shaders) is not to encode in real time, but to download the raw RGBA data from the surface, save it to a file, and create the JPEG data at the end.
This way it should be quite a bit faster, but it takes up a lot of space during processing: 800x600x4 bytes is about 1.9 MB per frame, so at 24 frames per second that is roughly 46 MB every second.
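The storage estimate can be checked with a few lines. This is just the arithmetic from the post, assuming 4 bytes per pixel and no row padding; `RawFrameBudget` is a hypothetical name.

```csharp
using System;

public static class RawFrameBudget
{
    // Bytes per second needed to store uncompressed, tightly packed frames.
    public static long BytesPerSecond(int width, int height, int bytesPerPixel, int framesPerSecond)
    {
        // Cast first so large resolutions do not overflow 32-bit arithmetic.
        return (long)width * height * bytesPerPixel * framesPerSecond;
    }
}

public static class Program
{
    public static void Main()
    {
        long perFrame = RawFrameBudget.BytesPerSecond(800, 600, 4, 1);   // 1,920,000 bytes
        long perSecond = RawFrameBudget.BytesPerSecond(800, 600, 4, 24); // 46,080,000 bytes
        Console.WriteLine($"{perFrame} bytes per frame, {perSecond} bytes per second");
        Console.WriteLine($"{perSecond * 60L} bytes per minute of capture");
    }
}
```

A minute of capture at this rate comes to about 2.8 GB, so sustained disk write speed is likely to matter as much as CPU time.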

Michael Tanczos    5681
Right now I'm thinking the best approach would be to transfer the data into system memory using GetRenderTargetData and then hand it off to another thread for compression so that rendering can continue. I think stalling the render thread to compress an image is a waste.

I tried locking the backbuffer after each Present to force the GPU to flush its rendering operations every frame, and then batched several GetRenderTargetData operations. It seems most of the time is spent waiting for the GPU to finish the flush? In any case, GetRenderTargetData is pretty fast at copying from several surfaces back to back. It seems to lock my frame rate at 30fps (I'd assume using PresentInterval.One might be the cause?).

If I use more than one backbuffer, though, will locking one while the others are in use really not cause a stall?
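For the hand-off to a compression thread mentioned above, the pixels have to be copied out of the locked surface first, because the pointer from the lock is only valid until the unlock. A minimal sketch, assuming 32-bit pixels: `FrameCopy.CopyRows` is a hypothetical helper, and with SlimDX the `source` and `pitch` arguments would presumably come from `DataRectangle.Data.DataPointer` and `DataRectangle.Pitch`, called between LockRectangle and UnlockRectangle. The demo below fakes a surface with unmanaged memory so that it runs without a device.

```csharp
using System;
using System.Runtime.InteropServices;

public static class FrameCopy
{
    // Copies `height` rows of `width * 4` bytes from locked surface memory
    // into a tightly packed managed array, skipping per-row padding.
    public static byte[] CopyRows(IntPtr source, int pitch, int width, int height)
    {
        int rowBytes = width * 4;
        if (pitch < rowBytes)
            throw new ArgumentException("pitch is smaller than one row", nameof(pitch));

        var dest = new byte[rowBytes * height];
        for (int y = 0; y < height; y++)
        {
            IntPtr row = IntPtr.Add(source, y * pitch);
            Marshal.Copy(row, dest, y * rowBytes, rowBytes);
        }
        return dest;
    }
}

public static class Program
{
    public static void Main()
    {
        // Fake 2x2 surface: 8 bytes of pixels plus 8 bytes of padding per row.
        const int width = 2, height = 2, pitch = 16;
        IntPtr fake = Marshal.AllocHGlobal(pitch * height);
        try
        {
            for (int i = 0; i < pitch * height; i++)
                Marshal.WriteByte(fake, i, (byte)i);

            byte[] packed = FrameCopy.CopyRows(fake, pitch, width, height);
            Console.WriteLine(string.Join(",", packed));
            // prints 0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
        }
        finally
        {
            Marshal.FreeHGlobal(fake);
        }
    }
}
```

The returned array owns its data, so it can be queued for another thread while the surface is unlocked and reused for the next frame.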

