Fast Direct3D MJPEG compression

6 comments, last by Michael Tanczos 14 years, 2 months ago
Does anybody know of a fast (fast!) way to encode Direct3D surfaces into JPEGs? With SlimDX and C# I tried Texture.ToStream, and that seemed pretty slow. The code below was also pretty slow, and I didn't even get to the JPEG compression yet; GetRenderTargetData alone clocked in at 15 ms on my computer:

// Copy the render target from video memory into a lockable system-memory surface.
_device.GetRenderTargetData(_prvsurface, _prvsurface_tmp);

// Lock the system-memory surface to get a CPU-readable pointer.
DataRectangle g = _prvsurface_tmp.LockRectangle(LockFlags.None);

// Wrap the locked pixels in a Bitmap. Use g.Pitch for the stride rather than
// assuming 4 * Width, since the driver may pad each row.
System.Drawing.Bitmap output = new Bitmap(_prvsurface.Description.Width, _prvsurface.Description.Height, g.Pitch, System.Drawing.Imaging.PixelFormat.Format32bppArgb, g.Data.DataPointer);

// The Bitmap wraps the locked pointer directly, so it must be used (or copied)
// before this unlock invalidates it.
_prvsurface_tmp.UnlockRectangle();


Adding this brings the clock up to 40 ms, so the GDI+ encoder is pretty slow:

System.IO.MemoryStream stm = new System.IO.MemoryStream();
output.Save(stm, System.Drawing.Imaging.ImageFormat.Jpeg);

Any other ideas for fast(er) JPEG compression of a surface? - Michael Tanczos
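For what it's worth, GDI+ does expose a JPEG quality knob through EncoderParameters, and lowering the quality setting reduces the encode work and output size somewhat. A minimal sketch; the helper name and the quality value 50 are just illustrative, not from the thread:

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;

class JpegQualitySketch
{
    // Find the built-in GDI+ JPEG encoder by MIME type.
    static ImageCodecInfo GetJpegEncoder() =>
        ImageCodecInfo.GetImageEncoders()
                      .First(c => c.MimeType == "image/jpeg");

    // Encode a bitmap at an explicit quality level and return the byte count.
    public static long EncodeLength(Bitmap bmp, long quality)
    {
        using (var stm = new MemoryStream())
        {
            var p = new EncoderParameters(1);
            p.Param[0] = new EncoderParameter(Encoder.Quality, quality);
            bmp.Save(stm, GetJpegEncoder(), p);
            return stm.Length;
        }
    }

    static void Main()
    {
        using (var bmp = new Bitmap(64, 64))
            Console.WriteLine(EncodeLength(bmp, 50L) > 0); // non-empty JPEG stream
    }
}
```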
On an unrelated note, it took all of 2 minutes for my post to show up in google. Wow. That's some seriously fast indexing time.
If you rendered the data in the same frame in which you are retrieving it, the GPU needs to flush all pending rendering operations on that target immediately in order to lock it for you to read. This breaks the parallelism between the CPU and GPU. 15 ms is roughly 1/60th of a second, which in turn sounds like a common refresh rate.

One way to rectify this is to use several buffers: while you're rendering to one of them, you can lock another one for which the rendering has already finished. Modern hardware can render from 1 to 4 frames in advance.
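The multi-buffer idea boils down to modular indexing over a small ring of staging surfaces: issue the copy into the newest slot each frame, but only lock the oldest slot, so the GPU gets N-1 frames to finish each copy before the CPU touches it. A sketch of just that indexing (ReadbackRing and the slot helpers are made-up names; the actual GetRenderTargetData/LockRectangle calls are omitted):

```csharp
using System;

class ReadbackRing
{
    // With n staging surfaces, write into slot (frame % n) and read the slot
    // written n-1 frames earlier, i.e. ((frame + 1) % n).
    public static int WriteSlot(long frame, int n) => (int)(frame % n);
    public static int ReadSlot(long frame, int n) => (int)((frame + 1) % n);

    static void Main()
    {
        const int N = 3;
        for (long frame = 0; frame < 5; frame++)
            Console.WriteLine(
                $"frame {frame}: copy into {WriteSlot(frame, N)}, lock {ReadSlot(frame, N)}");
    }
}
```

With N = 3, the slot locked at frame 2 is the one copied into at frame 0, so the GPU had two whole frames to complete that transfer without a stall.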

In any case, you probably need some kind of work queue approach.
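A work queue along those lines can be sketched with .NET's BlockingCollection: the render thread adds copied-out frames, and a couple of worker tasks consume and compress them. CompressionQueue and CompressAll are made-up names, and the byte[] payload stands in for pixels copied out of the locked surface:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class CompressionQueue
{
    // Render thread produces frames; worker tasks consume and compress them.
    public static int CompressAll(int frameCount, int workerCount)
    {
        // Bounded so the producer can't run unboundedly ahead of the workers.
        var queue = new BlockingCollection<byte[]>(boundedCapacity: 4);
        int compressed = 0;

        var workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
            workers[i] = Task.Run(() =>
            {
                foreach (var frame in queue.GetConsumingEnumerable())
                {
                    // JPEG-encode 'frame' here.
                    Interlocked.Increment(ref compressed);
                }
            });

        for (int f = 0; f < frameCount; f++)
            queue.Add(new byte[16]); // blocks if 4 frames are already pending

        queue.CompleteAdding(); // no more frames; workers drain and exit
        Task.WaitAll(workers);
        return compressed;
    }

    static void Main() => Console.WriteLine(CompressAll(10, 2)); // prints 10
}
```

The bounded capacity is the knob that decides what happens when compression falls behind: the render thread blocks rather than piling up uncompressed frames in memory.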

Also, a high-tech solution would be to implement the compression directly on the GPU using a compute shader, and read the compressed data back to the CPU :) However, this limits the target audience somewhat, and represents a considerable amount of work.

Niko Suni

I think the fastest way to encode D3D into MJPEG would be on the GPU. Getting that implemented may not be trivial, but it looks like some people have done it. For example, the University of Oslo apparently runs a competition to compress MJPEG as quickly as possible (see here and here). Googling "gpu jpeg compression" also turns up some results. I couldn't find any working code, although NVIDIA's site has sample code for the DCT in CUDA.
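For reference, the transform at the core of that CUDA sample, and of any JPEG/MJPEG encoder, is the 8x8 forward DCT. A naive C# version for illustration only; real encoders use a factored fast DCT, and a GPU version would process one block per thread group:

```csharp
using System;

class Dct8x8
{
    // Naive 2-D forward DCT-II over an 8x8 block (JPEG's transform).
    public static double[,] Forward(double[,] b)
    {
        var coeffs = new double[8, 8];
        for (int u = 0; u < 8; u++)
        for (int v = 0; v < 8; v++)
        {
            double sum = 0;
            for (int x = 0; x < 8; x++)
            for (int y = 0; y < 8; y++)
                sum += b[x, y]
                     * Math.Cos((2 * x + 1) * u * Math.PI / 16)
                     * Math.Cos((2 * y + 1) * v * Math.PI / 16);
            double cu = u == 0 ? 1 / Math.Sqrt(2) : 1;
            double cv = v == 0 ? 1 / Math.Sqrt(2) : 1;
            coeffs[u, v] = 0.25 * cu * cv * sum;
        }
        return coeffs;
    }

    static void Main()
    {
        // A flat block concentrates all its energy in the DC coefficient.
        var block = new double[8, 8];
        for (int x = 0; x < 8; x++)
            for (int y = 0; y < 8; y++)
                block[x, y] = 100;
        var d = Forward(block);
        Console.WriteLine(Math.Round(d[0, 0])); // 800
    }
}
```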
The obvious way to me to speed up the 15 ms read + 25 ms compress time is to multithread the compression. Two compression threads should easily keep up with reading one frame every 15 ms.

You may also find http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-jpeg-sample-and-performance-faqs/ handy - it's Intel's free optimized jpeg code.
Quote:Original post by Adam_42
The obvious way to me to speed up the 15ms read + 25ms compress time is to multi thread the compression. Two compression threads should easily keep up with reading one frame every 15ms.

You may also find http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-jpeg-sample-and-performance-faqs/ handy - it's Intel's free optimized jpeg code.


Intel's IPP isn't free, as far as I know.
IMHO the best idea (if you don't want to mess with shaders) is not to encode in real time, but to download the raw RGBA data from the surface, save it to a file, and create the JPEG data at the end.
This way it should be quite a bit faster, though it takes a lot of space during capture (800x600x4 ≈ 2 MB a frame, × 24 fps ≈ 48 MB every second).
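That capture-now, encode-later split can be sketched as appending each frame's raw bytes to one stream during capture, then replaying it a frame at a time afterwards. RawCapture is a made-up name, and a MemoryStream stands in for the file:

```csharp
using System;
using System.IO;

class RawCapture
{
    const int FrameSize = 800 * 600 * 4; // one raw 32-bit RGBA frame

    // Append 'frameCount' raw frames, then replay and count them.
    public static int RoundTrip(int frameCount)
    {
        // Capture phase: just append raw frames, no encoding.
        var stm = new MemoryStream();
        for (int f = 0; f < frameCount; f++)
            stm.Write(new byte[FrameSize], 0, FrameSize);

        // Encode phase: walk the stream one fixed-size frame at a time.
        stm.Position = 0;
        var frame = new byte[FrameSize];
        int frames = 0;
        while (stm.Read(frame, 0, FrameSize) == FrameSize)
            frames++; // JPEG-encode 'frame' here, at leisure

        return frames;
    }

    static void Main() => Console.WriteLine(RoundTrip(3)); // prints 3
}
```

At ~48 MB/s this is really a disk-bandwidth bet: a write-behind file stream keeps up on most drives, but capture length is limited by free space.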
Right now I'm thinking the best approach would be to transfer the data into system memory using GetRenderTargetData and then hand it off to another thread for compression, so that rendering can continue. Stalling the render thread to compress an image is a waste.

I tried locking the backbuffer after each Present to force the GPU to flush its rendering operations each frame, and then batched several GetRenderTargetData operations. It seems most of the time is spent waiting for the GPU to finish the flush. In any case, GetRenderTargetData is pretty fast at copying from several surfaces back to back. It seems to lock my framerate at 30 fps (I'd assume PresentInterval.One, i.e. vsync, might be the cause?).

If I use more than one backbuffer, though, will locking one while the others are in use still avoid causing a stall?

This topic is closed to new replies.
