# Fast Direct3D MJPEG compression

## Recommended Posts

Does anybody know of a fast (fast!) way to encode Direct3D surfaces into JPEGs? With SlimDX and C# I tried `Texture.ToStream` and that seemed pretty slow. The following was also pretty slow, and I didn't even get to the JPEG compression yet. `GetRenderTargetData` clocked in at 15 ms on my computer:

    // Copy the render target into a system-memory surface that the CPU can lock.
    _device.GetRenderTargetData(_prvsurface, _prvsurface_tmp);

    DataRectangle g = _prvsurface_tmp.LockRectangle(LockFlags.None);

    // Wrap the locked pixels in a Bitmap without copying them. The stride is
    // the surface pitch, which may be larger than 4 * Width.
    System.Drawing.Bitmap output = new Bitmap(
        _prvsurface.Description.Width, _prvsurface.Description.Height, g.Pitch,
        System.Drawing.Imaging.PixelFormat.Format32bppArgb, g.Data.DataPointer);

    // The Bitmap only points at the locked memory; do not read it after this call.
    _prvsurface_tmp.UnlockRectangle();

Adding this brings the clock up to 40 ms, so the GDI+ encoder is pretty slow:

    // Must run while the surface is still locked, because output wraps the locked memory.
    System.IO.MemoryStream stm = new System.IO.MemoryStream();
    output.Save(stm, System.Drawing.Imaging.ImageFormat.Jpeg);

Any other ideas for fast(er) JPEG compression of a surface?

- Michael Tanczos

##### Share on other sites
On an unrelated note, it took all of 2 minutes for my post to show up in Google. Wow. That's some seriously fast indexing.

##### Share on other sites
If you rendered the data in the same frame in which you are retrieving it, the GPU has to flush all pending rendering operations on that target before it can be locked for reading. This breaks the parallelism between CPU and GPU. 15 ms is also close to 1/60th of a second (about 16.7 ms), which in turn sounds like a common refresh rate.

One way to rectify this is to use several buffers: while you're rendering to one of them, you can lock another one for which the rendering has already finished. Modern hardware can render from 1 to 4 frames in advance.
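The buffer rotation described above can be sketched roughly as follows. This is only an illustration of the indexing, not tested against a real device: `FrameRing<T>` and its members are hypothetical names, and in a SlimDX application the slots would presumably be render-target surfaces, with `GetRenderTargetData` called on the slot returned by `TryGetReadSlot`.

```csharp
using System;

// Hypothetical helper: cycles through N slots so that the slot being read
// back is always the one that was rendered longest ago.
public sealed class FrameRing<T>
{
    private readonly T[] _slots;
    private long _frame; // index of the frame currently being rendered

    public FrameRing(T[] slots)
    {
        if (slots == null || slots.Length < 2)
            throw new ArgumentException("Need at least two slots.", "slots");
        _slots = slots;
    }

    // Slot to render into for the current frame.
    public T WriteSlot
    {
        get { return _slots[(int)(_frame % _slots.Length)]; }
    }

    // Oldest slot that is not the current write slot. Returns false during
    // the first N-1 frames, when no slot has been filled long enough ago.
    public bool TryGetReadSlot(out T slot)
    {
        if (_frame < _slots.Length - 1)
        {
            slot = default(T);
            return false;
        }
        slot = _slots[(int)((_frame + 1) % _slots.Length)];
        return true;
    }

    // Call once per frame, after rendering and readback have been issued.
    public void Advance()
    {
        _frame++;
    }
}
```

Per frame, the render loop would draw into `WriteSlot`, read back whatever `TryGetReadSlot` returns, and then call `Advance`. With three slots the readback is always two frames behind the frame being rendered, which gives the GPU time to finish before the lock.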

In any case, you probably need some kind of work queue approach.

Also, a high-tech solution would be to implement the compression directly on the GPU by using a compute shader, and read the compressed data back to the CPU :) However, this will limit the target audience somewhat, and represents a considerable amount of work.

##### Share on other sites
I think that the fastest way to encode D3D into MJPEG would be on the GPU. Getting that implemented may not be trivial, but it looks like some people have done it. For example, the University of Oslo apparently runs a competition to compress MJPEG as quickly as possible (see here and here). Googling "gpu jpeg compression" also gets some results. I couldn't find any working code, although NVIDIA's site has code for DCT in CUDA.

##### Share on other sites
The obvious way to me to speed up the 15 ms read + 25 ms compress time is to multithread the compression. At 25 ms per frame, a single thread can't keep up with a frame arriving every 15 ms, but two compression threads should do so easily.
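A minimal sketch of that idea, assuming the frame has already been copied into a managed `byte[]`: `CompressionQueue` is a hypothetical name, and the `compress` delegate stands in for whichever JPEG encoder ends up being used (it is not a call into any particular library).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: a bounded queue of raw frames drained by worker threads.
public sealed class CompressionQueue : IDisposable
{
    private readonly BlockingCollection<byte[]> _frames;
    private readonly Thread[] _workers;

    // 'compress' is a placeholder for the actual JPEG encoder.
    public CompressionQueue(int workerCount, int capacity, Action<byte[]> compress)
    {
        _frames = new BlockingCollection<byte[]>(capacity);
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(() =>
            {
                // Blocks until a frame arrives; the loop ends once adding
                // is completed and the queue has been drained.
                foreach (byte[] frame in _frames.GetConsumingEnumerable())
                    compress(frame);
            });
            _workers[i].IsBackground = true;
            _workers[i].Start();
        }
    }

    // Called from the render thread. Returns false (the frame is dropped)
    // when the queue is full, so rendering never waits on compression.
    public bool TryEnqueue(byte[] frame)
    {
        return _frames.TryAdd(frame);
    }

    // Stops accepting frames, lets the workers finish, then cleans up.
    public void Dispose()
    {
        _frames.CompleteAdding();
        foreach (Thread t in _workers)
            t.Join();
        _frames.Dispose();
    }
}
```

With two workers, frames can finish compressing out of order, so for MJPEG output each frame would probably need to carry a sequence number so the writer can reorder them.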

You may also find http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-jpeg-sample-and-performance-faqs/ handy - it's Intel's free optimized JPEG code.

##### Share on other sites
Quote:
 Original post by Adam_42
 The obvious way to me to speed up the 15ms read + 25ms compress time is to multi thread the compression. Two compression threads should easily keep up with reading one frame every 15ms.
 You may also find http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-jpeg-sample-and-performance-faqs/ handy - it's Intel's free optimized jpeg code.

Intel's IPP isn't free, as far as I know.

##### Share on other sites
IMHO the best idea (if you don't want to mess with shaders) is not to encode in real time, but to download the raw RGBA data from the surface, save it to a file, and create the JPEG data at the end.
This way it should be considerably faster, but it takes up a lot of space in the process: 800 x 600 x 4 bytes is about 1.9 MB per frame, and at 24 frames per second that is about 46 MB every second.

##### Share on other sites
Right now I'm thinking the best approach would be to transfer the data into system memory using `GetRenderTargetData` and then hand it off to another thread for compression, so that rendering can continue. I think stalling the render thread to compress an image is a waste.
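One detail with handing the frame to another thread is that the locked surface memory is only valid until the unlock, so the pixels have to be copied out first. A rough sketch of that copy, assuming the pointer and pitch come from the locked `DataRectangle` (`FrameCopy.CopyPacked` is a hypothetical helper, not part of SlimDX):

```csharp
using System;
using System.Runtime.InteropServices;

public static class FrameCopy
{
    // Copies 'height' rows of 'rowBytes' bytes each from locked surface
    // memory, where consecutive rows start 'pitch' bytes apart, into a
    // tightly packed managed array that can outlive the lock.
    public static byte[] CopyPacked(IntPtr source, int pitch, int rowBytes, int height)
    {
        if (rowBytes > pitch)
            throw new ArgumentException("rowBytes cannot exceed pitch.");

        byte[] packed = new byte[rowBytes * height];
        for (int y = 0; y < height; y++)
        {
            // Skip any padding at the end of each source row.
            IntPtr row = IntPtr.Add(source, y * pitch);
            Marshal.Copy(row, packed, y * rowBytes, rowBytes);
        }
        return packed;
    }
}
```

For a 32-bit surface `rowBytes` would be `4 * Width`. The returned array can then be passed to the compression thread after `UnlockRectangle` has been called.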

I tried locking the back buffer after each present to force the GPU to flush its rendering operations each frame, and then batched several `GetRenderTargetData` operations. It seems most of the time is spent waiting for the GPU to finish the flush? In any case, `GetRenderTargetData` is pretty fast at copying from several surfaces back to back. It seems to lock my frame rate at 30 fps (I'd assume using `PresentInterval.One` might be the cause?).

If I use more than one back buffer, though, is it true that locking one while the others are in use won't cause a stall?
