# maxest

1. ## Video codecs: compressing delta values

Huh, my joy was premature. I made a mistake in my code and the change we're talking about here was not applied at all. Once I ensured it was applied, artifacts showed up. I did make use of the trick described above though: I used it to store differences between DC coefficients more efficiently. I think it is not possible to apply a non-linear operator (like xor or modulo) *before* running DCT/quantization on the data and expect correct results on the decoding side. I might be wrong here, but that is my intuition so far.
2. ## Video codecs: compressing delta values

At first I did not understand how this was supposed to work. My thought was: hey, I still have the [-255, 255] range to cover, so how can I skip the extra bit? But your suggestion enlightened me: I *don't* have that range. I do have negative values to cover, but the total number of values I need to represent is 256, not 511. I'll give an example to elaborate. Say a pixel has the value 20 (in the [0..255] range) in frame1, and the same pixel has the value 7 in frame2. The difference to encode is 7 - 20 = -13. Now, since frame2's value must itself lie in [0..255], the limits are 0 - 20 = -20 and 255 - 20 = 235, so the only viable differences for this pixel are in the range [-20, 235]. That is still only 256 values, so 8 bits is enough. Back to our example: we got 7 - 20 = -13. We take -13 % 256 = 243 and store that. On the decoding side we have the previous, frame1's value of 20. We decode by summing 20 + 243 = 263 and taking 263 % 256 = 7. This solution not only fixed my compression so that the whole spectrum of values is encoded but, to my surprise, also increased the compression ratio by a measurable few %. Not sure why that happened but I won't complain :). Thank you rnlf_in_space immensely for making me realize what I have just described :).
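The wrap-around trick above can be sketched in a few lines (the helper names are mine, not from any codec): because the current value is always in [0, 255], storing the difference modulo 256 is lossless and fits in one byte.

```python
def encode_delta(prev: int, curr: int) -> int:
    # Store the difference modulo 256 -- always fits in 8 bits.
    return (curr - prev) % 256

def decode_delta(prev: int, delta: int) -> int:
    # Undo by adding the stored byte and wrapping back into [0, 255].
    return (prev + delta) % 256

# The example from the post: previous value 20, current value 7.
d = encode_delta(20, 7)          # (7 - 20) % 256 = 243
restored = decode_delta(20, d)   # (20 + 243) % 256 = 7
```

The round trip is exact for every pair of byte values, which is the whole point: no extra sign bit is needed.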
3. ## Bloom

Yeah, blooming the whole scene (as in Call of Duty) makes sense, but only if you bloom an HDR target, where bright areas have far higher values than dark ones.
4. ## Video codecs: compressing delta values

I'm implementing a video codec. One way of improving efficiency is to compress the delta between two frames. The problem with computing the difference, though, is that it increases the range of values: if my input image has RGB values in the [0, 255] range, the diff can be in the range [-255, 255]. My solution is to calculate (in floats) clamp(diff, -0.5, 0.5) + 0.5. This gives me the range [0, 255] but cuts off higher values, which actually is not a problem; at least I don't see much difference. It was suggested to me that instead of using the "raw" difference between input frames' pixels I should use xor, without much further explanation. I seriously doubt that: xoring the input RGB values before applying the conversion to luma-chroma, DCT and quantization does not yield good results (I see severe artifacts). Anyway, I've tried different approaches and here are my findings. As a test case I took two similar pictures.
1. Compression of a single picture, no delta, JPEG-like (conversion of RGB to luma-chroma, DCT, quantization and finally Huffman-based RLE), gives a compression ratio of x26.
2. Computing the xor of the input frames and then compressing that (JPEG-like) gives a compression ratio of x39 and the afore-mentioned artifacts.
3. Computing clamp(difference, -0.5, 0.5) + 0.5, followed by JPEG-like compression, results in a compression ratio of x76.
4. Since xor itself seemed to make sense, but applied to the *DCTs* of the input frames rather than the RGBs, I tried that. Storing the xor of the DCTs and running RLE on that gave me a compression ratio of x72.
So as you see, I did achieve some nice compression using xor, but only with 4. Option 3 gives the best compression ratio and has some other advantages: since differences are more "natural" than xor, nothing stands against blurring the difference from 3 and thus achieving an even better compression ratio at the cost of decreased (noticeable, but not that much) quality. Do you have any thoughts on improving delta compression of images?
I'm asking because in extreme cases delta compression produces blocks and eventually whole frames which have *worse* compression ratio than when compressed without delta.
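For reference, the clamp-based encoding from option 3 can be sketched like this (my own helper names; pixels normalized to [-1, 1] diff range before clamping). Unlike the modulo trick, this one is lossy for large differences:

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def encode_clamped(prev: int, curr: int) -> int:
    # Normalize the difference to [-1, 1], clamp to [-0.5, 0.5],
    # shift into [0, 1] and scale back to a storable byte.
    diff = (curr - prev) / 255.0
    return round((clamp(diff, -0.5, 0.5) + 0.5) * 255.0)

def decode_clamped(prev: int, stored: int) -> int:
    # Invert the shift/scale and add back to the previous value.
    diff = (stored / 255.0 - 0.5) * 255.0
    return clamp(round(prev + diff), 0, 255)
```

Small differences survive the round trip (up to rounding), while differences beyond half the range are cut off, which matches the "cuts off higher values" behaviour described in the post.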
5. ## Bloom

I think Styves is right. In Call of Duty they take 4% of the HDR scene's colors and apply bloom to that. No need for thresholding, though it's not a problem to apply it either.
6. ## Bloom

I would recommend looking here: http://www.iryoku.com/next-generation-post-processing-in-call-of-duty-advanced-warfare The bloom proposed there is very simple to implement and works great; I implemented it and checked. Keep in mind though that there is a mistake in the slides, which I pointed out in the comments. Basically, to avoid getting bloom done badly you can't undersample, or you will end up with nasty aliasing/ringing. So you take the original image and downsample it once (from 1920x1080 to 960x540) to get the second layer, then again, and again, up to n layers. After you have generated, say, 6 layers, you combine them by upscaling the n'th layer to the size of the (n-1)'th layer and summing them. Then you do the same with the new (n-1)'th layer and the (n-2)'th layer, and so on up to full resolution. This is quite fast, as the downsample and upsample filters need only very small kernels, yet because you go down to a very small layer you eventually get a very broad and stable bloom.
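The downsample-then-collapse chain described above can be sketched on a 1-D "image" for brevity (averaging/nearest-neighbour filters and function names are my simplification, not the filters from the slides):

```python
def downsample(img):
    # One pyramid level: average adjacent pairs (assumes even length).
    return [(img[i] + img[i + 1]) / 2 for i in range(0, len(img) - 1, 2)]

def upsample(img, size):
    # Nearest-neighbour upscale back to the finer level's size.
    return [img[min(i // 2, len(img) - 1)] for i in range(size)]

def bloom(img, levels):
    # Build the pyramid by repeated downsampling...
    chain = [img]
    for _ in range(levels):
        chain.append(downsample(chain[-1]))
    # ...then collapse it: upsample each coarse layer to the next finer
    # layer's size and sum, from the smallest layer up to full resolution.
    acc = chain[-1]
    for finer in reversed(chain[:-1]):
        acc = [f + u for f, u in zip(finer, upsample(acc, len(finer)))]
    return acc

# A single bright pixel spreads into a broad halo after the collapse.
result = bloom([0, 0, 0, 8, 0, 0, 0, 0], levels=2)
```

Each collapse step only ever touches two adjacent layers with a tiny kernel, which is why the full chain stays cheap while still producing a wide bloom.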

Bump
12. ## PCI Express Throughput

Just wanted to let you know that I made a test with CUDA to measure memory transfer rate and it peaked at around 12 GB/s. Also, measuring CopyResource time with D3D11 queries results in very similar throughput.
13. ## Motion vectors calculation in MPEG-like codecs

Hey, not really sure if this is the right forum, but hey - codecs display graphics so... :). I've been wondering how to calculate motion vectors. When encoding a video offline, I can imagine one could spend enough time to search a large neighbourhood to find a motion vector that maps the current frame's block to the previous frame's block with minimal differences (and hence achieve better compression). But what about live broadcasting, where time is at a premium? How would a codec estimate a motion vector? Search a small neighbourhood?
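To make the "search a small neighbourhood" idea concrete, here is a minimal full-search block matcher using a sum-of-absolute-differences (SAD) cost (function names and the tiny search radius are my own choices; real-time encoders typically use smarter patterns such as diamond or hexagon search instead of exhaustive scanning):

```python
def sad(prev, curr, bx, by, dx, dy, bs):
    # Cost of matching the current block at (bx, by) against the
    # previous-frame block shifted by the candidate vector (dx, dy).
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(curr[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
    return total

def find_motion_vector(prev, curr, bx, by, bs=4, radius=2):
    # Exhaustively try every offset in a (2*radius+1)^2 window and keep
    # the candidate with the lowest SAD.
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that fall outside the previous frame.
            if not (0 <= by + dy and by + dy + bs <= len(prev)
                    and 0 <= bx + dx and bx + dx + bs <= len(prev[0])):
                continue
            cost = sad(prev, curr, bx, by, dx, dy, bs)
            if best is None or cost < best:
                best, best_mv = cost, (dx, dy)
    return best_mv
```

Shrinking `radius` is exactly the offline-vs-live trade-off from the question: a smaller window means fewer SAD evaluations per block at the cost of missing larger motions.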
14. ## DX11 HLSL race condition when writing to shared memory passed to function

I found a better workaround. So simple I can't imagine how I had not come up with it before: I just used a macro. Still, it would be nice if this bug were fixed. In the meantime I will be using macros for functions taking shared buffers as input.
15. ## DX11 Dispatch causes flush?

I would like to run some computation using compute shaders. A lot of computation. Since GPUs have a separate memory engine, I thought I could make use of it, just like with CUDA streams, and overlap computation with GPU -> CPU data download. So I would do something like this: Dispatch 1 (first half of data), CopyResource 1, Dispatch 2 (second half of data), CopyResource 2. Now the question is: will CopyResource 1 and Dispatch 2 overlap in time? I heard from someone that Dispatch causes a flush - that it waits until all previous commands have completed before it runs - but I can't find that on MSDN. Can anyone confirm?