Member Since 03 Jul 2006

#5236966 GenerateMipmaps and staging textures

Posted by Adam_42 on 26 June 2015 - 01:12 PM

If you really need to generate mip maps at run time for a texture in CPU memory then I'd suggest using a library like https://directxtex.codeplex.com/ that will do all the work for you.


The reason GenerateMipmaps has those requirements is that it does the work on the GPU rather than the CPU. It's designed for quickly generating mip maps for render targets that you then want to use as a texture.


For normal textures I'd suggest loading .dds texture files off disc which have mip maps already generated offline. That also makes it easier to create the textures in one of the block compressed formats like BC1 that take less memory and are more efficient to render with, but are relatively slow to compress.
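For illustration only, here is roughly what CPU-side mip generation involves, assuming tightly packed 8-bit RGBA data in linear space. It's a minimal sketch using a plain 2x2 box filter; the names MipLevel, downsample and buildMipChain are made up for this example, and a library like DirectXTex offers better filters and handles far more formats.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One mip level of tightly packed 8-bit RGBA pixels (4 bytes per pixel, no row padding).
struct MipLevel {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> rgba;
};

// Halve a level with a 2x2 box filter. Odd dimensions are rounded down, so the
// last row/column of an odd-sized level is dropped; a dimension that is already
// 1 is clamped so the same pixel is used twice.
// Averaging raw bytes like this is only correct for linear data; sRGB colour
// textures should be converted to linear before filtering.
MipLevel downsample(const MipLevel& src) {
    MipLevel dst;
    dst.width = std::max(1, src.width / 2);
    dst.height = std::max(1, src.height / 2);
    dst.rgba.resize(static_cast<std::size_t>(dst.width) * dst.height * 4);
    auto at = [&src](int x, int y, int c) -> int {
        return src.rgba[(static_cast<std::size_t>(y) * src.width + x) * 4 + c];
    };
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            const int x0 = std::min(2 * x, src.width - 1), x1 = std::min(2 * x + 1, src.width - 1);
            const int y0 = std::min(2 * y, src.height - 1), y1 = std::min(2 * y + 1, src.height - 1);
            for (int c = 0; c < 4; ++c) {
                const int sum = at(x0, y0, c) + at(x1, y0, c) + at(x0, y1, c) + at(x1, y1, c);
                dst.rgba[(static_cast<std::size_t>(y) * dst.width + x) * 4 + c] =
                    static_cast<std::uint8_t>((sum + 2) / 4);  // rounded average
            }
        }
    }
    return dst;
}

// Build the whole chain, from the full-size image down to 1x1.
std::vector<MipLevel> buildMipChain(MipLevel top) {
    std::vector<MipLevel> chain;
    chain.push_back(std::move(top));
    while (chain.back().width > 1 || chain.back().height > 1) {
        MipLevel next = downsample(chain.back());
        chain.push_back(std::move(next));
    }
    return chain;
}
```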

#5216021 Decompressing PNG / JPG images on the GPU

Posted by Adam_42 on 12 March 2015 - 04:16 AM

You shouldn't be bandwidth limited. Uploading uncompressed 60 FPS 1080p video to the GPU should only need about 500MB/s of bandwidth. The PCI Express bus is much faster than that (~8 GB/s for 2.0 at x16). I'd experiment with double buffering, so that D3D is copying the data for one texture while you fill in the next one.
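The ~500MB/s figure is just frame size times frame rate. A quick check, assuming 4 bytes per pixel (e.g. 32-bit RGBA; the exact format isn't stated above):

```cpp
#include <cstdint>

// Bytes per second needed to upload a fresh uncompressed frame every frame.
constexpr std::uint64_t uploadBytesPerSecond(std::uint64_t width, std::uint64_t height,
                                             std::uint64_t bytesPerPixel, std::uint64_t fps) {
    return width * height * bytesPerPixel * fps;
}

// 1080p at 60 FPS, assuming 4 bytes per pixel.
constexpr std::uint64_t kVideo1080p60 = uploadBytesPerSecond(1920, 1080, 4, 60);
static_assert(kVideo1080p60 == 497664000, "roughly 500 MB/s");
```

That's about 6% of the ~8 GB/s quoted for PCI Express 2.0 x16.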


You could also use threads that do CPU-only work and don't touch the D3D device. That is, you decode into a block of memory, which the D3D thread then copies to a texture. Again, you'd want double buffering here.
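A minimal sketch of that double-buffered hand-off, using standard C++ threads rather than anything D3D-specific. The class name DoubleBuffer, the produce/consume callbacks and runDemo are hypothetical; in a real program the consume callback is where the D3D thread would copy the pixels into a texture. It assumes exactly one decoder thread and one render thread.

```cpp
#include <algorithm>
#include <array>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Two frame slots shared by one decoder thread and one render thread.
// The decoder fills one slot while the render thread copies the other.
class DoubleBuffer {
public:
    explicit DoubleBuffer(std::size_t frameBytes) {
        for (auto& slot : slots_) slot.pixels.resize(frameBytes);
    }

    // Decoder thread: wait for a free slot, decode into it without holding
    // the lock (CPU-only work), then mark it full.
    template <typename Fill>
    void produce(Fill fill) {
        std::size_t i = 0;
        {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [this] { return !slots_[writeIdx_].full; });
            i = writeIdx_;
        }
        fill(slots_[i].pixels);
        {
            std::lock_guard<std::mutex> lock(m_);
            slots_[i].full = true;
            writeIdx_ ^= 1;
        }
        cv_.notify_all();
    }

    // Render thread: wait for a full slot, copy it out (the texture upload
    // would go here), then hand the slot back to the decoder.
    template <typename Upload>
    void consume(Upload upload) {
        std::size_t i = 0;
        {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [this] { return slots_[readIdx_].full; });
            i = readIdx_;
        }
        const std::vector<std::uint8_t>& pixels = slots_[i].pixels;
        upload(pixels);
        {
            std::lock_guard<std::mutex> lock(m_);
            slots_[i].full = false;
            readIdx_ ^= 1;
        }
        cv_.notify_all();
    }

private:
    struct Slot {
        std::vector<std::uint8_t> pixels;
        bool full = false;
    };
    std::array<Slot, 2> slots_;
    std::size_t writeIdx_ = 0;
    std::size_t readIdx_ = 0;
    std::mutex m_;
    std::condition_variable cv_;
};

// Demo: decode `frames` fake frames on a worker thread (each filled with its
// frame number) and record the first byte of each frame the render side sees.
std::vector<int> runDemo(int frames, std::size_t frameBytes) {
    DoubleBuffer buffers(frameBytes);
    std::thread decoder([&buffers, frames] {
        for (int f = 0; f < frames; ++f)
            buffers.produce([f](std::vector<std::uint8_t>& px) {
                std::fill(px.begin(), px.end(), static_cast<std::uint8_t>(f));
            });
    });
    std::vector<int> seen;
    for (int f = 0; f < frames; ++f)
        buffers.consume([&seen](const std::vector<std::uint8_t>& px) { seen.push_back(px[0]); });
    decoder.join();
    return seen;
}
```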


If you just want smaller textures, there's a format GPUs are really good at decoding - DXT. It's not as small as a JPEG can be, and it is lossy, but it is designed for GPU use. If you can decode directly to that format you'll save a load of bandwidth (DXT1 is effectively 4 bits per pixel).
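The 4 bits per pixel comes from the block layout: DXT1 (BC1) stores every 4x4 block of pixels in 8 bytes. A small helper for estimating sizes (the function names are just for this example):

```cpp
#include <cstdint>

// DXT1 (BC1) encodes each 4x4 block of pixels in 8 bytes: two 16-bit (565)
// endpoint colours plus sixteen 2-bit indices. 64 bits / 16 pixels = 4 bits per pixel.
// Sizes that aren't multiples of 4 are rounded up to whole blocks.
constexpr std::uint64_t bc1Bytes(std::uint64_t width, std::uint64_t height) {
    return ((width + 3) / 4) * ((height + 3) / 4) * 8;
}

// Uncompressed 32-bit RGBA, for comparison.
constexpr std::uint64_t rgba8Bytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 4;
}
```

For a 1920x1080 frame that's 1,036,800 bytes instead of 8,294,400 for 32-bit RGBA, an 8:1 reduction.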


Decoding of compressed image data should be possible on most GPUs, but I wouldn't go for either PNG or JPEG. PNG is especially awkward because it just uses zlib compression on the image data. For JPEG it's at least possible to speed it up with CUDA - http://www.fastcompression.com/products/jpeg/cuda-jpeg.htm but DX9 would be a pain.


For video decoding you could look into APIs like http://en.wikipedia.org/wiki/DirectX_Video_Acceleration which may get some hardware acceleration via hardware like http://en.wikipedia.org/wiki/Nvidia_PureVideo

#5215213 Memory leaks when Windows shuts down the display

Posted by Adam_42 on 07 March 2015 - 05:53 PM

If all else fails you can prevent the computer turning off the display with SetThreadExecutionState() https://msdn.microsoft.com/en-us/library/windows/desktop/aa373208%28v=vs.85%29.aspx

#5212287 Max size of Matrix4x4 Array in HLSL

Posted by Adam_42 on 22 February 2015 - 10:16 AM

Yes, you can use it on D3D9 as long as the GPU supports vertex texture fetch.


By the way, the matrices used for skinning are actually 4x3 ones, so you can fit more like 80 of them in if you rearrange how the data is stored.
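A sketch of the rearrangement, assuming D3D's row-vector convention (v * M) where the last column of every bone matrix is (0, 0, 0, 1). Storing the top three columns as rows gives three float4 constants per bone instead of four. The names are hypothetical, and the register budget in the note below is a made-up example, not a D3D9 limit.

```cpp
#include <array>

struct Float4 {
    float x, y, z, w;
};

// A 4x4 affine bone matrix stored row-major for the row-vector convention
// (v' = v * M), so its last column is always (0, 0, 0, 1) and can be dropped.
using Matrix4x4 = std::array<std::array<float, 4>, 4>;

// Pack the first three columns as three float4 constants. A shader can then
// compute x' = dot(v, c[0]), y' = dot(v, c[1]), z' = dot(v, c[2]) with v.w = 1.
std::array<Float4, 3> packAffine(const Matrix4x4& m) {
    std::array<Float4, 3> out{};
    for (int c = 0; c < 3; ++c)
        out[c] = Float4{m[0][c], m[1][c], m[2][c], m[3][c]};
    return out;
}

// Number of bones that fit in a budget of float4 constant registers.
constexpr int bonesThatFit(int freeRegisters, int registersPerBone) {
    return freeRegisters / registersPerBone;
}
```

For example, if 240 float4 registers were free, bonesThatFit(240, 4) gives 60 bones while bonesThatFit(240, 3) gives 80.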

#5209505 Drawing fullscreen triangle, problem with shader

Posted by Adam_42 on 08 February 2015 - 05:20 PM

You can't draw a quad with 3 points. You need 4 points.


You can when it's a full screen one, because it gets clipped. You simply make the triangle twice the width and height of the screen.


This will give a performance improvement compared to using two triangles.


You can find a detailed low level explanation of the performance boost at http://michaldrobot.com/2014/04/01/gcn-execution-patterns-in-full-screen-passes/ but the essence of it is that pixel shaders process pixels in big groups. The diagonal line across the screen joining the two triangles breaks up some of those groups, and it uses the cache less efficiently.
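One common way to build such a triangle is to compute the positions from the vertex index in the vertex shader (SV_VertexID in HLSL), so no vertex buffer is needed. The arithmetic is shown here in C++ for clarity; the function name is hypothetical, and the same expressions carry over directly to HLSL.

```cpp
struct ClipPos {
    float x, y;  // clip-space position (w = 1)
    float u, v;  // texture coordinate
};

// Full-screen triangle from a vertex index 0..2, using the same arithmetic a
// vertex shader would apply to SV_VertexID. Positions: (-1, 1), (3, 1), (-1, -3).
// Only the [-1, 1] square is visible; the rest of the triangle is clipped.
ClipPos fullscreenTriangleVertex(unsigned id) {
    const float u = static_cast<float>((id << 1) & 2u);  // 0, 2, 0
    const float v = static_cast<float>(id & 2u);         // 0, 0, 2
    return ClipPos{u * 2.0f - 1.0f, 1.0f - v * 2.0f, u, v};
}
```

The triangle's legs are 4 units long in clip space while the screen only spans 2 units (-1 to 1), which is the "twice the width and height" mentioned above. The u and v values also work as texture coordinates that run from 0 to 1 across the visible area.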

#5208797 1997 game graphic files

Posted by Adam_42 on 04 February 2015 - 07:49 PM

I got bored and did a bit more analysis. Below is the first part of the data. Taking the first record as an example:
The first four bytes (14 00 04 0F) seem to be an identifier, as you already found.
The next two values (16 00 and 09 00, i.e. 22 and 9) appear to be a 16-bit width and height, but I'm not 100% sure.
The next value (6B 00 00 00) looks 32-bit, and I'm not sure what it means. Some kind of unique identifier maybe?
The next 4 bytes are 03 00 00 00 for all of them. Maybe some kind of image type?
Note that the numbers are all stored in little endian format (least significant byte first).
Following that we appear to have four more 16-bit values (09 00 09 00 03 00 15 00 in the first record) of unknown meaning. They could also be part of the pixel data, but they are consistently xx 00 in all three, which suggests they could well be 16-bit numbers. Needs more analysis.
Next there are 210 more bytes of unknown data, presumably containing the pixels.
For the second case, the unknown data is 214 bytes.
Since the amount of data compared with the possible dimensions isn't consistent, that suggests RLE encoding. Note that sprite RLE was generally done to improve rendering performance, not to reduce data size.
I'm wondering if the image data is 16 bits per pixel, with the top bit always zero (i.e. 555 encoding, not 565). That's because I can't see any obvious cases where two bytes next to each other are both 0x80 or bigger outside of the headers.
That's probably more than enough to get you started.

14 00 04 0F 16 00 09 00 6B 00 00 00 03 00 00 00
09 00 09 00 03 00 15 00 04 57 03 00 15 00 61 2E
03 00 13 00 E2 41 62 36 03 00 01 00 03 00 0C 00
A0 26 A0 26 81 26 21 2F 01 2F C1 2E 42 42 E1 31
03 00 02 00 80 15 03 00 04 00 E0 08 20 09 40 11
A0 15 00 1E 00 1A 60 22 A0 26 A1 2A C0 2A C0 2A
80 2A 02 4A 81 2D 80 15 03 00 02 00 03 00 02 00
60 11 00 0D C0 08 E0 0C 00 0D 20 11 80 15 A0 19
20 22 40 22 80 2A A0 2A 60 26 60 2A 22 3E A1 39
E0 21 03 00 03 00 03 00 03 00 00 11 20 0D 00 0D
00 0D 20 11 80 15 A0 19 E0 1D C0 1D 20 22 00 22
A0 29 C1 3D 40 1D A0 1D 03 00 04 00 03 00 05 00
C0 08 A0 08 00 11 40 15 20 11 00 11 40 15 60 25
C1 35 41 2D C0 10 03 00 06 00 03 00 08 00 60 04
60 04 60 04 60 04 03 00 0A 00 14 00 04 0F 15 00
08 00 6D 00 00 00 03 00 00 00 0B 00 0A 00 03 00
14 00 C1 31 03 00 13 00 22 42 03 00 01 00 03 00
06 00 80 15 00 1E 60 22 20 1E C1 2A E1 2A 01 2F
A1 26 E1 2A 01 2F C0 2E E2 39 E2 39 03 00 02 00
20 0D 20 0D 00 0D 20 0D 60 11 40 11 80 15 E0 1D
60 26 40 22 C1 2A 01 2F 01 2F E1 2E C0 2A 80 2A
41 36 E1 39 20 19 03 00 02 00 03 00 01 00 C0 08
00 0D 00 0D 20 11 20 11 60 15 C0 19 00 1E 60 26
80 26 C0 2E E1 2E A0 2A 60 2A A1 3D 61 29 60 1D
03 00 03 00 03 00 02 00 C0 0C E0 0C 40 15 80 15
60 15 A0 19 E0 1D 40 26 40 26 20 22 20 26 C3 42
83 4A A1 31 03 00 05 00 03 00 04 00 C0 0C 40 19
80 19 80 19 60 19 A0 1D E1 2D 01 36 02 42 C1 2D
A0 1D 03 00 06 00 03 00 07 00 E0 20 61 2D A1 2D
E2 31 80 25 03 00 09 00 14 00 04 0F 16 00 08 00
75 00 00 00 03 00 00 00 0B 00 0A 00 03 00 15 00
40 21 03 00 14 00 80 29 03 00 01 00 20 0D 60 11
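A sketch of reading the header as guessed above, assuming the 24-byte layout described (4-byte identifier, 16-bit width and height, a 32-bit unknown, a 32-bit value that's always 3, then four 16-bit unknowns). All field names are hypothetical, and decode555 assumes red is in the high bits, which hasn't been confirmed for this game.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Little-endian readers: least significant byte first.
std::uint16_t readU16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
std::uint32_t readU32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Guessed layout; none of these names come from the game itself.
struct SpriteHeader {
    std::array<std::uint8_t, 4> magic;    // 14 00 04 0F in every record so far
    std::uint16_t width;                  // guess
    std::uint16_t height;                 // guess
    std::uint32_t unknownId;              // 32-bit value of unknown meaning
    std::uint32_t type;                   // 03 00 00 00 in every record so far
    std::array<std::uint16_t, 4> extra;   // four 16-bit values of unknown meaning
};

constexpr std::size_t kHeaderSize = 24;

bool parseHeader(const std::uint8_t* data, std::size_t size, SpriteHeader& out) {
    if (size < kHeaderSize) return false;
    for (int i = 0; i < 4; ++i) out.magic[i] = data[i];
    out.width = readU16(data + 4);
    out.height = readU16(data + 6);
    out.unknownId = readU32(data + 8);
    out.type = readU32(data + 12);
    for (int i = 0; i < 4; ++i) out.extra[i] = readU16(data + 16 + 2 * i);
    return true;
}

// One 16-bit pixel assuming 555 encoding (top bit unused), red in the high bits.
struct Rgb {
    std::uint8_t r, g, b;  // each 0..31
};
Rgb decode555(std::uint16_t v) {
    return Rgb{static_cast<std::uint8_t>((v >> 10) & 0x1F),
               static_cast<std::uint8_t>((v >> 5) & 0x1F),
               static_cast<std::uint8_t>(v & 0x1F)};
}
```

Parsing the first 24 bytes of the dump gives a width of 22, a height of 9, and 107 (0x6B) for the unknown 32-bit value.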

#5208097 1997 game graphic files

Posted by Adam_42 on 01 February 2015 - 04:54 PM

That make.set file is simply a standard makefile for compiling showset.exe. Unless you have the .c and .asm files that it references, it's not much help. There's a good chance that the sprite drawing code was written in assembly.


Back in 1997 DOS based games were getting less common. Windows 95 had been around for long enough that games were getting made using DirectX / DirectDraw.


There is also no standard sprite format. It could be anything. In general you'd expect to find at least the width, height and pixel data. On top of that, possibly a palette if it's a 4 or 8 bit per pixel sprite. 16 bits per pixel was also used, but 24/32 bpp was unlikely back then. There's also a chance that the pixel data is run length encoded (to make it quicker to skip over transparent pixels).
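To show why RLE helps with transparency, here is a row decoder for an invented encoding (it is not the format this game uses): each span is a count of transparent pixels to skip, a count of opaque pixels, then those pixels as 16-bit little-endian values. The function name and layout are hypothetical. Skipped pixels are never read or written, which is where the speed comes from.

```cpp
#include <cstddef>
#include <cstdint>

// Draw one RLE-encoded sprite row into a 16-bit destination row.
// Invented encoding: repeated [skip: u8][count: u8][count x u16 pixels],
// terminated by skip == 0 && count == 0.
// Returns the number of source bytes consumed.
std::size_t drawRleRow(const std::uint8_t* src, std::uint16_t* dst, int dstWidth) {
    const std::uint8_t* p = src;
    int x = 0;
    for (;;) {
        const int skip = p[0];
        const int count = p[1];
        p += 2;
        if (skip == 0 && count == 0) break;  // end of row
        x += skip;                           // transparent run: nothing to draw
        for (int i = 0; i < count; ++i, p += 2, ++x) {
            if (x >= 0 && x < dstWidth)      // clip against the destination row
                dst[x] = static_cast<std::uint16_t>(p[0] | (p[1] << 8));
        }
    }
    return static_cast<std::size_t>(p - src);
}
```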


It's easier if you have a screenshot to work backwards from, so you can tell if your guess at the encoding is giving you the right colours. It also helps if you know the bit depth of the screen mode the game is using.


It might be quicker going the other way - overwrite various bits of data with zeros (or other values), then run the game / viewer and see what ends up on screen.

#5207020 Texture Renders Incorrectly in Application but Perfectly fine in VS2013's...

Posted by Adam_42 on 27 January 2015 - 05:46 PM

HRESULT res = this->renderer->GetDeviceContext()->Map(this->texture, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped_subresource);
memcpy(mapped_subresource.pData, this->raw_data, this->image_width*this->image_height*sizeof(Pixel));
this->renderer->GetDeviceContext()->Unmap(this->texture, 0);


This is almost certainly where it's going wrong. You're ignoring the pitch value in the mapped_subresource, and it just happens to work correctly when using the debugger.
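A minimal fix is to copy one row at a time, advancing the destination by RowPitch rather than by the image width, because the driver may pad each row. The helper name below is hypothetical; pData and RowPitch are the real members of D3D11_MAPPED_SUBRESOURCE. Checking res with FAILED() before writing is also a good idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a tightly packed image into a destination whose rows may be padded.
// dstRowPitch must be at least srcRowBytes.
void copyRowsWithPitch(void* dst, std::size_t dstRowPitch,
                       const void* src, std::size_t srcRowBytes, std::size_t rows) {
    auto* d = static_cast<std::uint8_t*>(dst);
    const auto* s = static_cast<const std::uint8_t*>(src);
    for (std::size_t y = 0; y < rows; ++y)
        std::memcpy(d + y * dstRowPitch, s + y * srcRowBytes, srcRowBytes);
}

// Possible use between Map() and Unmap() in the code above:
//   copyRowsWithPitch(mapped_subresource.pData, mapped_subresource.RowPitch,
//                     this->raw_data, this->image_width * sizeof(Pixel),
//                     this->image_height);
```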

#5206254 Memory usage rockets when screen locked

Posted by Adam_42 on 23 January 2015 - 03:01 PM

If you want to find out for certain, xperf should be able to get you the call stacks for the allocations. It's not the easiest tool to use though.



#5205001 How to properly switch to fullscreen with DXGI?

Posted by Adam_42 on 17 January 2015 - 08:31 PM

Have you read http://msdn.microsoft.com/en-us/library/windows/desktop/bb205075%28v=vs.85%29.aspx#Care_and_Feeding_of_the_Swap_Chain ?

#5201710 Text rendering and low FPS in game.

Posted by Adam_42 on 04 January 2015 - 04:58 AM

I've never used that simulator, but there are only two possibilities I can think of for the slow frame rate:


1. The simulator limits the frame rate.


2. The simulator is really slow.


Some profiling in the simulator should tell you for certain what's going on. If you can, repeat that profiling in whatever conditions the end user can expect to see.

#5197898 Direct2D kills my CriticalSection...?

Posted by Adam_42 on 12 December 2014 - 06:42 PM

If you put a breakpoint on the TryEnterCriticalSection call you can inspect the state of the critical section in the debugger. You can also look at what the other thread is doing by using the threads window.


The most important values are:


OwningThread - 0x00000000 == none. Otherwise it's the ThreadID of the thread that owns it. Use the Threads window to see which thread that is, and look at what code it's running.

RecursionCount - 0 == not locked. Essentially, EnterCriticalSection() increments it and LeaveCriticalSection() decrements it.


It's also worth noting that TryEnterCriticalSection() can fail every time due to timing issues. That is, whenever it tries, it can find that the lock is already held. The longer the lock is held by other threads, the less likely it is to succeed. In general you should keep them locked for as short a duration as possible. Don't write code that looks like this if you can avoid it:






This version is much better if you don't need the lock inside the process function. Even if Process() does need a lock, can it use a different one?



auto thing = queue.pop_front();
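A sketch of the same idea with std::mutex standing in for the critical section (LockedQueue and processOne are made-up names). The lock is held only while the queue is touched, and the item is processed after it's released. If queue is a standard container such as std::deque, note that pop_front() returns void, so the item has to be read with front() before it is removed.

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hold the lock only while touching the shared queue, never while processing.
template <typename T>
class LockedQueue {
public:
    void push(T item) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push_back(std::move(item));
    }

    // Returns std::nullopt instead of blocking when the queue is empty.
    std::optional<T> tryPop() {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return std::nullopt;
        T item = std::move(q_.front());  // std::deque::pop_front() returns void,
        q_.pop_front();                  // so take front() first
        return item;
    }

private:
    std::mutex m_;
    std::deque<T> q_;
};

// One worker step: the lock is already released by the time process() runs.
template <typename T, typename F>
bool processOne(LockedQueue<T>& queue, F process) {
    std::optional<T> thing = queue.tryPop();  // lock held only inside tryPop()
    if (!thing) return false;
    process(*thing);                          // no lock held here
    return true;
}
```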



#5197300 sRGB and deferred rendering

Posted by Adam_42 on 09 December 2014 - 07:42 PM

The copy you perform will be done via your own shader which reads your intermediate texture and either returns sqrt( X ) or pow( X, 1.0 / 2.2 ) depending on how approximate you want to be in exchange for performance.


Err, you don't want to do that conversion in the shader at all. For a render target set up as sRGB the conversion will be done automatically when you draw to it, and that should be faster than doing it in the shader.
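For reference, this is the standard sRGB encoding curve that the hardware applies when writing to an sRGB render target, next to the two shader approximations quoted above. It's written in C++ purely to compare the numbers; the function names are just for this example.

```cpp
#include <cmath>

// Exact sRGB encode (linear -> sRGB), as applied automatically on write to an
// sRGB render target.
float linearToSrgb(float c) {
    return c <= 0.0031308f ? 12.92f * c : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}

// The two shader approximations quoted above, for comparison.
float approxGamma22(float c) { return std::pow(c, 1.0f / 2.2f); }
float approxSqrt(float c) { return std::sqrt(c); }
```

For a linear input of 0.5 the exact curve gives about 0.735, pow( X, 1.0 / 2.2 ) about 0.730, and sqrt( X ) about 0.707.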


For deferred shading I'd go with something like this process:


1. Render all the opaque stuff. The diffuse MRT render target needs to have better than 8-bits-per-channel precision because it's in linear space.

2. You want to do lighting into a DXGI_FORMAT_R16G16B16A16_UNORM render target.

3. After that you want to render anything transparent (using forward lighting), and apply any linear space post effects.

4. At some point you will convert to sRGB, which may also involve an HDR tone mapping shader.

5. You can then do more post-processing in sRGB (e.g. I seem to remember that FXAA uses sRGB and not linear inputs).

6. If there's any 2D rendering, you should probably do that last, directly to the back buffer.


The last draw call that does a full screen sRGB render should have the destination render target as the back buffer, to avoid the need for an extra copy step at the end.


Note that you could switch to sRGB just after step 2 if you want to save on rendering to the bigger and slower linear texture. You'll get better precision by staying in linear space for longer though.
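To see why linear data needs better than 8 bits per channel (step 1 above), you can count how many of the 256 codes land in the darkest quarter of the sRGB range. This is a rough illustration; the helper names are hypothetical and the 0.25 threshold is arbitrary.

```cpp
#include <cmath>

// Inverse sRGB transfer function: sRGB-encoded value in [0, 1] -> linear value.
double srgbToLinear(double s) {
    return s <= 0.04045 ? s / 12.92 : std::pow((s + 0.055) / 1.055, 2.4);
}

// Count the 8-bit codes k (value k / 255) at or below a brightness threshold
// given in sRGB terms, which is roughly how the eye perceives brightness.
// storesLinear == true: the channel holds linear values.
// storesLinear == false: the channel holds sRGB-encoded values.
int codesAtOrBelow(double srgbThreshold, bool storesLinear) {
    const double threshold = storesLinear ? srgbToLinear(srgbThreshold) : srgbThreshold;
    int count = 0;
    for (int k = 0; k < 256; ++k)
        if (k / 255.0 <= threshold) ++count;
    return count;
}
```

With a threshold of 0.25, 8-bit linear storage has only 13 codes for that dark region against 64 for 8-bit sRGB storage, so dark gradients band much sooner in an 8-bit linear target.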

#5196340 Weird stuff happening when stepping through a threaded code

Posted by Adam_42 on 04 December 2014 - 05:26 PM

I'm not a C# programmer, and even to me that code is clearly not thread safe.


Mistakes I've noticed in the Update() function:

- You're using a for loop to go through the queue. Since the count can change at any time due to other threads this is just wrong. Don't use .Count at all.

- Calling ToList() is going to be bad, for the same reason. The only way you should be examining the list contents is via TryDequeue().

- You ignore the return value from TryDequeue(). This is bad because if it's false you've not got a valid object to work with - the list is empty.

- The code only sleeps when there's an item in the list. If it's empty the thread eats 100% of one CPU core for no reason. You want the exact opposite of that.

- Looping calling Thread.Sleep() is a bad way to wait for work anyway. You want to use some sort of blocking wait function, which may require a different container.


There's also no sign of any test code. Even if you think you've got threaded code right, it's generally a good idea to do some stress testing, just in case you missed something.
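In C++, a blocking wait plus a simple stress test could look like the sketch below (in C#, BlockingCollection<T> offers a Take() method that blocks in a similar way). The class and function names are hypothetical. pop() returns an empty std::optional only once the queue is closed and drained, so the caller is forced to check the result, much like TryDequeue()'s return value.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// The consumer sleeps inside wait() until work arrives, instead of polling
// Count in a loop or calling Sleep().
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(std::move(item));
        }
        cv_.notify_one();
    }

    // Call once no more items will be pushed; wakes any waiting consumers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available, or returns std::nullopt once the
    // queue has been closed and emptied.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T item = std::move(q_.front());
        q_.pop_front();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> q_;
    bool closed_ = false;
};

// Stress test: several producers push 1..itemsEach, one consumer sums everything.
long long stressTest(int producerCount, int itemsEach) {
    BlockingQueue<int> queue;
    long long total = 0;
    std::thread consumer([&] {
        while (std::optional<int> item = queue.pop()) total += *item;
    });
    std::vector<std::thread> producers;
    for (int p = 0; p < producerCount; ++p)
        producers.emplace_back([&queue, itemsEach] {
            for (int i = 1; i <= itemsEach; ++i) queue.push(i);
        });
    for (auto& t : producers) t.join();
    queue.close();  // no more work: the consumer exits once the queue is empty
    consumer.join();
    return total;
}
```

With 4 producers pushing 1 to 1000 each, the total should always be 4 x 500500; a wrong answer or a hang points at a threading bug.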


My recommendation would be to avoid using threads until you have more experience. They are difficult to use correctly, even for experienced programmers, and modern CPUs run code fast enough on a single core that you can easily get away without them.

#5195942 Alpha blending

Posted by Adam_42 on 02 December 2014 - 02:32 PM

There's another option that can make alpha blending faster: enable alpha testing (or use clip() in the shader).


It's most helpful where memory bandwidth is limited as it trades off shader instructions for blending work. For that reason, the effect tends to be most noticeable on lower end hardware.