
Adam Miles

Member Since 09 Jul 2013
Offline Last Active Yesterday, 07:55 PM

Posts I've Made

In Topic: Root Signature Descriptor Ranges & Registers

Yesterday, 06:57 PM

Yes, that's correct.

In Topic: Tiled Resource? Array of Texture3D? Efficient TSDF Volume with less memory fo...

Yesterday, 02:12 PM

The tile size is the same across all the tiers (64KB). The shape of the tile (its dimensions in texels, which depend on the format) is defined too.


I've some experience using huge 2DArray and 3D textures on my ray tracing Minecraft project and can confirm that the Tiled Resources API is useful in substantially reducing the memory footprint of these large textures. That said, you need to be able to find coherent regions of the texture that can all map to the same tile. "Free space" (air, in the case of Minecraft) can be left backed by no physical memory at all on some GPUs, or mapped to a dummy 'zero' tile on others.


You weren't quite clear on what the bit depth of your texture is, but you're looking at regions of 32x32x32 for a 16-bit texture. If you can find lots of 32x32x32 regions that are all the same, then by all means use Volume Tiled Resources. Be aware though that you will not find support for this on any AMD GPU that currently exists; on NVIDIA it requires Maxwell GM2xx or above. I think Intel's support started at Skylake.
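The 32x32x32 figure falls straight out of the fixed 64KB tile size: each standard tile shape holds exactly 64KB of texels at a given bytes-per-element. A quick arithmetic check (my own sketch; the shape table matches D3D's standard 3D tile shapes):

```python
# 64KB tile shapes for 3D (volume) tiled resources. The texel count
# per tile is just 64KB divided by bytes-per-element, and the listed
# WxHxD dimensions multiply out to exactly that count.
TILE_BYTES = 64 * 1024

# bytes per texel -> standard tile shape (W, H, D)
shapes = {
    1:  (64, 32, 32),   # e.g. R8_UNORM
    2:  (32, 32, 32),   # e.g. R16_FLOAT -- the 16-bit case above
    4:  (32, 32, 16),   # e.g. R32_FLOAT
    8:  (32, 16, 16),
    16: (16, 16, 16),
}

for bpe, (w, h, d) in shapes.items():
    # every shape must hold exactly one 64KB tile's worth of texels
    assert w * h * d * bpe == TILE_BYTES
    print(f"{bpe} byte(s)/texel -> {w}x{h}x{d} per 64KB tile")
```

So for anything other than a 16-bit format the region you need to find coherent data in changes shape but not byte size.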

In Topic: How to understand GPU profiler data and use it to trace down suspicious abnor...

22 October 2016 - 12:31 PM

Initially I ignored PCI-E bandwidth because I had it in my head that his mobile/notebook GPU was not actually doing a System->VRAM copy. As it turns out though, the 680m is a PCI-E 3.0 x16 card, so it should have a ~16GB/s bus, exceeding the bandwidth of his system memory by a few GB/s.
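The ~16GB/s figure is easy to reproduce from the link parameters (a sketch of the standard PCI-E 3.0 arithmetic, per direction):

```python
# Back-of-envelope PCI-E 3.0 x16 bandwidth, per direction.
# PCIe 3.0 runs at 8 GT/s per lane with 128b/130b line encoding.
GT_PER_S = 8.0e9            # transfers per second per lane
ENCODING = 128 / 130        # usable fraction after line coding
LANES = 16

bytes_per_s = GT_PER_S * ENCODING / 8 * LANES
print(f"~{bytes_per_s / 1e9:.2f} GB/s")   # ~15.75 GB/s, i.e. the ~16 GB/s figure
```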

In Topic: How to understand GPU profiler data and use it to trace down suspicious abnor...

19 October 2016 - 06:29 PM

Buffer to Buffer copy is probably the best test for measuring bandwidth, but it doesn't represent the best layout for accessing data that is logically two or three dimensional, so stick to textures for that.


There really isn't a good one-estimate-fits-all-hardware approach to knowing how long things should take. The difference alone between 720p and 1080p is 2.25x, and the hardware disparity between a reasonable mobile GPU and a high end discrete GPU is > 10x. So depending on whether you're talking about a Titan X rendering at 720p or a mobile GPU rendering at 1080p, you could be talking about a 20x difference in GPU time.


I have a pretty good idea of how long typical tasks should take on Xbox One at common resolutions (720p, 900p, 1080p), but that just comes from years of looking at PIX captures of AAA titles from the best developers. If you asked me how long X should take on hardware Y at resolution Z, I'd probably start with the numbers I know from Xbox One, divide them by however much faster I think the hardware is, and then multiply up by the increased pixel count.
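That extrapolation is just two scale factors. A sketch of the recipe as a hypothetical helper (the function name and all the sample numbers are my own illustration, not real timings):

```python
# Extrapolate a known timing to different hardware and resolution:
# scale down by relative GPU speed, scale up by pixel count.
def estimate_gpu_time_ms(known_time_ms, hw_speedup, src_res, dst_res):
    src_pixels = src_res[0] * src_res[1]
    dst_pixels = dst_res[0] * dst_res[1]
    return known_time_ms / hw_speedup * (dst_pixels / src_pixels)

# e.g. a pass that costs 2.0ms at 900p on known hardware, run on a
# GPU assumed to be 4x faster, rendering at 4K (all numbers made up):
print(f"{estimate_gpu_time_ms(2.0, 4.0, (1600, 900), (3840, 2160)):.2f} ms")
```

It's crude (not everything scales linearly with pixels, and "4x faster" hides which limit you're actually hitting), but it gets you a starting number to compare a profile against.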


It doesn't hurt to try to figure out how close you might be coming to the various hardware limits a GPU has, just to see if you're approaching any of them. Metrics like vertices/second, fill-rate, texture fetch rate, bandwidth, TFLOPS etc. are all readily available for AMD/NVIDIA cards. The only tricky one to work out is how many floating-point operations your shader might cost per pixel/thread, as you don't always have access to the raw GPU instructions (at least on NVIDIA cards you don't), though you can approximate it from the DXBC.
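For the bandwidth limit in particular, the comparison is a one-liner: bytes touched divided by peak bandwidth gives a lower bound on pass time. A sketch with made-up numbers:

```python
# Rough bandwidth-bound lower bound for a full-screen pass:
# total bytes read+written / peak bandwidth. All inputs here are
# illustrative assumptions, not measurements.
def min_pass_time_ms(width, height, bytes_per_pixel, peak_gb_per_s):
    total_bytes = width * height * bytes_per_pixel
    return total_bytes / (peak_gb_per_s * 1e9) * 1e3

# A 1080p pass touching 16 bytes/pixel on a 200 GB/s card:
print(f"{min_pass_time_ms(1920, 1080, 16, 200):.3f} ms")
```

If your measured time is far above that bound for every limit you check, the pass probably isn't limited by any of them and something else (occupancy, stalls, sync) is worth looking at.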

In Topic: How to understand GPU profiler data and use it to trace down suspicious abnor...

19 October 2016 - 03:47 PM

You may be right about the 8MB write if your mobile GPU has dedicated VRAM. I'm used to thinking of mobile GPUs as sharing system memory with the CPU, in which case you would have to count both.


I was using this page for my DDR3-800 bandwidth numbers. I took the triple-channel number, divided by 3, and multiplied by 2 to get back to dual channel.
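That calculation can be checked from first principles: DDR3-800 moves 800 MT/s over a 64-bit (8-byte) channel, so the channel math works out the same way (a sketch, independent of whichever page the numbers came from):

```python
# Reproducing the dual-channel DDR3-800 figure from first principles.
MT_PER_S = 800e6             # transfers per second
BYTES_PER_TRANSFER = 8       # 64-bit channel

single_channel = MT_PER_S * BYTES_PER_TRANSFER   # 6.4 GB/s
triple_channel = single_channel * 3              # 19.2 GB/s
dual_channel = triple_channel / 3 * 2            # the divide-by-3, times-2 step
print(f"{dual_channel / 1e9:.1f} GB/s")          # 12.8 GB/s
```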


If your GPU definitely has dedicated VRAM and it's writing into it then perhaps you're at only around 50% of the theoretical mark you might expect. That too might not be that surprising given that the tiling mode (i.e. 'layout') of the source memory is surely linear. GPUs will often only be able to hit peak throughput on things like texture fetching / bandwidth when the memory access is swizzled/tiled in such a way that it hits all memory banks exactly as intended.


Have you tried doing a raw Buffer to Buffer copy between your UPLOAD heap and the DEFAULT heap just to test how long it takes?
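Once you have a timed copy (from GPU timestamp queries, say), converting it to an effective bandwidth figure for comparison against the theoretical bus number is trivial. A sketch; the byte count and duration below are made up:

```python
# Convert a timed buffer-to-buffer copy into effective bandwidth,
# to compare against the theoretical PCI-E / memory figure.
def effective_gb_per_s(copy_bytes, elapsed_s):
    return copy_bytes / elapsed_s / 1e9

# e.g. 8MB copied in 1ms -> ~8.4 GB/s effective
print(f"{effective_gb_per_s(8 * 1024 * 1024, 1e-3):.2f} GB/s")
```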