UpdateTileMappings/CopyTileMappings performance: requesting repro attempts

3 comments, last by Norbo 7 years, 3 months ago
I'm running into some pretty nasty performance with UpdateTileMappings and CopyTileMappings, and I'm not yet sure if I'm horribly messing something up or if it's caused by the driver/runtime. I don't have any AMD or Intel hardware compatible with D3D12, so I'm having trouble narrowing down the source. If you have a D3D12 compatible machine and you're willing, please download the attached test application and report your results.
[sharedmedia=core:attachments:34375]
[sharedmedia=core:attachments:34374]
The tests take about a minute to complete on my machine. Careful; it may massively slow down any graphical applications you're running, and the OS UI might be sluggish. And while I haven't seen it happen, it's probably a good idea to take precautions under the assumption that it'll freeze your computer or something. Note that the source requires VS2017 to build.
So far, across all the NVIDIA cards I've tested, some tests finish near instantly, while others take over a second for a single call to UpdateTileMappings/CopyTileMappings.
I appreciate any repro attempts, and I would also appreciate it if anyone would check the source for any obvious silly mistakes.
And happy new year!

I don't know anything about D3D12 but I can post my results:


NVIDIA GeForce GTX 1070

UpdateTileMappings Tests, CPU work done before GPU begins
ContiguousToContiguous
AVE: 0.89ms, STDEV: 0.65ms, MED: 0.64ms, MIN: 0.29ms, MAX: 2.17ms, GPUAVE: 0.00195ms
ContiguousToRandom
AVE: 263.96ms, STDEV: 189.51ms, MED: 167.02ms, MIN: 52.79ms, MAX: 574.43ms, GPUAVE: 354.12623ms
RandomToContiguous
AVE: 417.53ms, STDEV: 99.70ms, MED: 397.20ms, MIN: 190.79ms, MAX: 564.97ms, GPUAVE: 494.95818ms
RandomToRandom
AVE: 480.93ms, STDEV: 60.88ms, MED: 481.46ms, MIN: 384.92ms, MAX: 586.06ms, GPUAVE: 487.36492ms
RandomToReversed
AVE: 470.40ms, STDEV: 109.00ms, MED: 484.36ms, MIN: 252.67ms, MAX: 695.85ms, GPUAVE: 483.33957ms
ContiguousToReversed
AVE: 1.09ms, STDEV: 0.95ms, MED: 0.89ms, MIN: 0.31ms, MAX: 3.21ms, GPUAVE: 0.77855ms
ReversedToContiguous
AVE: 212.22ms, STDEV: 198.78ms, MED: 123.66ms, MIN: 3.68ms, MAX: 504.59ms, GPUAVE: 312.44186ms
ReversedToReversed
AVE: 260.22ms, STDEV: 224.82ms, MED: 147.88ms, MIN: 19.11ms, MAX: 656.05ms, GPUAVE: 269.86404ms

UpdateTileMappings Tests, CPU work in parallel with GPU
RandomToRandom
AVE: 472.04ms, STDEV: 89.72ms, MED: 492.83ms, MIN: 292.52ms, MAX: 656.10ms, GPUAVE: 924.84854ms
ReversedToReversed
AVE: 298.65ms, STDEV: 211.92ms, MED: 187.93ms, MIN: 34.59ms, MAX: 620.90ms, GPUAVE: 597.90684ms

CopyTileMappings Tests, CPU work done before GPU begins
ScrollRight
AVE: 0.09ms, STDEV: 0.00ms, MED: 0.09ms, MIN: 0.09ms, MAX: 0.10ms, GPUAVE: 0.00215ms
ScrollLeft
AVE: 794.01ms, STDEV: 340.08ms, MED: 844.93ms, MIN: 304.12ms, MAX: 1227.20ms, GPUAVE: 707.55676ms
ScrollUp
AVE: 0.09ms, STDEV: 0.01ms, MED: 0.09ms, MIN: 0.08ms, MAX: 0.12ms, GPUAVE: 0.00143ms
ScrollDown
AVE: 83.18ms, STDEV: 50.67ms, MED: 75.71ms, MIN: 0.16ms, MAX: 169.87ms, GPUAVE: 0.49807ms
RotateSubresourcesUp
AVE: 25.32ms, STDEV: 7.61ms, MED: 22.45ms, MIN: 18.56ms, MAX: 43.99ms, GPUAVE: 0.36721ms
RotateSubresourcesDown
AVE: 24.92ms, STDEV: 10.27ms, MED: 20.19ms, MIN: 15.74ms, MAX: 47.03ms, GPUAVE: 0.09994ms
LeftOntoRightNoOverlap
AVE: 242.73ms, STDEV: 240.60ms, MED: 196.77ms, MIN: 13.97ms, MAX: 790.73ms, GPUAVE: 647.08342ms
RightOntoLeftNoOverlap
AVE: 0.08ms, STDEV: 0.01ms, MED: 0.08ms, MIN: 0.07ms, MAX: 0.11ms, GPUAVE: 0.00266ms
TopOntoBottomNoOverlap
AVE: 13.79ms, STDEV: 8.30ms, MED: 14.72ms, MIN: 0.21ms, MAX: 33.46ms, GPUAVE: 35.26205ms
BottomOntoTopNoOverlap
AVE: 0.06ms, STDEV: 0.00ms, MED: 0.06ms, MIN: 0.06ms, MAX: 0.07ms, GPUAVE: 0.08673ms

CopyTileMappings Tests, CPU work in parallel with GPU
ScrollLeft
AVE: 924.48ms, STDEV: 484.26ms, MED: 754.36ms, MIN: 278.25ms, MAX: 1618.37ms, GPUAVE: 1614.76035ms
ScrollUp
AVE: 0.13ms, STDEV: 0.02ms, MED: 0.13ms, MIN: 0.10ms, MAX: 0.15ms, GPUAVE: 0.13926ms

Running Win10 Pro on an i3-6100 @ 3.7 GHz with 16GB of RAM. GPU is ASUS ROG Strix 1070 OC with latest drivers.

I'd be happy to do more tests if you like.
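
For anyone comparing numbers across machines, the summary columns in these result listings (AVE, STDEV, MED, MIN, MAX) can be recomputed from raw per-call timings. Below is a minimal sketch, assuming millisecond samples and a population standard deviation; the attached test application may compute these differently, and `TimingSummary` and `Summarize` are illustrative names, not taken from its source. GPUAVE is not covered, since it depends on GPU timestamp queries.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative container for the columns shown in the result listings.
struct TimingSummary {
    double ave = 0.0;
    double stdev = 0.0;
    double med = 0.0;
    double min = 0.0;
    double max = 0.0;
};

// Summarizes per-call timings in milliseconds.
// Assumption: population standard deviation (divide by n, not n - 1).
TimingSummary Summarize(std::vector<double> samplesMs) {
    TimingSummary result;
    if (samplesMs.empty()) {
        return result;  // nothing measured; all fields stay at zero
    }

    // Sorting gives min, max, and median directly.
    std::sort(samplesMs.begin(), samplesMs.end());
    const std::size_t n = samplesMs.size();

    double sum = 0.0;
    for (double s : samplesMs) sum += s;
    result.ave = sum / static_cast<double>(n);

    double squaredDiffs = 0.0;
    for (double s : samplesMs) {
        squaredDiffs += (s - result.ave) * (s - result.ave);
    }
    result.stdev = std::sqrt(squaredDiffs / static_cast<double>(n));

    // Median: middle element, or the mean of the two middle elements.
    result.med = (n % 2 == 1)
        ? samplesMs[n / 2]
        : 0.5 * (samplesMs[n / 2 - 1] + samplesMs[n / 2]);

    result.min = samplesMs.front();
    result.max = samplesMs.back();
    return result;
}
```

A large gap between AVE and MED, as in several of the rows here, usually means a few very slow calls dominate the average, so MED and MAX are worth reading alongside it.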

I would say you have wonderful timings (mine on a Titan X Maxwell are similar) :) Below is the truncated result on an AMD R290; I gave up while waiting for ScrollRight...

I know for sure that on PS4 updating the mapping has a cost; we kept it on an async thread and limited the number of updates too (whether the device lets you do that with DX12 is another story). But we use it only for texture and mesh streaming, which are likely to be bound by disk bandwidth before the mapping update cost becomes a real issue.
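
To make the "limit the amount" idea concrete, here is a rough sketch of a per-frame budget for mapping updates. It is not the PS4 code and does not call D3D12 at all: `TileMappingRequest`, `TileMappingBudget`, and the `submit` callback are hypothetical names, and the callback stands in for wherever the real UpdateTileMappings call would be issued. The async thread is left out; this only shows the capping part.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical description of one tile remap. A real renderer would carry
// whatever its mapping call needs (resource, region, heap offset, flags).
struct TileMappingRequest {
    std::uint32_t resourceId;      // hypothetical resource handle
    std::uint32_t tileX;
    std::uint32_t tileY;
    std::uint32_t mip;
    std::uint32_t heapTileOffset;  // destination tile within the heap
};

// Queues mapping requests and releases at most maxPerFrame of them per flush,
// so a burst of streaming requests cannot flood the driver in one frame.
class TileMappingBudget {
public:
    explicit TileMappingBudget(std::size_t maxPerFrame)
        : maxPerFrame_(maxPerFrame) {}

    void Enqueue(const TileMappingRequest& request) {
        pending_.push_back(request);
    }

    // Hands the oldest requests (up to the budget) to `submit` in FIFO order
    // and returns how many were handed over. Call once per frame.
    std::size_t Flush(
        const std::function<void(const std::vector<TileMappingRequest>&)>& submit) {
        const std::size_t count = std::min(maxPerFrame_, pending_.size());
        if (count == 0) {
            return 0;
        }
        const auto last = pending_.begin() + static_cast<std::ptrdiff_t>(count);
        std::vector<TileMappingRequest> batch(pending_.begin(), last);
        pending_.erase(pending_.begin(), last);
        submit(batch);  // the real mapping update would be issued here
        return count;
    }

    std::size_t Pending() const { return pending_.size(); }

private:
    std::size_t maxPerFrame_;
    std::deque<TileMappingRequest> pending_;
};
```

With a budget of 2 and 5 queued requests, three successive flushes would hand over 2, 2, and 1 requests. Whether a fixed count is the right unit is a guess; a time budget per frame might fit better given how widely the per-call costs in this thread vary.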

The same goes for my DX12 testing: I only use reserved resources for texture streaming, which is unlikely to overload the driver since we are waiting on disk reads :)

Looking at the AMD result, I would say they have some nasty linear traversal over pages with O(N^x) cost somewhere, and they need to be told about your little test so they can optimize that code path :)

EDIT: After a few hours, the ScrollRight test is still pending, so either you have a bug or they have a pretty bad bug, and I would bet on the latter :)

AMD Radeon R9 200 Series

UpdateTileMappings Tests, CPU work done before GPU begins
ContiguousToContiguous
AVE: 1868.38ms, STDEV: 70.42ms, MED: 1898.18ms, MIN: 1668.33ms, MAX: 1919.16ms, GPUAVE: 171.67925ms
ContiguousToRandom
AVE: 1963.90ms, STDEV: 18.56ms, MED: 1967.91ms, MIN: 1934.77ms, MAX: 1992.08ms, GPUAVE: 175.63353ms
RandomToContiguous
AVE: 2483.98ms, STDEV: 16.79ms, MED: 2487.84ms, MIN: 2458.53ms, MAX: 2512.04ms, GPUAVE: 193.19948ms
RandomToRandom
AVE: 2517.23ms, STDEV: 30.70ms, MED: 2521.27ms, MIN: 2469.73ms, MAX: 2564.90ms, GPUAVE: 194.73347ms
RandomToReversed
AVE: 2476.44ms, STDEV: 20.99ms, MED: 2484.07ms, MIN: 2432.74ms, MAX: 2496.32ms, GPUAVE: 191.26313ms
ContiguousToReversed
AVE: 1979.40ms, STDEV: 13.71ms, MED: 1984.02ms, MIN: 1951.12ms, MAX: 2003.35ms, GPUAVE: 178.67057ms
ReversedToContiguous
AVE: 1966.33ms, STDEV: 21.64ms, MED: 1965.59ms, MIN: 1932.56ms, MAX: 2005.68ms, GPUAVE: 133.08271ms
ReversedToReversed
AVE: 1897.24ms, STDEV: 17.78ms, MED: 1897.62ms, MIN: 1867.38ms, MAX: 1923.20ms, GPUAVE: 134.01424ms

UpdateTileMappings Tests, CPU work in parallel with GPU
RandomToRandom
AVE: 2506.74ms, STDEV: 13.67ms, MED: 2509.98ms, MIN: 2480.67ms, MAX: 2524.92ms, GPUAVE: 2500.31717ms
ReversedToReversed
AVE: 1874.90ms, STDEV: 6.60ms, MED: 1876.66ms, MIN: 1857.80ms, MAX: 1885.40ms, GPUAVE: 1871.40139ms

CopyTileMappings Tests, CPU work done before GPU begins
ScrollRight

Yikes, I was afraid of that. Thanks for the tests!

