UpdateTileMappings/CopyTileMappings performance: requesting repro attempts
I don't know anything about D3D12 but I can post my results:
NVIDIA GeForce GTX 1070
UpdateTileMappings Tests, CPU work done before GPU begins
ContiguousToContiguous
AVE: 0.89ms, STDEV: 0.65ms, MED: 0.64ms, MIN: 0.29ms, MAX: 2.17ms, GPUAVE: 0.00195ms
ContiguousToRandom
AVE: 263.96ms, STDEV: 189.51ms, MED: 167.02ms, MIN: 52.79ms, MAX: 574.43ms, GPUAVE: 354.12623ms
RandomToContiguous
AVE: 417.53ms, STDEV: 99.70ms, MED: 397.20ms, MIN: 190.79ms, MAX: 564.97ms, GPUAVE: 494.95818ms
RandomToRandom
AVE: 480.93ms, STDEV: 60.88ms, MED: 481.46ms, MIN: 384.92ms, MAX: 586.06ms, GPUAVE: 487.36492ms
RandomToReversed
AVE: 470.40ms, STDEV: 109.00ms, MED: 484.36ms, MIN: 252.67ms, MAX: 695.85ms, GPUAVE: 483.33957ms
ContiguousToReversed
AVE: 1.09ms, STDEV: 0.95ms, MED: 0.89ms, MIN: 0.31ms, MAX: 3.21ms, GPUAVE: 0.77855ms
ReversedToContiguous
AVE: 212.22ms, STDEV: 198.78ms, MED: 123.66ms, MIN: 3.68ms, MAX: 504.59ms, GPUAVE: 312.44186ms
ReversedToReversed
AVE: 260.22ms, STDEV: 224.82ms, MED: 147.88ms, MIN: 19.11ms, MAX: 656.05ms, GPUAVE: 269.86404ms
UpdateTileMappings Tests, CPU work in parallel with GPU
RandomToRandom
AVE: 472.04ms, STDEV: 89.72ms, MED: 492.83ms, MIN: 292.52ms, MAX: 656.10ms, GPUAVE: 924.84854ms
ReversedToReversed
AVE: 298.65ms, STDEV: 211.92ms, MED: 187.93ms, MIN: 34.59ms, MAX: 620.90ms, GPUAVE: 597.90684ms
CopyTileMappings Tests, CPU work done before GPU begins
ScrollRight
AVE: 0.09ms, STDEV: 0.00ms, MED: 0.09ms, MIN: 0.09ms, MAX: 0.10ms, GPUAVE: 0.00215ms
ScrollLeft
AVE: 794.01ms, STDEV: 340.08ms, MED: 844.93ms, MIN: 304.12ms, MAX: 1227.20ms, GPUAVE: 707.55676ms
ScrollUp
AVE: 0.09ms, STDEV: 0.01ms, MED: 0.09ms, MIN: 0.08ms, MAX: 0.12ms, GPUAVE: 0.00143ms
ScrollDown
AVE: 83.18ms, STDEV: 50.67ms, MED: 75.71ms, MIN: 0.16ms, MAX: 169.87ms, GPUAVE: 0.49807ms
RotateSubresourcesUp
AVE: 25.32ms, STDEV: 7.61ms, MED: 22.45ms, MIN: 18.56ms, MAX: 43.99ms, GPUAVE: 0.36721ms
RotateSubresourcesDown
AVE: 24.92ms, STDEV: 10.27ms, MED: 20.19ms, MIN: 15.74ms, MAX: 47.03ms, GPUAVE: 0.09994ms
LeftOntoRightNoOverlap
AVE: 242.73ms, STDEV: 240.60ms, MED: 196.77ms, MIN: 13.97ms, MAX: 790.73ms, GPUAVE: 647.08342ms
RightOntoLeftNoOverlap
AVE: 0.08ms, STDEV: 0.01ms, MED: 0.08ms, MIN: 0.07ms, MAX: 0.11ms, GPUAVE: 0.00266ms
TopOntoBottomNoOverlap
AVE: 13.79ms, STDEV: 8.30ms, MED: 14.72ms, MIN: 0.21ms, MAX: 33.46ms, GPUAVE: 35.26205ms
BottomOntoTopNoOverlap
AVE: 0.06ms, STDEV: 0.00ms, MED: 0.06ms, MIN: 0.06ms, MAX: 0.07ms, GPUAVE: 0.08673ms
CopyTileMappings Tests, CPU work in parallel with GPU
ScrollLeft
AVE: 924.48ms, STDEV: 484.26ms, MED: 754.36ms, MIN: 278.25ms, MAX: 1618.37ms, GPUAVE: 1614.76035ms
ScrollUp
AVE: 0.13ms, STDEV: 0.02ms, MED: 0.13ms, MIN: 0.10ms, MAX: 0.15ms, GPUAVE: 0.13926ms
Running Win10 Pro on an i3-6100 @ 3.7 GHz with 16GB of RAM. GPU is an ASUS ROG Strix 1070 OC with the latest drivers.
I'd be happy to do more tests if you like.
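For anyone comparing runs, here is a minimal sketch of how summary columns like AVE, STDEV, MED, MIN and MAX can be derived from a list of per-iteration timings. It is not taken from the benchmark's source: the `Summary` struct and `summarize` function are hypothetical names, and the population (divide by N) form of the standard deviation is an assumption, since the test may use the sample form instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the columns shown in the results above.
struct Summary {
    double ave;
    double stdev;
    double med;
    double minimum;
    double maximum;
};

// Summarizes per-iteration timings given in milliseconds.
// Takes the vector by value because it is sorted in place.
Summary summarize(std::vector<double> ms) {
    assert(!ms.empty() && "need at least one sample");
    std::sort(ms.begin(), ms.end());
    const std::size_t n = ms.size();

    double sum = 0.0;
    for (double v : ms) sum += v;
    const double ave = sum / static_cast<double>(n);

    // Assumption: population standard deviation (divide by n, not n - 1).
    double squares = 0.0;
    for (double v : ms) squares += (v - ave) * (v - ave);
    const double stdev = std::sqrt(squares / static_cast<double>(n));

    // Median: middle element, or the mean of the two middle elements.
    const double med = (n % 2 == 1) ? ms[n / 2]
                                    : 0.5 * (ms[n / 2 - 1] + ms[n / 2]);

    return Summary{ave, stdev, med, ms.front(), ms.back()};
}
```

A median well below the average combined with a large standard deviation, as in the ContiguousToRandom row above, indicates that a few slow iterations dominate the mean.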
I would say you have wonderful timings (mine on a Titan X Maxwell are similar) :) Below are the truncated results on an AMD R290; I gave up while waiting for ScrollRight...
I know for sure that on PS4 updating the mappings has a cost; we kept it on an async thread and limited the amount too (whether the device lets you do that with DX12 is another story). But we use it only for texture and mesh streaming, which are likely to be bound by disk bandwidth before hitting a real issue with the mapping update cost.
The same goes for my DX12 testing: I only use reserved resources for texture streaming, which is unlikely to overload the driver since we wait on disk reads :)
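The idea of limiting how many mapping updates go out at once could be sketched roughly as below. This is only an illustration under stated assumptions: `TileUpdate` and `TileUpdateThrottle` are hypothetical names, not D3D12 or PS4 types, and the sketch makes no graphics API calls. It only shows the bookkeeping of queueing requests and releasing a bounded batch per frame, which the caller would then submit on its streaming queue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical description of one pending tile remap (not a D3D12 struct).
struct TileUpdate {
    std::uint32_t resourceTile;  // tile index inside the reserved resource
    std::uint32_t heapTile;      // tile index inside the backing heap
};

// Queues tile remap requests and hands out at most `budget` per frame.
class TileUpdateThrottle {
public:
    explicit TileUpdateThrottle(std::size_t budget) : budget_(budget) {
        assert(budget > 0 && "a zero budget would never drain the queue");
    }

    // Called by the streaming code whenever a tile needs (re)mapping.
    void enqueue(TileUpdate update) { pending_.push_back(update); }

    // Removes and returns the oldest requests, up to the per-frame budget.
    // The caller would submit the returned batch as one mapping update.
    std::vector<TileUpdate> takeFrameBatch() {
        const std::size_t count = std::min(budget_, pending_.size());
        const auto last = pending_.begin() + static_cast<std::ptrdiff_t>(count);
        std::vector<TileUpdate> batch(pending_.begin(), last);
        pending_.erase(pending_.begin(), last);
        return batch;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::size_t budget_;
    std::deque<TileUpdate> pending_;
};
```

With a budget of 2 and five queued requests, three successive calls to `takeFrameBatch()` return batches of 2, 2 and 1 tiles in FIFO order. The right budget depends on the driver and on how fast the disk delivers data, so it would need to be tuned by measurement.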
Looking at the AMD results, I would say they have some nasty linear traversal over pages with an O(N^x) cost somewhere, and they need to be informed of your little test so they can optimize that code path :)
EDIT: After a few hours, the ScrollRight test is still pending, so either you have a bug or they have a pretty bad bug, and I would bet on the latter :)
AMD Radeon R9 200 Series
UpdateTileMappings Tests, CPU work done before GPU begins
ContiguousToContiguous
AVE: 1868.38ms, STDEV: 70.42ms, MED: 1898.18ms, MIN: 1668.33ms, MAX: 1919.16ms, GPUAVE: 171.67925ms
ContiguousToRandom
AVE: 1963.90ms, STDEV: 18.56ms, MED: 1967.91ms, MIN: 1934.77ms, MAX: 1992.08ms, GPUAVE: 175.63353ms
RandomToContiguous
AVE: 2483.98ms, STDEV: 16.79ms, MED: 2487.84ms, MIN: 2458.53ms, MAX: 2512.04ms, GPUAVE: 193.19948ms
RandomToRandom
AVE: 2517.23ms, STDEV: 30.70ms, MED: 2521.27ms, MIN: 2469.73ms, MAX: 2564.90ms, GPUAVE: 194.73347ms
RandomToReversed
AVE: 2476.44ms, STDEV: 20.99ms, MED: 2484.07ms, MIN: 2432.74ms, MAX: 2496.32ms, GPUAVE: 191.26313ms
ContiguousToReversed
AVE: 1979.40ms, STDEV: 13.71ms, MED: 1984.02ms, MIN: 1951.12ms, MAX: 2003.35ms, GPUAVE: 178.67057ms
ReversedToContiguous
AVE: 1966.33ms, STDEV: 21.64ms, MED: 1965.59ms, MIN: 1932.56ms, MAX: 2005.68ms, GPUAVE: 133.08271ms
ReversedToReversed
AVE: 1897.24ms, STDEV: 17.78ms, MED: 1897.62ms, MIN: 1867.38ms, MAX: 1923.20ms, GPUAVE: 134.01424ms
UpdateTileMappings Tests, CPU work in parallel with GPU
RandomToRandom
AVE: 2506.74ms, STDEV: 13.67ms, MED: 2509.98ms, MIN: 2480.67ms, MAX: 2524.92ms, GPUAVE: 2500.31717ms
ReversedToReversed
AVE: 1874.90ms, STDEV: 6.60ms, MED: 1876.66ms, MIN: 1857.80ms, MAX: 1885.40ms, GPUAVE: 1871.40139ms
CopyTileMappings Tests, CPU work done before GPU begins
ScrollRight