Jump to content

  • Log In with Google      Sign In   
  • Create Account


Member Since 10 Nov 2006
Offline Last Active Yesterday, 02:26 AM

Posts I've Made

In Topic: DX11 Update textures every frame

13 April 2016 - 11:58 PM

My problem is that even the memcpy() of a 1080p RGBA texture into Map()'d memory takes a really long time (5+ms), so when I get up to 4K it's substantial.  What I could really use, I think, is a way to begin this copy process asynchronously.  Right now the copy blocks the GPU thread (since you must Map()/Unmap() on GPU thread, I'm also generally doing my memcpy there).

To be honest, I am more familiar with OGL, so some DX11 expert should have better tips.


For one, once the memory is mapped, you can access it from any other thread, just avoid calling API functions from multiple threads. The basic setup for memory to buffer copy could be:

  1. GPU thread: map buffer A
  2. Worker thread: decode video frame into buffer A
  3. GPU thread: when decoded, unmap buffer A

This will most likely trigger an asynchronously upload from CPU to GPU memory, or might do nothing if the DX11 decides to keep the texture in CPU memory for now (shared mem on HD4600 ?).


The next issue will be, when accessing the buffer. If you access it too early, e.g. by copying the buffer content to the target texture, then the asynchronously upload will be suddently result in synchronosouly stalling your rendering pipeline. So I would test out to use multple buffers, 3 at least. This kind of delay should be not critical for displaying a video.


An other option would be to look for a codex which can be decoded on the GPU. I'm not familiar with video codex, but there might be a codex which allows you to use the GPU to decode it. In this case I could work like this:

  1. map buffer X
  2. copy delta frame (whatever) to buffer (much smaller than full frame)
  3. unmap buffer X
  4. fence X
  5. ..
  6. if(fence X has been reached) start decode shader (buffer->target texture)
  7. swap target texture with rendered texture

In Topic: DX11 Update textures every frame

12 April 2016 - 01:52 AM

- in default & staging case, unmap doesn't take time, but CopyResource is where the time is

Unmap will only trigger the upload, which, when done with DMA, will not involve the GPU. But CopyResource, when you try to access the memory block, will spent time in 

1. waiting until the data has been uploaded (=>stalling your pipeline)

2. actually copying your data


To measure the first delay try to use some fence and try to measure the time spend in waiting for the fence:

unmap buffer A -> fence A ->... -> start GPU timer -> wait for fence B -> end GPU timer -> CopyResource ->... ->  unmap buffer B -> fence B ->...

In Topic: DX11 Update textures every frame

11 April 2016 - 10:59 PM

Copying textures every frame from CPU to GPU memory will be bottlenecked by the bus-bandwidth, so, check out your target platform (e.g. PCI-E) bandwidth and do some theo-crafting about how many times you would be theoretically able to transfer your textures from CPU memory to GPU memory. If this would be an issue, try to re-think your approach.


Data transfer will use DMA most of the time, so you can hide this transfer costs (aka avoid stalling your pipeline) if you can get along with one or two frames delay. If this is the case, look into double/triple buffering.


Eventually try to reduce the transfered data, either update only parts, use some compression or do even packing/unpacking.


Why are 2048x2048x2048 limiting ? Do you need larger textures ? I mean, 2k^3 ~ 32GB for an RGBA texture without mipmaps.

In Topic: A* A star vs huge levels

04 April 2016 - 11:26 AM

Create a hierarchy of waypiont graphs, you can even map higher level waypoints to certain points of interest (the cave entry, the big rock, the bridge etc.). Get the closest waypoint off your start and end waypoint and start A* on this level. Then go down the level, on each level execute the A* to the next waypoint of the higher level. The benefit of this  approach is, that you dont need the hi-resolution graph in memory, just the highest level is necessary and the rest on demand, and it is more natural. A human would not take the shortest path, he would most likely take the rout from town to town to travel over long distance.

In Topic: parallel precomputations

04 April 2016 - 03:10 AM

1 hour for 1000 nodes, are you executing A* 1000*1000 times ?


Check out the standard dijkstra which will find the shortest path from one node to all nodes. Yes, it will not find always the ways the A* will find, but it could be enough to estimate your rating.


To further optimize it, you can extend your waypoint data to include the dijkstra data necessary (costs and where it comes from) for multiple workers. This way you will be able to run multiple worker threads on the same waypoint graph concurrently (as long as you do not modify the waypoint graph during processing).