Member Since 10 Nov 2006

#5286809 DX11 Update textures every frame

Posted by on 13 April 2016 - 11:58 PM

My problem is that even the memcpy() of a 1080p RGBA texture into Map()'d memory takes a really long time (5+ms), so when I get up to 4K it's substantial.  What I could really use, I think, is a way to begin this copy process asynchronously.  Right now the copy blocks the GPU thread (since you must Map()/Unmap() on GPU thread, I'm also generally doing my memcpy there).

To be honest, I am more familiar with OGL, so some DX11 expert should have better tips.


For one, once the memory is mapped, you can access it from any other thread, just avoid calling API functions from multiple threads. The basic setup for memory to buffer copy could be:

  1. GPU thread: map buffer A
  2. Worker thread: decode video frame into buffer A
  3. GPU thread: when decoded, unmap buffer A

This will most likely trigger an asynchronous upload from CPU to GPU memory, or might do nothing if DX11 decides to keep the texture in CPU memory for now (shared memory on the HD4600?).
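The three steps above could be sketched like this. It is only a CPU-side simulation in Python: a `bytearray` stands in for the pointer that Map() would return, and `decode_frame_into` is a made-up placeholder for the real video decoder, so treat all names here as assumptions.

```python
import threading

WIDTH, HEIGHT, BYTES_PER_PIXEL = 1920, 1080, 4


def decode_frame_into(dst: memoryview, frame_index: int) -> None:
    """Hypothetical decoder: fills the whole destination with one byte value."""
    dst[:] = bytes([frame_index % 256]) * len(dst)


# Step 1 (GPU thread): "map" buffer A. The bytearray is a stand-in for mapped memory.
buffer_a = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)
mapped = memoryview(buffer_a)

# Step 2 (worker thread): decode straight into the mapped memory, no API calls involved.
worker = threading.Thread(target=decode_frame_into, args=(mapped, 7))
worker.start()

# ... the GPU thread is free to issue other API calls here ...

# Step 3 (GPU thread): once the decode has finished, "unmap" the buffer.
worker.join()
mapped.release()
```

In a real renderer the GPU thread would poll a "frame ready" flag once per frame instead of blocking in `join()`.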


The next issue is when you access the buffer. If you access it too early, e.g. by copying the buffer content to the target texture, the asynchronous upload suddenly turns into a synchronous stall of your rendering pipeline. So I would try using multiple buffers, at least 3. The resulting delay should not be critical for displaying a video.
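A rough sketch of such a buffer ring, again simulated on the CPU in Python. `StagingRing` and its methods are invented names for illustration; the plain bytearrays stand in for the real staging buffers.

```python
class StagingRing:
    """Round-robin ring of staging buffers (bytearrays stand in for GPU buffers)."""

    def __init__(self, count: int, size: int) -> None:
        self.buffers = [bytearray(size) for _ in range(count)]
        self.frame = 0

    def write_slot(self) -> int:
        # Slot the CPU fills during the current frame.
        return self.frame % len(self.buffers)

    def read_slot(self) -> int:
        # Oldest slot: filled count - 1 frames ago, so its upload had the most time to finish.
        return (self.frame + 1) % len(self.buffers)

    def advance(self) -> None:
        self.frame += 1


ring = StagingRing(count=3, size=16)
shown = []
for frame in range(6):
    # Write a recognisable value (frame + 10) into this frame's slot.
    ring.buffers[ring.write_slot()][:] = bytes([frame + 10]) * 16
    # Consume the oldest slot; the first two reads still see the empty initial buffers.
    shown.append(ring.buffers[ring.read_slot()][0])
    ring.advance()

print(shown)  # [0, 0, 10, 11, 12, 13]
```

With three buffers the displayed data is two frames behind the written data, which is the delay mentioned above.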


Another option would be to look for a codec which can be decoded on the GPU. I'm not familiar with video codecs, but there might be one which allows you to use the GPU to decode it. In this case it could work like this:

  1. map buffer X
  2. copy delta frame (whatever) to buffer (much smaller than full frame)
  3. unmap buffer X
  4. fence X
  5. ..
  6. if(fence X has been reached) start decode shader (buffer->target texture)
  7. swap target texture with rendered texture

#5286421 DX11 Update textures every frame

Posted by on 11 April 2016 - 10:59 PM

Copying textures every frame from CPU to GPU memory will be bottlenecked by the bus bandwidth, so check the bandwidth of your target platform (e.g. PCI-E) and do some theorycrafting about how many times per second you could theoretically transfer your textures from CPU memory to GPU memory. If this turns out to be an issue, try to rethink your approach.
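The theorycrafting could look like the following back-of-the-envelope calculation. The bus figure is an assumption (the theoretical peak of PCIe 3.0 x16, about 15.75 GB/s); real-world throughput will be noticeably lower, and the function names are made up for this sketch.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one uncompressed frame (RGBA8 by default)."""
    return width * height * bytes_per_pixel


def max_uploads_per_second(width: int, height: int, bus_bytes_per_second: float) -> float:
    """Upper bound on full-frame uploads if the bus did nothing else."""
    return bus_bytes_per_second / frame_bytes(width, height)


# Assumption: theoretical peak of PCIe 3.0 x16 in bytes per second.
PCIE3_X16 = 15.75e9

for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    size_mb = frame_bytes(w, h) / 1e6
    limit = max_uploads_per_second(w, h, PCIE3_X16)
    print(f"{name}: {size_mb:.1f} MB per frame, at most {limit:.0f} uploads/s")
```

If the resulting limit is only a small multiple of your target frame rate, the transfer will compete with everything else on the bus.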


Data transfer will use DMA most of the time, so you can hide these transfer costs (i.e. avoid stalling your pipeline) if you can live with one or two frames of delay. If this is the case, look into double/triple buffering.


Finally, try to reduce the transferred data: update only parts of the texture, use some compression, or even do packing/unpacking.


Why is 2048x2048x2048 limiting? Do you need larger textures? I mean, 2k^3 is ~32 GB for an RGBA texture without mipmaps.
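For reference, the size estimate works out as follows, assuming 4 bytes per texel (RGBA8) and no mipmaps:

```python
# Assumption: RGBA8, i.e. 4 bytes per texel, no mipmaps.
texels = 2048 ** 3
size_bytes = texels * 4
size_gib = size_bytes / 2 ** 30

print(f"{texels:,} texels -> {size_bytes:,} bytes = {size_gib:.0f} GiB")
```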

#5285048 A* A star vs huge levels

Posted by on 04 April 2016 - 11:26 AM

Create a hierarchy of waypoint graphs; you can even map higher-level waypoints to certain points of interest (the cave entry, the big rock, the bridge etc.). Find the high-level waypoints closest to your start and end points and run A* on this level. Then go down one level at a time, and on each level run A* to the next waypoint of the level above. The benefit of this approach is that you don't need the high-resolution graph in memory: only the highest level is always needed, the rest can be loaded on demand. It also looks more natural. A human would not take the shortest path; he would most likely follow the route from town to town when travelling over a long distance.
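A minimal two-level sketch of this idea in Python. The names (`a_star`, `hierarchical_path`, `fine`, `coarse`) and the tiny test world are assumptions for illustration; a real implementation would stream the fine graph in per region instead of holding it in one dict.

```python
import heapq
import math


def a_star(graph, pos, start, goal):
    """Plain A* over graph[node] -> {neighbour: cost}, straight-line heuristic."""
    def h(n):
        return math.dist(pos[n], pos[goal])

    open_heap = [(h(start), start)]
    g = {start: 0.0}
    came_from = {}
    while open_heap:
        _, node = heapq.heappop(open_heap)
        if node == goal:
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        for nxt, cost in graph[node].items():
            new_g = g[node] + cost
            if new_g < g.get(nxt, math.inf):
                g[nxt] = new_g
                came_from[nxt] = node
                heapq.heappush(open_heap, (new_g + h(nxt), nxt))
    return None


def hierarchical_path(fine, coarse, pos, start, goal):
    """Plan on the coarse waypoint graph first, then refine each leg on the fine graph."""
    def nearest(n):
        return min(coarse, key=lambda w: math.dist(pos[w], pos[n]))

    route = a_star(coarse, pos, nearest(start), nearest(goal))
    stops = [start] + route + [goal]
    path = [start]
    for a, b in zip(stops, stops[1:]):
        if a != b:
            path += a_star(fine, pos, a, b)[1:]  # skip the repeated leg start
    return path


# Fine level: a 7x3 grid with 4-connected cells. Coarse level: two "towns".
fine = {}
for x in range(7):
    for y in range(3):
        nbrs = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        fine[(x, y)] = {n: 1.0 for n in nbrs if 0 <= n[0] < 7 and 0 <= n[1] < 3}
pos = {n: n for n in fine}
coarse = {(1, 1): {(5, 1): 4.0}, (5, 1): {(1, 1): 4.0}}

path = hierarchical_path(fine, coarse, pos, (0, 0), (6, 2))
print(path)
```

The resulting path is forced through both towns, so it is not necessarily the shortest one, which matches the "town to town" behaviour described above.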

#5284992 parallel precomputations

Posted by on 04 April 2016 - 03:10 AM

1 hour for 1000 nodes, are you executing A* 1000*1000 times ?


Check out the standard Dijkstra algorithm, which finds the shortest path from one node to all other nodes. Yes, it will not always find the same paths A* would find, but it could be enough to estimate your rating.


To optimize it further, you can extend your waypoint data to hold the Dijkstra data (cost and predecessor) for multiple workers. This way you can run multiple worker threads on the same waypoint graph concurrently (as long as you do not modify the waypoint graph during processing).
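A small sketch of the single-source approach with one run per worker. Note the swap: instead of extending the waypoints with per-worker slots, each run here keeps its own `dist` and `prev` dicts, which has the same effect of leaving the shared graph read-only. The graph and all names are made up for illustration.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def dijkstra(graph, source):
    """Shortest distances from source to every reachable node, plus predecessors."""
    dist = {source: 0.0}   # per-run cost data, never stored in the shared graph
    prev = {}              # per-run "where it comes from" data
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return dist, prev


# Shared, read-only waypoint graph (hypothetical example data).
graph = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0, "d": 5.0},
    "c": {"a": 4.0, "b": 2.0, "d": 1.0},
    "d": {"b": 5.0, "c": 1.0},
}

# One Dijkstra run per source node: N runs instead of N*N A* searches.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(zip(graph, pool.map(lambda s: dijkstra(graph, s), graph)))

print(results["a"][0])
```

In CPython the threads mostly illustrate the data layout, since the interpreter lock prevents real CPU parallelism; in C++ or with a process pool the same structure would actually run in parallel.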

#5284965 Races vs. techtree vs. doctrines - choosing from the start or not...

Posted by on 03 April 2016 - 11:13 PM

The start will be usually the same but with good versatility later



No, the whole game will always be played in the same way. The issue with such an approach is that you will always have a dominating development path, and players are really good at detecting this path. In the end it is a rush for the dominating path; whoever is quicker will win. This 'stick-to-the-best-strategy' way of playing an RTS game has been observed in almost every game out there, and it will also apply to games which have multiple different factions, but at least there you need to change your strategy to counter the faction-specific advantages.

#5284962 How beneficial can personal projects be?

Posted by on 03 April 2016 - 11:02 PM

As far as applying for jobs, how beneficial is it to have a long list of personal, playable projects? Should I be aiming to make a large collection of projects that demonstrate a wide range of different genres, styles and mechanics? 


Thanks a lot for any help :)

It is not genres, styles and mechanics that are important, but showing understanding, knowledge and skill, or just passion. I think that some demos demonstrating skills in game design, visual rendering or AI behavior would be better. E.g. are you able to write a basic deferred rendering engine, a basic client-server setup, a balanced eco mini-game, or some AI entities doing interesting stuff? On top of this you can try to create some over-the-top demos, implementing a more complex and modern rendering approach, solving some hard AI problems etc.


Game mechanics and genres might be useful for game designers only.

#5284424 Object Space Lightning

Posted by on 30 March 2016 - 10:27 PM


Overdraw is actually less of a problem, because the overdraw-ing shader is cheap as chips.


Sorry I meant the blending cost. Rendering a bunch of overlapping blended objects will still be just as slow.


It doesn't really matter compared to your standard particle system (mass-blended surfaces) or the costs of calculating light or some other complex shaders. It is really just an alpha-blended copy from one buffer to another, no magic, just the bandwidth for copying. Compare this to the bandwidth costs of all the other effects, like sampling multiple buffers for some g-buffer magic.

#5284177 Should I start with fixed-function-pipeline OpenGL or the newest possible?

Posted by on 29 March 2016 - 11:53 PM

From the developer's point of view, the philosophy behind the old fixed-function pipeline is so different from that of modern APIs that you should skip fixed-function pipeline OpenGL. They are really two different beasts, and learning to master a dead beast will not help you understand the new way of doing things; it will most likely only confuse you.


I would suggest learning the new way only (>OGL4).

#5284170 Unwrapping similar models to same texture

Posted by on 29 March 2016 - 11:09 PM

I've tried this to some degree. Basically it is often easier to unwrap each model separately (unwrapping a model in decent quality is done in 10 mins; you don't need highly professional unwrapping for your game project most of the time, leave this to AAA budget games).


Reusing parts of the texture is often harder. What worked best for me was to work with two UV sets, bake the source texture to the target texture and use the target texture as a paint base. E.g. you have a standard in-game UV set (just unwrap the model) and a special UV set which fits an existing texture (you don't need to cover everything, e.g. no need to fit a female face on a male model). Then bake the source texture with the special UV set to the target texture with the standard UV set. This works most of the time, but you still need to rework the target texture.

#5284169 trade marked orc design

Posted by on 29 March 2016 - 11:01 PM

Design is copyrighted, so basically the orc can be copyrighted too (though it is not really design). But take cars, whose designs are copyrighted too: you still have thousands of different car models with 4 wheels and green color. I'm sure that the modern green orc evolved through several artists, each getting inspired by a former artist, and the first orc or goblin or whatever will be much older than Blizzard's orc.


Try to create your own orc by taking common references (bodybuilders for shape, animal references for certain features like eyes, ears, teeth etc.) and avoid taking other art as reference. The latter always carries the danger of copying a design.

#5283999 Object Space Lightning

Posted by on 29 March 2016 - 12:14 AM


E.g. lighting semi-transparent objects

I don't get this one. How would you do that?


The difference between OSL and a standard deferred renderer is that you have a 'g-buffer' per object, so you can calculate the lighting of the surface independently of what is behind or in front of the object in the scene. So, calculate the lighting in object space/texture space and forward render the object into the scene using standard alpha blending.

#5282873 create big instance buffer

Posted by on 23 March 2016 - 08:02 AM

Theoretically GPUs are able to process a lot of vertices, but you will run into some limiting factors. For one, memory bandwidth: even if the vertex shader is optimized enough to output such an insane number of vertices per frame, you will get bandwidth issues. The second bit of fun, if I understood you right, is that you want to render grass (obviously single blades of grass ;-) ), most likely alpha blended? This will result in a lot of overdraw (no early z-rejection), and you need to re-sort it whenever the camera angle changes. Caching your data in 100 buffers sounds like a better approach.

#5282821 create big instance buffer

Posted by on 23 March 2016 - 02:36 AM

Well, the problem should start with memory. You store a 4x4 float matrix per instance, right? At 64 bytes per matrix, you would need ~6.4 GByte for 100 000 000 instances. It would be roughly 1.1 GByte for 18 000 000, just for holding the data structure. Should it exist in VRAM? Do you need some more data like textures, screen buffers etc.?

Even if memory were no issue, you would need to render 18 000 000 instances. How many faces will one instance have? A simple cube would result in 12 tris, ending up at 216 000 000 tris per frame.

Even if performance were no issue, you have 216 000 000 tris on a 1080p display with 2 073 600 pixels, which is roughly 100 faces per pixel, even if you only want to render simple cubes. So, this would be either overkill or the really, really lazy way to render a large world.
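The numbers above can be reproduced with a few lines, assuming 4-byte floats, 12 triangles per cube and a 1920x1080 target (the function name is made up for this sketch):

```python
BYTES_PER_MATRIX = 4 * 4 * 4  # 4x4 matrix of 4-byte floats
TRIS_PER_CUBE = 12


def instance_buffer_bytes(instance_count: int) -> int:
    """Memory needed just for the per-instance matrices."""
    return instance_count * BYTES_PER_MATRIX


instances = 18_000_000
tris = instances * TRIS_PER_CUBE
pixels = 1920 * 1080
tris_per_pixel = tris / pixels

print(f"100M instances: {instance_buffer_bytes(100_000_000) / 1e9:.1f} GB")
print(f"18M instances:  {instance_buffer_bytes(instances) / 1e9:.2f} GB")
print(f"{tris:,} tris over {pixels:,} pixels = {tris_per_pixel:.0f} tris per pixel")
```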

I think this has enough potential to introduce some issues :D

What is your goal ?

#5282806 Odds of success as an indie publisher rather than a dev?

Posted by on 23 March 2016 - 12:11 AM

You want to be a publisher, like EA, just smaller?

Well, a real publisher will carry almost all of the financial risk and will be responsible for the #1 selling point: marketing.

Most games will not break even, and there are a few super-hits which will hopefully cover the losses of all the other titles. There is a reason why some big publishers died in the last years, even with good titles (e.g. THQ).

So, the real question is: Why do small teams really self-publish?
Answer A: Hey, they wanna make more money...
Answer B: Eemm.., well, no publisher is interested to bear the risk...

Making money with a title is 99% marketing, and a good publisher will be excellent here. So, if you want to be a publisher and you want to make money, you would need to be excellent at marketing, networking, raising capital...

#5282537 Understanding the difference in how I should be drawing lots of objects

Posted by on 21 March 2016 - 11:52 PM

My previous implementation was based on pure immediate mode too (begin/end). I converted my gui rendering by doing this:
1. Choose a common format for all widgets (e.g. 4 vertices each having 1 tex coord,color etc.), one which can be mapped directly to the VBO.
2. Exchange the begin/end by writing the data to a cache, so instead of glVertex(xx) use something like data.position=xx
3. Caching was really useful for static text, which has several hundred quads and changes very rarely.
4. Map the "back" VBO (aka accessing the VBO memory directly)
5. Copy the displayed widgets from the cache to the "back" VBO each(!) frame (a simple memcpy if both use the same memory setup)
6. Unmap the "back" VBO (aka asynchronously upload to the GPU memory).
7. Render the "front" VBO, which was updated and uploaded one frame before, so that the GPU renders one VBO while the other is uploaded (important to avoid stalling!).
8. Swap back/front VBO for the next frame.
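The front/back swap in steps 4 to 8 can be simulated on the CPU like this. It is only a sketch: the bytearrays stand in for the two mapped VBOs, the 20-byte vertex layout and all names (`quad_bytes`, `end_frame`, the widget names) are assumptions, and no OpenGL calls are made.

```python
import struct

# Hypothetical common vertex format: x, y, u, v as floats plus RGBA bytes (20 bytes).
VERTEX = struct.Struct("<2f2f4B")
QUAD_BYTES = VERTEX.size * 4


def quad_bytes(x, y, w, h, rgba):
    """Pack one widget quad (4 vertices) in the common format."""
    corners = [(x, y, 0.0, 0.0), (x + w, y, 1.0, 0.0),
               (x + w, y + h, 1.0, 1.0), (x, y + h, 0.0, 1.0)]
    return b"".join(VERTEX.pack(px, py, u, v, *rgba) for px, py, u, v in corners)


# Step 2/3: the cache is only rebuilt when a widget actually changes.
cache = {
    "ok_button": quad_bytes(10, 10, 80, 24, (255, 255, 255, 255)),
    "label": quad_bytes(10, 40, 120, 16, (200, 200, 200, 255)),
}

vbos = [bytearray(QUAD_BYTES * 64), bytearray(QUAD_BYTES * 64)]  # stand-ins for two VBOs
quads_in = [0, 0]
back, front = 0, 1


def end_frame(visible):
    """Fill the back buffer, 'draw' the front buffer, then swap. Returns quads drawn."""
    global back, front
    data = b"".join(cache[name] for name in visible)
    vbos[back][:len(data)] = data   # steps 4-6: map, memcpy, unmap
    quads_in[back] = len(visible)
    drawn = quads_in[front]         # step 7: draw what was uploaded one frame ago
    back, front = front, back       # step 8
    return drawn


drawn = [end_frame(["ok_button", "label"]), end_frame(["label"]), end_frame([])]
print(drawn)  # [0, 2, 1]
```

Each frame draws what was written during the previous frame, which is the one-frame delay that keeps the upload from stalling the draw.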

This way I reduced my API calls by 15k, according to gDebugger. Also note that immediate mode is deprecated in OGL 3.0 and above.