Quote:Original post by JorenJoestar
Quote:Original post by kenpex
Most of the time you don't want to record DirectX command buffers directly, because then you can't sort render calls, meaning that each thread has to work on parts of the scene that are independent of the others. Often that's not the case.
That's fine, I don't want to... but I do want to let different threads add commands to a (possibly) per-thread queue that is then merged and sorted!
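A minimal C++ sketch of the per-thread queue idea, assuming a hypothetical `Command` made of a 64-bit sort key and an opaque payload. Each worker appends only to its own `std::vector`, so recording needs no locks; after the threads join, the queues are concatenated and sorted by key. The names and the key formula are illustrative, not taken from any engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical command: a sort key plus an opaque payload (here, the recording thread id).
struct Command {
    std::uint64_t sortKey;
    std::uint32_t payload;
};

// Each worker owns one queue, so no synchronization is needed while recording.
using CommandQueue = std::vector<Command>;

// Concatenate all per-thread queues, then sort the merged list by key.
std::vector<Command> mergeAndSort(const std::vector<CommandQueue>& queues) {
    std::size_t total = 0;
    for (const auto& q : queues) total += q.size();
    std::vector<Command> merged;
    merged.reserve(total);
    for (const auto& q : queues) merged.insert(merged.end(), q.begin(), q.end());
    std::stable_sort(merged.begin(), merged.end(),
                     [](const Command& a, const Command& b) { return a.sortKey < b.sortKey; });
    assert(merged.size() == total);
    return merged;
}

// Record from several threads, each into its own queue, then merge and sort.
std::vector<Command> recordInParallel(unsigned threadCount, unsigned commandsPerThread) {
    std::vector<CommandQueue> queues(threadCount);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&queues, t, threadCount, commandsPerThread] {
            for (unsigned i = 0; i < commandsPerThread; ++i) {
                // Interleave keys across threads so the sort has real work to do.
                std::uint64_t key =
                    static_cast<std::uint64_t>(i) * threadCount + (threadCount - 1 - t);
                queues[t].push_back({key, static_cast<std::uint32_t>(t)});
            }
        });
    }
    for (auto& w : workers) w.join();  // all recording is finished past this point
    return mergeAndSort(queues);
}
```

In practice the payload would reference whatever the backend needs to execute the command; the merge and sort step stays the same.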
At what level of granularity are you creating commands?
Say we have a very basic design with an abstract renderer and some implementations (DirectX9, OpenGL...): do you provide atomic operations that become our commands?
Quote:Original post by kenpex
If you want to go for DirectX command recording (or, in general, record native render calls), then you probably want at least to have worker threads that prepare all the rendering data in parallel; then the data becomes read-only, and the command-recording threads access the read-only data to create the different scene segments.
This is exactly the implementation used by Gamebryo (complete with source code, examples and papers), but it's something I don't like very much.
API abstraction is essential for me: I want to create something that can handle different APIs!
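The two-phase scheme in the quote can be sketched in C++ as follows. `RenderItem`, the fake "prepare" math and the strings standing in for native calls are all hypothetical. Worker threads write disjoint ranges of the render data, `join` acts as the barrier after which the data is only read through a `const` reference, and each recording thread produces one scene segment from it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-object render data filled in during the prepare phase.
struct RenderItem {
    float worldX = 0.0f;   // e.g. a transformed position
    bool visible = false;  // e.g. a culling result
};

// Phase 1: workers write disjoint index ranges in parallel.
void prepare(std::vector<RenderItem>& items, unsigned workers) {
    assert(workers > 0);
    std::vector<std::thread> pool;
    std::size_t chunk = (items.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&items, w, chunk] {
            std::size_t begin = w * chunk;
            std::size_t end = std::min(items.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i) {
                items[i].worldX = static_cast<float>(i) * 2.0f;  // placeholder work
                items[i].visible = (i % 2 == 0);                 // placeholder culling
            }
        });
    }
    for (auto& t : pool) t.join();  // barrier: from here on the data is read-only
}

// Phase 2: recording threads read the const data; each records its own segment.
std::vector<std::vector<std::string>> recordSegments(const std::vector<RenderItem>& items,
                                                     unsigned recorders) {
    assert(recorders > 0);
    std::vector<std::vector<std::string>> segments(recorders);
    std::vector<std::thread> pool;
    std::size_t chunk = (items.size() + recorders - 1) / recorders;
    for (unsigned r = 0; r < recorders; ++r) {
        pool.emplace_back([&items, &segments, r, chunk] {
            std::size_t begin = r * chunk;
            std::size_t end = std::min(items.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i)
                if (items[i].visible) segments[r].push_back("draw " + std::to_string(i));
        });
    }
    for (auto& t : pool) t.join();
    return segments;
}
```

Note that each segment here is tied to a fixed slice of the scene, which is exactly why this approach can't sort draws across segments.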
Quote:Original post by kenpex
Also, when you are thinking about such an architecture, pay attention to cache misses and branch mispredictions. Some of the source I've seen in this thread is naive in that respect.
This is a REALLY GOOD point.
The posted source is only meant to give the idea, but what would you suggest to keep cache misses and branch mispredictions in check?
I feel that using function pointers to speed up command execution (command id = index into an array of function pointers...) could be a nuclear bomb for branch prediction.
What are your thoughts about code execution? Could you explain your design a bit further?
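A minimal C++ sketch of the dispatch-table idea described in the question, with hypothetical command ids, a stand-in `Device` and toy handlers. Each command costs one indirect call whose target depends on the data, which is where the misprediction worry comes from; runs of the same command id in a row generally make such calls easier to predict, though the real cost depends on the CPU.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical command ids; the id doubles as an index into the dispatch table.
enum CommandId : std::uint8_t { kSetTexture = 0, kSetMesh = 1, kDraw = 2, kCommandCount };

struct Command {
    CommandId id;
    std::uint32_t arg;
};

// Stand-in for device state that the handlers mutate.
struct Device {
    std::uint32_t texture = 0, mesh = 0, draws = 0;
};

using Handler = void (*)(Device&, std::uint32_t);

void setTexture(Device& d, std::uint32_t a) { d.texture = a; }
void setMesh(Device& d, std::uint32_t a) { d.mesh = a; }
void draw(Device& d, std::uint32_t) { ++d.draws; }

// Table indexed by CommandId; the order must match the enum.
constexpr Handler kDispatch[kCommandCount] = {setTexture, setMesh, draw};

// One data-dependent indirect call per command.
void execute(Device& dev, const std::vector<Command>& cmds) {
    for (const Command& c : cmds) {
        assert(c.id < kCommandCount);
        kDispatch[c.id](dev, c.arg);
    }
}
```

A `switch` on `c.id` is the usual alternative; compilers often lower it to a jump table, so it also ends in an indirect branch, and only profiling on the target hardware can tell which is faster.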
Thanks!
Eh, I can't really say much because it depends on your application. You have multiple choices, and there's no "right way": there are always trade-offs. In general, it's not true that using command buffers to record native commands makes you API dependent: you could simply abstract the recording API and issue the native commands from a device abstraction layer that you probably already have. So even if you're recording native stuff, you can still be platform independent.
To me the major drawback of using command buffers directly is that you can't sort them. So I prefer to have another layer: record some information about the drawing primitives, sort it, then generate command buffers in parallel, then play them back.
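To illustrate abstracting the recording API, here is a small C++ sketch. `CommandRecorder`, the two backend classes and the strings they record are hypothetical stand-ins, not real DirectX or OpenGL calls; the point is only that engine code records through the interface while each backend produces its own native stream.

```cpp
#include <cassert>
#include <string>
#include <vector>

// API-independent recording interface that engine code talks to.
class CommandRecorder {
public:
    virtual ~CommandRecorder() = default;
    virtual void setTexture(int texture) = 0;
    virtual void draw(int vertexCount) = 0;
    // The recorded "native" stream; strings here purely for illustration.
    const std::vector<std::string>& recorded() const { return stream_; }

protected:
    std::vector<std::string> stream_;
};

// Hypothetical backends: each turns abstract calls into its own native form.
class Dx9StyleRecorder : public CommandRecorder {
public:
    void setTexture(int t) override { stream_.push_back("dx9 settexture " + std::to_string(t)); }
    void draw(int n) override { stream_.push_back("dx9 draw " + std::to_string(n)); }
};

class GlStyleRecorder : public CommandRecorder {
public:
    void setTexture(int t) override { stream_.push_back("gl bindtexture " + std::to_string(t)); }
    void draw(int n) override { stream_.push_back("gl draw " + std::to_string(n)); }
};

// Engine-side code only sees the abstract interface.
void recordScene(CommandRecorder& rec, int texture, int vertexCount) {
    assert(vertexCount > 0);
    rec.setTexture(texture);
    rec.draw(vertexCount);
}
```

Calling `recordScene` once per backend yields two different native streams from the same engine code.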
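That record, sort, generate-in-parallel, play pipeline can be sketched in C++ under a few assumptions: `DrawItem` and its 64-bit key are hypothetical, and a "command buffer" is just a vector of strings. Generation runs in parallel over contiguous chunks of the sorted list, while playback walks the buffers in order so the sorted order survives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Hypothetical high-level draw record: what to draw plus a sort key.
struct DrawItem {
    std::uint64_t sortKey;
    int mesh;
};

using CommandBuffer = std::vector<std::string>;  // stand-in for a native command buffer

// Sort the recorded primitives, split the sorted list into contiguous chunks,
// and generate one command buffer per chunk in parallel.
std::vector<CommandBuffer> buildCommandBuffers(std::vector<DrawItem> items, unsigned workers) {
    assert(workers > 0);
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.sortKey < b.sortKey; });
    std::vector<CommandBuffer> buffers(workers);
    std::vector<std::thread> pool;
    std::size_t chunk = (items.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&items, &buffers, w, chunk] {
            std::size_t begin = std::min(items.size(), w * chunk);
            std::size_t end = std::min(items.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i)
                buffers[w].push_back("draw mesh " + std::to_string(items[i].mesh));
        });
    }
    for (auto& t : pool) t.join();
    return buffers;
}

// Playback stays sequential: buffer 0, then buffer 1, ..., preserving the sort.
std::vector<std::string> play(const std::vector<CommandBuffer>& buffers) {
    std::vector<std::string> executed;
    for (const auto& buf : buffers) executed.insert(executed.end(), buf.begin(), buf.end());
    return executed;
}
```

Because each chunk is a contiguous slice of the sorted list, concatenating the buffers in index order reproduces the global sort.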
I've already explained one scheme for doing that on my blog. As far as I can see it's good; the only drawback I see is that you are basically working with handles all the time, so you pay for it in cache misses when going from the hashes to the resources...
An alternative that avoids that is to record abstracted commands that embed pointers to the native resources. For comparison, in my test engine a draw command is a short bit string made of handles, which is both the command and its sort key.
For example, a command might look like:
Framebuffer handle | Texture handle | Mesh handle
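One way to pack such a handle-based command into a single integer that doubles as its sort key, as a C++ sketch. The 8/16/16-bit layout, the `ResourceTables` struct and its index-based handles are assumptions for illustration (the scheme described here may use hashes rather than plain indices); the per-handle table lookups in `executeDraws` are the indirection that costs cache misses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout, high to low bits: 8-bit framebuffer | 16-bit texture | 16-bit mesh.
// The packed value is both the draw command and its sort key.
using DrawKey = std::uint64_t;

DrawKey packDraw(std::uint32_t framebuffer, std::uint32_t texture, std::uint32_t mesh) {
    assert(framebuffer < (1u << 8) && texture < (1u << 16) && mesh < (1u << 16));
    return (DrawKey(framebuffer) << 32) | (DrawKey(texture) << 16) | DrawKey(mesh);
}

std::uint32_t framebufferOf(DrawKey k) { return std::uint32_t(k >> 32) & 0xFFu; }
std::uint32_t textureOf(DrawKey k) { return std::uint32_t(k >> 16) & 0xFFFFu; }
std::uint32_t meshOf(DrawKey k) { return std::uint32_t(k) & 0xFFFFu; }

// Stand-ins for native resources, reached through handle -> resource tables.
struct Texture { int nativeId; };
struct Mesh { int nativeId; };

struct ResourceTables {
    std::vector<Texture> textures;
    std::vector<Mesh> meshes;
};

// Sorting the raw keys groups draws by framebuffer, then texture, then mesh.
// Returns how many texture changes the sorted stream needs.
int executeDraws(std::vector<DrawKey> keys, const ResourceTables& res) {
    std::sort(keys.begin(), keys.end());
    int textureChanges = 0;
    std::uint32_t boundTexture = UINT32_MAX;
    for (DrawKey k : keys) {
        // One table lookup per handle: this is the extra indirection.
        const Texture& tex = res.textures[textureOf(k)];
        const Mesh& mesh = res.meshes[meshOf(k)];
        if (textureOf(k) != boundTexture) {
            boundTexture = textureOf(k);
            ++textureChanges;
        }
        (void)tex;
        (void)mesh;  // a real backend would bind and draw these here
    }
    return textureChanges;
}
```

With four draws alternating between two textures, the sorted stream needs only two texture changes instead of four.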
The alternative is to record commands/pointers plus a sort key for each of them. That takes more space, but avoids the indirection. To do the same draw, you'd record something like:
settexture | pointer + sortkey
setmesh | pointer + sortkey
If the sort is stable, then you can rearrange your recorded abstracted commands (something you can't do with the native ones) without paying for the handle indirection in cache misses. The downside is that your record buffer can be longer (more misses!), and the whole thing is less abstracted (that could be good!).
Notice that in this scheme all the sort keys can be stored in a separate array, which makes sense, as they're only used in the sorting pass. You could also still cull redundant commands when recording, making sure your recorded stream doesn't get too big. Deriving the right sort keys can be a bit of a problem, though.
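A C++ sketch of the pointer-plus-sort-key alternative, assuming hypothetical `NativeTexture`/`NativeMesh` resources and a tagged `const void*` per command. Sort keys live in their own array and are read only by the sort; `std::stable_sort` over an index array keeps commands that share a key in recording order. `record` culls a command identical to the previous one (same op, pointer and key), which is a safe form of culling here because a stable sort keeps such a pair adjacent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Stand-ins for native resources; commands hold direct pointers to them.
struct NativeTexture { int id; };
struct NativeMesh { int id; };

enum class Op : std::uint8_t { SetTexture, SetMesh };

// Structure-of-arrays command list: the keys array is touched only by the sort.
struct CommandList {
    std::vector<Op> ops;
    std::vector<const void*> ptrs;    // direct pointer to the native resource
    std::vector<std::uint32_t> keys;  // sort key (hypothetical encoding)

    void setTexture(const NativeTexture* t, std::uint32_t key) { record(Op::SetTexture, t, key); }
    void setMesh(const NativeMesh* m, std::uint32_t key) { record(Op::SetMesh, m, key); }

    void record(Op op, const void* p, std::uint32_t key) {
        assert(p != nullptr);
        // Cull an exact repeat of the previous command; it would stay adjacent after sorting.
        if (!ops.empty() && ops.back() == op && ptrs.back() == p && keys.back() == key) return;
        ops.push_back(op);
        ptrs.push_back(p);
        keys.push_back(key);
    }

    // Execution order: indices sorted by key, ties kept in recording order.
    std::vector<std::size_t> sortedOrder() const {
        std::vector<std::size_t> order(keys.size());
        std::iota(order.begin(), order.end(), std::size_t{0});
        std::stable_sort(order.begin(), order.end(),
                         [this](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
        return order;
    }
};

// Replays in sorted order, returning the native ids touched (for illustration).
std::vector<int> replay(const CommandList& list) {
    std::vector<int> touched;
    for (std::size_t i : list.sortedOrder()) {
        if (list.ops[i] == Op::SetTexture)
            touched.push_back(static_cast<const NativeTexture*>(list.ptrs[i])->id);
        else
            touched.push_back(static_cast<const NativeMesh*>(list.ptrs[i])->id);
    }
    return touched;
}
```

Sorting an index array rather than the commands themselves keeps the sort pass reading only the compact keys array; a real engine might instead sort (key, index) pairs for better locality.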