Well, given that the game logic only runs at 20Hz and still can't keep up on a 2.8GHz core, something needs to change, and I would really like to find solutions other than reducing the view distance.
For example, the following code is from the "Rebuild the CPU side vertex and index buffers" step. Once I have those buffers I can render at about 250fps on my system, but I first need to produce the actual data.
```cpp
bool SimpleBlockRenderer::notSolid(WorldChunk &chunk, int x, int y, int z)
{
    // Gets a block id from the chunk if the coordinates are in its range,
    // else falls back to World::getBlockId, which looks up the chunk
    // via a hashmap
    BlockId id = chunk.getWorldBlockId(x, y, z);
    const Block *block = getBlock(id);
    return !block->opaqueCube;
}

void SimpleBlockRenderer::cache(WorldChunk &chunk, ChunkRenderer &chunkRenderer,
                                const Block *block, BlockId bid, int x, int y, int z)
{
    // Caches each of the 6 faces of a standard cube, skipping faces whose
    // neighbour is an opaque cube. At present this is the majority of the world.
    // It would be possible to smooth out things like grass, which would
    // obviously be more demanding than this.
    // The cache functions here just put data into a vertex or index buffer.
    // A small mutex-protected section then hands the updated buffer to the
    // render thread, which creates or updates the ID3D11Buffer objects.

    // east / west
    if (notSolid(chunk, x + 1, y, z))
        cacheEast(chunk, chunkRenderer, block, bid, x, y, z);
    if (notSolid(chunk, x - 1, y, z))
        cacheWest(chunk, chunkRenderer, block, bid, x, y, z);
    // top / bottom
    if (notSolid(chunk, x, y + 1, z))
        cacheTop(chunk, chunkRenderer, block, bid, x, y, z);
    if (notSolid(chunk, x, y - 1, z))
        cacheBottom(chunk, chunkRenderer, block, bid, x, y, z);
    // north / south
    if (notSolid(chunk, x, y, z + 1))
        cacheNorth(chunk, chunkRenderer, block, bid, x, y, z);
    if (notSolid(chunk, x, y, z - 1))
        cacheSouth(chunk, chunkRenderer, block, bid, x, y, z);
}
```
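The comment above mentions a small mutex-protected section that hands the rebuilt buffers to the render thread. A minimal sketch of that kind of handoff might look like the following; `Vertex`, `ChunkMesh` and `MeshMailbox` are hypothetical names, not the types used in the actual engine. The point of the design is that only vector moves (no element copies) happen while the lock is held.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical vertex layout; the real one is not shown in the post.
struct Vertex { float x, y, z, u, v; };

// Hypothetical CPU-side mesh for one chunk.
struct ChunkMesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Hypothetical single-slot mailbox between the logic thread (producer)
// and the render thread (consumer).
class MeshMailbox {
public:
    // Logic thread: publish a freshly built mesh. Only moves happen under
    // the lock, so the critical section stays short regardless of mesh size.
    // An unconsumed older mesh is replaced by the newer one.
    void publish(ChunkMesh mesh) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ = std::move(mesh);
        hasPending_ = true;
    }

    // Render thread: returns true and fills 'out' if a new mesh was waiting.
    // The caller then creates or updates its GPU buffers outside the lock.
    bool take(ChunkMesh &out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasPending_) return false;
        out = std::move(pending_);
        pending_ = ChunkMesh{};
        hasPending_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    ChunkMesh pending_;
    bool hasPending_ = false;
};
```

The render thread would call `take` once per frame per chunk and only touch the D3D11 API when it returns true, so the logic thread never waits on GPU work.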
This is running on a 2.8GHz hex-core CPU, so roughly 16.7% of total CPU time corresponds to one entire core.
Stack, % of CPU time
|- SimpleBlockRenderer::cache 7.86
| |- SimpleBlockRenderer::notSolid 5.73
| | |- SimpleBlockRenderer::notSolid<itself> 3.86
| | |- WorldChunk::getBlockId 1.28
| | |- World::getBlockId 0.60
| | |- ntkrnlmp.exe!KiInterruptDispatchNoLock 0.00
| | |- ntkrnlmp.exe!KiDpcInterrupt 0.00
| |- SimpleBlockRenderer::cache<itself> 1.02
| |- SimpleBlockRenderer::cacheTop 0.33
| |- SimpleBlockRenderer::cacheEast 0.17
| |- SimpleBlockRenderer::cacheWest 0.16
| |- SimpleBlockRenderer::cacheNorth 0.16
| |- SimpleBlockRenderer::cacheBottom 0.15
| |- SimpleBlockRenderer::cacheSouth 0.14
| |- ntkrnlmp.exe!KiInterruptDispatchNoLock 0.00
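Since the percentages in the profile are of all six cores combined, it helps to convert them into one-core terms. This is just arithmetic on the figures shown:

$$
\text{one core} = \frac{100\%}{6} \approx 16.7\%,\qquad
\frac{7.86}{16.7} \approx 0.47,\qquad
\frac{5.73}{7.86} \approx 0.73,\qquad
\frac{3.86}{5.73} \approx 0.67
$$

So `cache` keeps about 47% of one core busy, `notSolid` accounts for about 73% of that, and about two thirds of `notSolid` is spent in its own body rather than in the `getBlockId` lookups it calls.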
Running that over a 32x32x300 (approximate height) region takes about 20ms. With all the optimisations I could think of it's about 16ms, which still won't work if I want to do several per frame. I also don't think duplicating the entire world state (perhaps even to the GPU) is practical: e.g. a 20x20 loaded region is 400 chunks, which is about 500MB of data (for block ids and the lightmap).
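For scale, the figures in that paragraph work out as follows (taking MB as $10^6$ bytes; with $2^{20}$ bytes it is about 4.3 bytes per block):

$$
32 \times 32 \times 300 = 307{,}200 \ \text{blocks per chunk},\qquad
\frac{20\,\text{ms}}{307{,}200} \approx 65\,\text{ns},\qquad
\frac{16\,\text{ms}}{307{,}200} \approx 52\,\text{ns per block}
$$

$$
400 \times 307{,}200 \approx 1.23 \times 10^{8}\ \text{blocks},\qquad
\frac{500 \times 10^{6}\,\text{bytes}}{1.23 \times 10^{8}\ \text{blocks}} \approx 4\ \text{bytes per block}
$$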
Rebuilding several chunks at once has no dependencies between the chunks, but the world data cannot change while the rebuild is in progress. If, say, 3 chunks need to be rebuilt on average, that is about 16ms in parallel versus 48ms sequentially, which is why I want to explore the best threading options.
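One minimal way to express "rebuild several chunks in parallel while the world is read-only" is to pass the world by `const` reference to one task per dirty chunk and join them all before the world is modified again. The sketch below uses `std::async`; `World`, `ChunkMesh`, `buildMesh` and `rebuildChunks` are hypothetical stand-ins, and the per-block test is a placeholder for the real face-culling code.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

// Hypothetical flat world of block ids (0 = air); read-only during a rebuild.
struct World {
    std::vector<std::uint16_t> blocks;
};

// Hypothetical result of rebuilding one chunk.
struct ChunkMesh {
    std::size_t chunkIndex = 0;
    std::size_t solidBlocks = 0; // placeholder for real vertex/index data
};

// Only reads from 'world', so any number of these can run concurrently
// without locks, provided nothing writes to the world in the meantime.
ChunkMesh buildMesh(const World &world, std::size_t chunkIndex, std::size_t chunkSize) {
    ChunkMesh mesh;
    mesh.chunkIndex = chunkIndex;
    const std::size_t begin = chunkIndex * chunkSize;
    for (std::size_t i = begin; i < begin + chunkSize; ++i)
        if (world.blocks[i] != 0) ++mesh.solidBlocks; // stand-in for face tests
    return mesh;
}

// Launches one task per dirty chunk, then waits for all of them. Once this
// returns, the caller may modify the world again.
std::vector<ChunkMesh> rebuildChunks(const World &world,
                                     const std::vector<std::size_t> &dirty,
                                     std::size_t chunkSize) {
    std::vector<std::future<ChunkMesh>> jobs;
    jobs.reserve(dirty.size());
    for (std::size_t idx : dirty)
        jobs.push_back(std::async(std::launch::async, buildMesh,
                                  std::cref(world), idx, chunkSize));
    std::vector<ChunkMesh> results;
    results.reserve(jobs.size());
    for (auto &job : jobs)
        results.push_back(job.get()); // blocks until that chunk is finished
    return results;
}
```

`std::launch::async` typically starts a new thread per call; a persistent worker pool would avoid that per-task cost, but the read-only/join structure stays the same.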
The same applies to general updates: if I make a rule that nothing may directly access or modify anything more than 32 units away, then each update of a 32x32 region needs no locks, provided there is a 64-unit border between the regions being updated at the same time.
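With 32x32 regions and a reach of 32 units, a 64-unit border means two whole regions between any pair updated at the same time. One way to schedule that (an assumption, not necessarily the approach the post has in mind) is to give each region a phase from its coordinates modulo 3, giving 9 phases that run one after another, with all regions inside a phase updated in parallel. `Region`, `phaseOf`, `buildPhases` and `safeTogether` are hypothetical names.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical region coordinate; each region is a 32x32 column of blocks.
struct Region { int rx, rz; };

constexpr int kRegionSize = 32; // region width in blocks
constexpr int kReach = 32;      // max distance an update may read or write

// Regions share a phase only if both coordinates match modulo 3, so two
// distinct regions in the same phase are at least 3 regions apart on some
// axis, i.e. at least 2 * 32 = 64 blocks of gap between them.
int phaseOf(const Region &r) {
    auto mod3 = [](int v) { return ((v % 3) + 3) % 3; }; // handles negatives
    return mod3(r.rx) * 3 + mod3(r.rz);                  // 0..8
}

// Groups regions into 9 phases. Phases run sequentially; regions within
// one phase can be updated concurrently without locks.
std::vector<std::vector<Region>> buildPhases(const std::vector<Region> &regions) {
    std::vector<std::vector<Region>> phases(9);
    for (const Region &r : regions)
        phases[phaseOf(r)].push_back(r);
    return phases;
}

// Blocks strictly between two regions along one axis (0 if same or adjacent).
int gapBlocks(int a, int b) {
    const int d = std::abs(a - b);
    return d == 0 ? 0 : (d - 1) * kRegionSize;
}

// Two regions' reach areas cannot overlap if, on at least one axis, the gap
// between them covers both reaches.
bool safeTogether(const Region &a, const Region &b) {
    return gapBlocks(a.rx, b.rx) >= 2 * kReach ||
           gapBlocks(a.rz, b.rz) >= 2 * kReach;
}
```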
@HappyCoder
The above is one such bottleneck. Besides CPU sample profiles like the one above, I also have the following breakdown, which was done quickly with QueryPerformanceCounter (to track down some specific update steps that took over 200ms, which my interpolated rendering didn't handle well). So apart from shaving the last few ms off some things, or reducing the data set, threads seem like a good idea.
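A breakdown like the one below can be collected with a small RAII timer that records how long each named section of the update step took. This is a hypothetical sketch: `UpdateProfiler` and `Scope` are illustrative names, and `std::chrono::steady_clock` stands in for QueryPerformanceCounter so that it is portable.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-update-step profiler.
class UpdateProfiler {
public:
    // Measures from construction to destruction and records the elapsed
    // time in milliseconds. Nested scopes are recorded in completion order,
    // so an inner section appears before the section that contains it.
    class Scope {
    public:
        Scope(UpdateProfiler &owner, std::string name)
            : owner_(owner), name_(std::move(name)),
              start_(std::chrono::steady_clock::now()) {}
        ~Scope() {
            const std::chrono::duration<double, std::milli> elapsed =
                std::chrono::steady_clock::now() - start_;
            owner_.sections_.emplace_back(std::move(name_), elapsed.count());
        }
        Scope(const Scope &) = delete;
        Scope &operator=(const Scope &) = delete;

    private:
        UpdateProfiler &owner_;
        std::string name_;
        std::chrono::steady_clock::time_point start_;
    };

    // Prints every recorded section only when the step exceeded its budget;
    // returns true if it did.
    bool reportIfOver(double totalMs, double targetMs) const {
        if (totalMs <= targetMs) return false;
        std::printf("Logic update took too long. time=%.5f target=%.0f\n",
                    totalMs, targetMs);
        for (const auto &s : sections_)
            std::printf("%s: %.5f\n", s.first.c_str(), s.second);
        return true;
    }

    const std::vector<std::pair<std::string, double>> &sections() const {
        return sections_;
    }
    void clear() { sections_.clear(); }

private:
    std::vector<std::pair<std::string, double>> sections_;
};
```

Usage would be along the lines of `{ UpdateProfiler::Scope s(profiler, "Lighting"); updateLighting(); }`, where `updateLighting` is a placeholder for the real step, followed by one `reportIfOver(total, 50.0)` and `clear()` at the end of each 20Hz step.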
Logic update took to long. time=60.01636 target=50
(all times in ms; indented entries are sub-steps of the entry above them)
//This needs rewriting anyway, since it was not updated to correctly
//handle chunk borders. However, I suspect this task will always
//be fairly expensive.
Lighting: 11.92671
//Checks which currently loaded chunks are needed, gets rid of unneeded ones
//and loads/creates new ones.
//Also checks if any background loaders have completed.
Chunk Loading: 6.72902
  //Loads or creates new chunks. If the chunk already exists, it is loaded
  //by a background thread (decompressing the files takes a fair bit of
  //CPU).
  //Generation of new chunks happens on this thread, limited to 1 per update step.
  Ensure Loaded: 4.63302
  //Deletes unneeded chunks from memory.
  Unload: 0.03959
Saving: 3.19523
  //This could be improved, since it does some disk access on the logic thread.
  World: 0.03459
  //Same.
  Player: 0.02342
  //The logic thread just creates a std::vector<uint8_t> for these, and gives
  //the vector to another thread that compresses it with zlib and writes it.
  Chunks: 2.93431
//Updates all entities, scheduled block updates and random block updates
//for all active chunks.
Chunk Update: 6.42098
//Creates NPCs, plants trees, etc. Didn't run on this frame.
Populate: 0
//Logic purely to provide the renderer with data,
//e.g. new vertex buffer contents.
//Entities are not included here, since I used a triple-buffered lockless
//scheme at the end of the entities' own updates (so they are included above).
//At the cost of some artifacts, this is restricted to 2 chunks per update step.
Render Update: 31.74442
  Chunk Cache: 31.45341
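The log mentions a triple-buffered lockless scheme for handing entity state to the renderer without showing it. A common single-producer/single-consumer form of that idea is sketched below; it is an illustration of the general technique, not the implementation used here, and `TripleBuffer` is a hypothetical name. The writer always owns one buffer, the reader owns another, and the third "middle" slot is swapped between them with a single atomic exchange, so neither side ever blocks.

```cpp
#include <atomic>
#include <cstdint>

// Single-producer / single-consumer triple buffer.
template <typename T>
class TripleBuffer {
public:
    // Writer: fill this buffer, then call publish().
    T &back() { return buffers_[backIndex_]; }

    // Writer: swap the filled back buffer into the middle slot, marking it
    // fresh, and take over whichever buffer was in the middle before.
    void publish() {
        const std::uint8_t prev = middle_.exchange(
            static_cast<std::uint8_t>(backIndex_ | kFreshBit),
            std::memory_order_acq_rel);
        backIndex_ = prev & kIndexMask;
    }

    // Reader: if a newer buffer was published, swap it into the front.
    // Returns true if front() now refers to newer data.
    bool update() {
        if ((middle_.load(std::memory_order_relaxed) & kFreshBit) == 0)
            return false;
        const std::uint8_t prev =
            middle_.exchange(frontIndex_, std::memory_order_acq_rel);
        frontIndex_ = prev & kIndexMask;
        return true;
    }

    // Reader: the most recently received data.
    const T &front() const { return buffers_[frontIndex_]; }

private:
    static constexpr std::uint8_t kIndexMask = 0x3;
    static constexpr std::uint8_t kFreshBit = 0x4;
    T buffers_[3]{};
    std::uint8_t backIndex_ = 0;          // owned by the writer
    std::uint8_t frontIndex_ = 1;         // owned by the reader
    std::atomic<std::uint8_t> middle_{2}; // shared slot index + fresh flag
};
```

If the writer publishes twice before the reader calls `update()`, the reader simply skips the older state and sees the newest one, which suits interpolated rendering, where only the latest snapshot matters.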