Optimizing CPU cache use in render loop

0 comments, last by dblack 12 years, 10 months ago
I've been working on optimizing my render loop to make the best use of the CPU cache.

Right now I'm simply keeping a vector of pointers to each object to be rendered. I then iterate through the vector and call each object's Render() method. Inside the Render() method, each object updates the world matrix that is passed to the shader, then performs the draw call on its geometry.

I'm thinking this is very cache inefficient, since each rendered object could be located anywhere in memory.

I want to change up the way I do rendering by storing all needed data in a contiguous array.

For example:

struct Model {
    XMFLOAT4X4 world;        // per-object world matrix
    ID3D11Buffer* index;     // index buffer
    ID3D11Buffer* vertices;  // vertex buffer
    short iIndexCount;       // number of indices to draw
};
Model data[ModelCount];

Then I would loop through data[] and perform the draw call using the data stored at each contiguous location in memory. I would expect this to improve cache usage a lot. The only thing I wonder about is what cache cost the actual draw call would incur. The structure stores pointers to the index and vertex buffers. Would issuing the draw call cause problems? Would it be better to record a command list while looping through the data[] array, and then execute the command list after the loop?
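To make the proposed layout concrete, here is a minimal sketch of the contiguous-array loop. It is not D3D11 code: Matrix4x4, GpuBuffer, and DrawRecorder are hypothetical stand-ins for XMFLOAT4X4, ID3D11Buffer, and the device context, chosen so the example compiles anywhere. The loop copies the buffer pointers but never dereferences them; whatever the runtime and driver touch inside a real draw call is outside the application's control and would have to be measured.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for XMFLOAT4X4 and ID3D11Buffer.
struct Matrix4x4 { float m[16]; };
struct GpuBuffer;  // opaque; the loop below never dereferences it

// Everything one draw needs, stored by value in one contiguous array.
struct DrawItem {
    Matrix4x4 world;
    GpuBuffer* indexBuffer;
    GpuBuffer* vertexBuffer;
    std::uint32_t indexCount;
};

// Hypothetical recorder standing in for the device context.
// It only counts submissions so the sketch can be checked without a GPU.
struct DrawRecorder {
    std::size_t drawCalls = 0;
    std::uint64_t indicesSubmitted = 0;

    void drawIndexed(const DrawItem& item) {
        ++drawCalls;
        indicesSubmitted += item.indexCount;
    }
};

// Walks the array front to back, so consecutive items sit next to each
// other in memory instead of being reached through scattered pointers.
void renderAll(const std::vector<DrawItem>& items, DrawRecorder& out) {
    for (const DrawItem& item : items) {
        out.drawIndexed(item);
    }
}
```

In a real renderer, drawIndexed would update the world matrix constant, bind the two buffers, and issue the indexed draw; the iteration pattern stays the same.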

Or is there an even better way to set up a render loop? I'm open to suggestions.


In general it's a balancing act. Cache usage is often very important, but other issues can have an equal or larger impact: for example, organizing draws by state, or organizing graphics objects so that game logic can access them efficiently. Also remember to profile with a realistic data set; small test samples will usually give very different results.
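As one illustration of organizing by state, the sketch below sorts draw items by a packed key so that draws sharing a shader and texture end up adjacent. The KeyedDraw struct, the 16/16-bit key layout, and the helper names are assumptions made for this example, not something taken from the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record: stateKey packs the shader id into the high
// 16 bits and the texture id into the low 16 bits.
struct KeyedDraw {
    std::uint32_t stateKey;
    std::uint32_t indexCount;
};

std::uint32_t makeStateKey(std::uint16_t shaderId, std::uint16_t textureId) {
    return (static_cast<std::uint32_t>(shaderId) << 16) | textureId;
}

// Stable sort keeps the submission order of draws that share a key.
void sortByState(std::vector<KeyedDraw>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const KeyedDraw& a, const KeyedDraw& b) {
                         return a.stateKey < b.stateKey;
                     });
}

// Counts how often the key differs from the previous draw, i.e. how many
// times state would have to be rebound when walking the list in order.
std::size_t countStateChanges(const std::vector<KeyedDraw>& draws) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0 || draws[i].stateKey != draws[i - 1].stateKey) {
            ++changes;
        }
    }
    return changes;
}
```

Comparing countStateChanges before and after sortByState gives a rough idea of how much rebinding the sort saves; whether that outweighs the cost of sorting is something to measure on a realistic scene.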


The only way to find out which is best is through educated guesses, experience, and measurement, using profiling code or a standalone profiler (always profile, and perhaps try a few things; assumptions are often your enemy when it comes to performance).
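For the "profiling code" option, a minimal timing helper might look like the sketch below. The function name averageMilliseconds is made up for this example. Note that it measures only CPU time spent in the callable; wrapped around draw submission it would capture the cost of issuing the calls, not the GPU time they take.

```cpp
#include <chrono>
#include <cstddef>

// Runs fn() the requested number of times and returns the average
// wall-clock time per call in milliseconds (0.0 for zero iterations).
template <typename Fn>
double averageMilliseconds(Fn&& fn, std::size_t iterations) {
    using Clock = std::chrono::steady_clock;  // monotonic, safe for intervals

    const auto start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        Clock::now() - start;

    return iterations ? elapsed.count() / static_cast<double>(iterations)
                      : 0.0;
}
```

Timing the pointer-vector loop and the contiguous-array loop with the same helper, on the same realistic scene, gives a like-for-like comparison; a standalone profiler is still the better tool for finding out where the time actually goes.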

However, in this case I wonder how you are going to know the correct order in advance if you are culling objects (assuming this is possible)? Maybe some periodic sorting helps with the cache efficiency of the objects that survive the culling tests, but it isn't the first thing I would look into.
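One way to sidestep the ordering question is to rebuild a compact list of survivors every frame, so the draw loop always walks a contiguous array regardless of which objects were culled. The sketch below assumes a simplified visibility test (bounding sphere against a single plane); SceneObject, Plane, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct SceneObject {
    float x, y, z;         // bounding-sphere centre
    float radius;          // bounding-sphere radius
    std::uint32_t drawId;  // hypothetical handle to the object's draw data
};

// Plane in the form n.p + d = 0, with the normal pointing at the visible side.
struct Plane { float nx, ny, nz, d; };

// Simplified test: the sphere is visible unless it lies entirely behind
// the plane. A real frustum test would check several planes.
bool isVisible(const SceneObject& o, const Plane& p) {
    const float dist = p.nx * o.x + p.ny * o.y + p.nz * o.z + p.d;
    return dist >= -o.radius;
}

// Refills `visible` with the ids of the surviving objects, in scene order.
// clear() keeps the vector's capacity, so once it has grown to its
// high-water mark no further allocation happens per frame.
void buildVisibleList(const std::vector<SceneObject>& scene,
                      const Plane& plane,
                      std::vector<std::uint32_t>& visible) {
    visible.clear();
    for (const SceneObject& o : scene) {
        if (isVisible(o, plane)) {
            visible.push_back(o.drawId);
        }
    }
}
```

The survivors come out in scene order, so any periodic sort applied to the scene array carries over to the visible list.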

David

This topic is closed to new replies.
