How to avoid cache misses when data required is far away

aganm · 2019-08-30T00:18:28

I have a list of entities and components which I iterate over in the game loop. Those entities are more often than not NPCs which can fire at each other from a distance. First, NPCs are assigned a target by the overseeing AI, which decides which target is the most appropriate for each NPC. Then I iterate over each NPC to compute the firing logic at its target. Can you see the cache misses right here? In pseudo code:

```
foreach (entity, all_entities) { // accessed linearly
    // target is another entity in the linear structure, but it could be anywhere:
    // from the next entity in the list, to one a couple of megabytes away in RAM
    entity.fire_at(entity.target)
}
```

How do I rid my code of such cache misses? Most of my logic relies on accessing random memory addresses like this, and I've been struggling to refactor.
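For concreteness, here is a minimal C++ sketch of that loop. It assumes targets are stored as indices into the same array and reduces fire_at() to spawning a bullet; `Entity`, `Bullet`, `Vec2` and `fire_all` are hypothetical stand-ins, not the poster's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real types: a "fat" array-of-structures entity
// whose target is stored as an index into the same array.
struct Vec2 { float x = 0, y = 0; };

struct Entity {
    Vec2 position;
    Vec2 velocity;
    float health = 100.0f;
    int ammo = 10;
    char other_state[96] = {};  // stand-in for everything else an NPC carries
    std::size_t target = 0;     // index of the entity this NPC shoots at
};

struct Bullet { Vec2 from, to; };

// The loop from the question, with fire_at() reduced to spawning a bullet.
// `entities` is walked in order (prefetcher-friendly), but `entities[e.target]`
// can be anywhere in the array, so each of those reads may miss the cache.
std::vector<Bullet> fire_all(const std::vector<Entity>& entities) {
    std::vector<Bullet> bullets;
    bullets.reserve(entities.size());
    for (const Entity& e : entities) {
        assert(e.target < entities.size());
        const Entity& target = entities[e.target];  // the potentially distant read
        bullets.push_back({e.position, target.position});
    }
    return bullets;
}
```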

Started by aganm August 26, 2019 01:50 AM

11 comments, last by dmatter 4 years, 7 months ago

Wyrframe

August 28, 2019 02:37 AM

38 minutes ago, aganm said:

Isn't 1.1 out of 4 possible clocks per cycle bad?

I'm afraid I don't know off the top of my head whether perf counts thread cycles separately or cumulatively. You haven't said whether your program is single- or multi-threaded, nor how many cores you have (including hyper-threaded logical cores). If you are single-threaded, why are you assuming there are four "clocks" per clock cycle?

Again, profile your actual code. Stop postulating that cache misses are the cause of the problem; instead, identify one piece of code that is slow in terms of real (wall-clock) time, and study why. Accessing memory in a cache-friendly manner is an ideal; it's not something you can solve unless your problem only exists at batch scale and with unbounded pre-planning time.
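As an illustration of measuring instead of guessing, the sketch below times the same set of target reads in sequential and in shuffled order. It is a hypothetical micro-benchmark (`FatEntity`, `time_target_reads` and `make_entities` are made-up names), not a substitute for profiling the real game loop, and any gap it shows will vary with hardware, array size and compiler flags.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical entity of 128 bytes on typical 64-bit platforms, so the `x`
// fields of different entities never share a 64-byte cache line.
struct FatEntity {
    float x = 0, y = 0;
    float padding[28] = {};  // stand-in for the rest of the entity's data
    std::size_t target = 0;
};

struct Timing {
    double sum;           // checksum, returned so the loop cannot be optimized away
    double milliseconds;  // wall-clock time of the pass
};

// Walks the array in order and reads each entity's target, timing the pass.
Timing time_target_reads(const std::vector<FatEntity>& es) {
    const auto start = std::chrono::steady_clock::now();
    double sum = 0;
    for (const FatEntity& e : es) sum += es[e.target].x;
    const auto stop = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration<double, std::milli>(stop - start).count()};
}

// Builds n entities whose targets form a permutation: either "next entity"
// (sequential reads) or a random shuffle (scattered reads). Every entity is read
// exactly once in both cases, so only the access order differs.
std::vector<FatEntity> make_entities(std::size_t n, bool shuffled, unsigned seed) {
    assert(n > 0);
    std::vector<std::size_t> perm(n);
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    if (shuffled) std::shuffle(perm.begin(), perm.end(), std::mt19937(seed));

    std::vector<FatEntity> es(n);
    for (std::size_t i = 0; i < n; ++i) {
        es[i].x = static_cast<float>(i);
        es[i].target = shuffled ? perm[i] : (i + 1) % n;
    }
    return es;
}
```

To use it, compare `time_target_reads(make_entities(1 << 20, false, 42)).milliseconds` against the shuffled variant. If the real entity array fits comfortably in cache, the difference may be negligible.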

dmatter

August 30, 2019 12:18 AM

I can offer a few thoughts on this:

a) Hold an array of 'hot' data. Since you only care about the position of the target entity and not the full entity data, you could hold an array of entity positions and look up the target position from that array. Positions are smaller than full-blown entities, so more of them fit into a cache line, which increases the chances of a cache hit.
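A minimal sketch of (a), assuming positions are 2D floats and targets are stored as indices; `World`, `Vec2`, `Bullet` and `fire_all` are hypothetical names. The target lookup is still random, but it now lands in an 8-byte-per-entity array rather than in full entity records.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x = 0, y = 0; };

// Hypothetical split: the frequently read ("hot") positions live in their own
// tightly packed array, separate from the rest of the entity data.
struct World {
    std::vector<Vec2> positions;       // hot: 8 bytes per entity
    std::vector<std::size_t> targets;  // target index per entity
    // cold per-entity data (inventory, AI state, ...) would live elsewhere
};

struct Bullet { Vec2 from, to; };

std::vector<Bullet> fire_all(const World& w) {
    assert(w.positions.size() == w.targets.size());
    std::vector<Bullet> bullets;
    bullets.reserve(w.targets.size());
    for (std::size_t i = 0; i < w.targets.size(); ++i) {
        assert(w.targets[i] < w.positions.size());
        // Still a random lookup, but 8 positions share a 64-byte cache line,
        // so a hit is far more likely than when each entity is a large record.
        const Vec2& target_pos = w.positions[w.targets[i]];
        bullets.push_back({w.positions[i], target_pos});
    }
    return bullets;
}
```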

b) If your loop does other stuff in addition to calling fire_at(), then delay resolving your target entity until after you finish iterating. That way there's a chance the entity (or even the entire array) is still in the cache after your loop finishes. Iteration is predictable and allows the pre-fetcher to efficiently pre-fetch cache lines as you iterate, whereas at the moment you are jumping around sporadically, potentially 'ahead' of the pre-fetcher (or even confusing it altogether?), and causing cache misses.
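A sketch of (b), assuming the main loop also updates a per-entity cooldown (a hypothetical stand-in for the "other stuff"). The first pass only queues (shooter, target) index pairs; the second pass does the target lookups after the sequential walk is done. `Npc`, `FireRequest` and `update_and_fire` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x = 0, y = 0; };
struct Bullet { Vec2 from, to; };

// Hypothetical per-entity data for this sketch.
struct Npc {
    Vec2 position;
    float cooldown = 0;       // stand-in for the other per-entity work
    std::size_t target = 0;
    bool wants_to_fire = false;
};

// Pass 1 walks the entities in order and only records who fires at whom;
// pass 2 resolves the target lookups after the main loop has finished.
std::vector<Bullet> update_and_fire(std::vector<Npc>& npcs, float dt) {
    struct FireRequest { std::size_t shooter, target; };
    std::vector<FireRequest> requests;

    for (std::size_t i = 0; i < npcs.size(); ++i) {  // sequential walk
        Npc& n = npcs[i];
        n.cooldown -= dt;                            // other per-entity work
        if (n.wants_to_fire && n.cooldown <= 0) {
            requests.push_back({i, n.target});       // defer the random access
            n.cooldown = 1.0f;
        }
    }

    std::vector<Bullet> bullets;
    bullets.reserve(requests.size());
    for (const FireRequest& r : requests) {          // random lookups happen here
        assert(r.target < npcs.size());
        bullets.push_back({npcs[r.shooter].position, npcs[r.target].position});
    }
    return bullets;
}
```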

c) Did the high-level AI already access both entities (attacker and target)? If so, could it have recorded the position of the target entity in the attacking entity? Or could it have created the bullet directly? That way the fire_at() function need not look up the target entity at all.
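A sketch of (c), assuming the AI pass can fill in a `target_position` field while it already has both entities at hand. The field, `assign_targets`, and the `chosen` vector standing in for the AI's decision are all hypothetical. The cached position is only as fresh as the last AI pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x = 0, y = 0; };
struct Bullet { Vec2 from, to; };

struct Npc {
    Vec2 position;
    std::size_t target = 0;
    Vec2 target_position;  // hypothetical field filled in by the AI pass
};

// Stand-in for the AI pass: it already touches the target while choosing it,
// so it copies what the firing logic will need into the attacker.
void assign_targets(std::vector<Npc>& npcs, const std::vector<std::size_t>& chosen) {
    assert(chosen.size() == npcs.size());
    for (std::size_t i = 0; i < npcs.size(); ++i) {
        assert(chosen[i] < npcs.size());
        npcs[i].target = chosen[i];
        npcs[i].target_position = npcs[chosen[i]].position;
    }
}

// The firing pass now reads only the attacker: no lookup of the target entity.
std::vector<Bullet> fire_all(const std::vector<Npc>& npcs) {
    std::vector<Bullet> bullets;
    bullets.reserve(npcs.size());
    for (const Npc& n : npcs) bullets.push_back({n.position, n.target_position});
    return bullets;
}
```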

d) Since it appears that every entity shoots one bullet at a target, you know how many bullets to expect (equal to the number of entities), so preallocate an array of bullets that big and split your loop into 2 passes. The first pass creates a bullet object (per attacker entity), but instead of just appending it to the array it places it at the index that corresponds to the index of the *target* entity. Once the first pass is done you will have a bullet array that is aligned with the ordering of the target entities, so the second pass can iterate both arrays in lockstep and copy the position of each entity into each bullet (that is, assign the target entity's position to the bullet). Since you are now creating bullets 'out of order', you will experience write-misses when you create the bullets (where you previously experienced read-misses), but as far as I know a write-miss is cheaper than a read-miss, and if the cache's write-buffer has space available then a write-miss is practically as fast as a cache-hit.
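A sketch of the two-pass idea in (d). Placing each bullet at exactly its target's index only works if no two attackers share a target, so this version swaps in a counting sort: bullets are grouped by target index, and the second pass still reads target positions in ascending memory order. `Bullet`, `Vec2` and `fire_all` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x = 0, y = 0; };
struct Bullet { std::size_t target = 0; Vec2 from, to; };

// Pass 1 scatters each attacker's bullet into a slot grouped by target index
// (scattered writes); pass 2 walks the bullets in order, so target positions
// are read in ascending index order.
std::vector<Bullet> fire_all(const std::vector<Vec2>& positions,
                             const std::vector<std::size_t>& targets) {
    const std::size_t n = positions.size();
    assert(targets.size() == n);

    // Count bullets per target, then prefix-sum into starting offsets.
    std::vector<std::size_t> offset(n + 1, 0);
    for (std::size_t t : targets) { assert(t < n); ++offset[t + 1]; }
    for (std::size_t t = 0; t < n; ++t) offset[t + 1] += offset[t];

    // Pass 1: sequential over attackers, scattered writes into the bullet array.
    std::vector<Bullet> bullets(n);
    for (std::size_t i = 0; i < n; ++i) {
        Bullet& b = bullets[offset[targets[i]]++];
        b.target = targets[i];
        b.from = positions[i];
    }

    // Pass 2: bullets are now ordered by target, so positions[] is read in order.
    for (Bullet& b : bullets) b.to = positions[b.target];
    return bullets;
}
```

Within one target's group, bullets keep the attackers' original order, since the counting sort is stable.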

e) I doubt every entity shoots a bullet every frame, so you could split this work across a few frames or only run it once every N frames.
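A sketch of splitting the work across frames as suggested in (e): each frame handles one contiguous chunk of the entity range, so every entity is processed once every `slices` frames and each frame's accesses stay sequential. `run_time_sliced` and its parameters are hypothetical, and whether firing at most once every few frames is acceptable is a gameplay decision.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Runs `fire_logic(i)` for this frame's chunk of entity indices and returns how
// many entities were processed. Over `slices` consecutive frames the chunks
// cover [0, entity_count) exactly once.
template <typename Fn>
std::size_t run_time_sliced(std::size_t entity_count, std::uint64_t frame,
                            std::size_t slices, Fn&& fire_logic) {
    assert(slices > 0);
    const std::size_t k = static_cast<std::size_t>(frame % slices);
    // Contiguous chunk [begin, end) keeps this frame's accesses sequential.
    const std::size_t begin = entity_count * k / slices;
    const std::size_t end = entity_count * (k + 1) / slices;
    for (std::size_t i = begin; i < end; ++i) fire_logic(i);
    return end - begin;
}
```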

f) I imagine that the target entity on frame F is usually the same as on frame F+1, so there is some temporal coherence that could be exploited here. For example, you could keep the entity array sorted by who's targeting whom in order to keep targets close to attackers, using a sort algorithm that is fast for nearly-sorted arrays. A related idea: instead of looking up the target entity from random locations within the main entity array, build a separate array of target entities in the order you require (access order); there might be duplicates in this array if multiple entities can target the same entity. Then keep this array around for multiple frames, only rebuilding it occasionally (or even incrementally).
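A sketch of the second idea in (f): an access-order array holding copies of each attacker's target position, fully rebuilt every `rebuild_interval` frames and incrementally patched in between for attackers whose target changed. `TargetCache` and its members are hypothetical, and the sketch assumes slightly stale copies are acceptable between rebuilds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec2 { float x = 0, y = 0; };

// Slot i holds a copy of the data attacker i needs from its target, so the
// firing pass can read it sequentially. Duplicates are fine (several attackers
// may share a target).
class TargetCache {
public:
    explicit TargetCache(std::uint64_t rebuild_interval) : interval_(rebuild_interval) {
        assert(interval_ > 0);
    }

    // Refreshes the cache for this frame; returns the number of slots re-gathered.
    std::size_t update(std::uint64_t frame, const std::vector<Vec2>& positions,
                       const std::vector<std::size_t>& targets) {
        const bool full = (frame % interval_ == 0) || built_for_.size() != targets.size();
        if (full) {
            built_for_.assign(targets.size(), 0);
            target_pos_.assign(targets.size(), Vec2{});
        }
        std::size_t refreshed = 0;
        for (std::size_t i = 0; i < targets.size(); ++i) {
            // Incremental path: between full rebuilds, only slots whose target
            // changed are re-gathered.
            if (full || built_for_[i] != targets[i]) {
                assert(targets[i] < positions.size());
                built_for_[i] = targets[i];
                target_pos_[i] = positions[targets[i]];
                ++refreshed;
            }
        }
        return refreshed;
    }

    const Vec2& target_position(std::size_t attacker) const { return target_pos_[attacker]; }

private:
    std::uint64_t interval_;
    std::vector<std::size_t> built_for_;  // target index each slot was copied from
    std::vector<Vec2> target_pos_;        // copies in attacker (access) order
};
```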

g) Does the high-level AI decide how to pair an entity with a target entity based on some exploitable quality? For example, if entities have to be spatially close to each other in order to attack, then you could sort or partition your entity array by a spatial key so that entities that are close together spatially are also close together in memory.
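A sketch of (g) using a Z-order (Morton) key computed from a grid cell, which is one common choice of spatial key; the post doesn't prescribe one. It assumes 2D, non-negative coordinates, a hypothetical `cell_size`, and targets stored as indices, which must be remapped after the entities move. `Npc`, `spread_bits`, `morton_key` and `sort_spatially` are made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

struct Npc {
    float x = 0, y = 0;
    std::size_t target = 0;
};

// Spreads the low 16 bits of v so there is a zero bit between each bit.
std::uint32_t spread_bits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Morton key of the grid cell containing (x, y); assumes non-negative
// coordinates and fewer than 65536 cells per axis.
std::uint32_t morton_key(float x, float y, float cell_size) {
    assert(x >= 0 && y >= 0 && cell_size > 0);
    const auto cx = static_cast<std::uint32_t>(x / cell_size);
    const auto cy = static_cast<std::uint32_t>(y / cell_size);
    assert(cx <= 0xFFFFu && cy <= 0xFFFFu);
    return spread_bits(cx) | (spread_bits(cy) << 1);
}

// Reorders entities so spatial neighbours become memory neighbours, then
// rewrites every `target` index to point at the target's new slot.
void sort_spatially(std::vector<Npc>& npcs, float cell_size) {
    const std::size_t n = npcs.size();
    std::vector<std::uint32_t> keys(n);
    for (std::size_t i = 0; i < n; ++i) keys[i] = morton_key(npcs[i].x, npcs[i].y, cell_size);

    std::vector<std::size_t> order(n);  // order[new_slot] = old_slot
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    std::vector<std::size_t> new_index(n);  // new_index[old_slot] = new_slot
    for (std::size_t i = 0; i < n; ++i) new_index[order[i]] = i;

    std::vector<Npc> sorted(n);
    for (std::size_t i = 0; i < n; ++i) {
        sorted[i] = npcs[order[i]];
        assert(sorted[i].target < n);
        sorted[i].target = new_index[sorted[i].target];
    }
    npcs.swap(sorted);
}
```

Because entities move, this would be re-run periodically rather than every frame; how often is a tuning choice the post leaves open.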

David Gill

This topic is closed to new replies.