How to avoid cache misses when data required is far away

Started by
11 comments, last by dmatter 4 years, 7 months ago
38 minutes ago, aganm said:

Isn't 1.1 out of 4 possible clocks per cycle bad?

I'm afraid I don't know off the top of my head whether perf counts thread cycles separately and cumulatively. You haven't said whether your program is single- or multi-threaded, nor how many cores you have (including multithreading cores). If you are single-threaded, why are you assuming there are four "clocks" per clock cycle?

Again, profile your actual code. Stop postulating that cache misses are the cause of the problem; instead, identify one piece of code which is slow in terms of real time, and study it for why. Accessing memory in a cache-friendly manner is an ideal, it's not something you can solve unless your problem only exists in batch scale and with unbounded pre-planning time.

RIP GameDev.net: launched 2 unusably-broken forum engines in as many years, and now has ceased operating as a forum at all, happy to remain naught but an advertising platform with an attached social media presense, headed by a staff who by their own admission have no idea what their userbase wants or expects.Here's to the good times; shame they exist in the past.
Advertisement

I can offer a few thoughts on this:

a) Hold an array of 'hot' data. Since you only care about the position of the target entity and not the full entity data, you could hold an array of entity positions and lookup the target position from that array. Positions are smaller than full-blown entities so more of them will fit into a cache line and thereby increase the chances of a cache hit.

b) If your loop does other stuff in addition to calling fire_at() then delay resolving your target entity until after you finish iterating. That way there's a chance the entity (or even the entire array) is still in the cache after your loop finishes. Iteration is predictable and will allow the pre-fetcher to efficiently pre-fetch cache lines as you iterate. Whereas at the moment you are jumping around sporadically and potentially 'ahead' of the pre-fetcher (or even confusing the pre-fetcher altogether?) and causing cache misses.

c) Did the high-level AI already access both entities (attacker and target) and if so could it have recorded the position of the target entity in the attacking entity? Or could it have created the bullet directly? That way the fire_at() function need not lookup the target entity at all.

d) As it appears that every entity shoots one bullet at a target then you know how many bullets to expect (equal to the number of entities) so preallocate an array of bullets that big and split your loop into 2 passes: First pass creates a bullet object (per attacker entity) but instead of just appending it to the array it will actually place it at the index that corresponds with the index of the *target* entity. Once the first pass is done you will have a bullet array that is aligned with the ordering of the target entities, so the second pass will iterate both arrays in lockstep and copy the position of each entity into each bullet (which means assigning the target entity position into the bullet). Since you are now creating bullets 'out of order' you will experience write-misses when you create the bullets (where you previously experienced read-misses) but as far as I know: A write-miss is cheaper than a read-miss and if the cache's write-buffer has space available then a write-miss is practically as fast as a cache-hit.

e) I doubt every entity shoots a bullet every frame, you could split this work across a few frames or only run it once every N frames.

f) I imagine that the target entity on frame F is the same as on frame F+1. So there is some temporal coherence that could be exploited here. For example you could keep the entity array sorted by who's targeting whom in order to keep targets close to attackers, using a sort algorithm that is fast for nearly-sorted arrays. A related idea would be that instead of looking up the target entity from random locations within the main entity array you build a separate array of target entities that is in the order you require (access order), there might be duplicates in this array if multiple entities can target the same entity. Then you keep this array around for multiple frames, only rebuilding it occasionally (or even incrementally).

g) Does the high-level AI decide how to pair an entity with a target entity based on some exploitable quality? For example if entities have to be spatially close to each other in order to attack then you could sort or partition your entity array by a spatial key to put entities close together in memory if they are close together spatially.

This topic is closed to new replies.

Advertisement