Try to find a way to keep each memory access as close as possible to the previous one for best performance.
It seems very interesting. Could you elaborate a bit on that?
Thanks
Jack
It's a general statement.
1) Storing bits instead of bytes (if all you need is 1s and 0s) helps because it packs eight values into every byte, so all your data is much, much closer together.
2) If you calculate only 1 step and reuse the other 254 (or could restructure things to do that, if you aren't already), you can, again, touch only the memory for that one step. Otherwise, do you calculate all steps for one agent, then move to the next agent? If so, store all of an agent's steps together in memory (agent1_step1, agent1_step2, ..., agent1_step255, agent2_step1, agent2_step2, ...). Or do you loop through all agents on step 1, then all agents on step 2, and so on? In that case, your memory should be structured the same way (agent1_step1, agent2_step1, ..., agent20_step1, agent1_step2, agent2_step2, ...).
3) And so on, and so on... The point is that even if you access memory randomly, there is usually some overall pattern you can take advantage of so that successive accesses still land close together. Even better is if you can process the pixels in your agents sequentially... but it's still not totally clear what you're doing and how you're doing it, so that's about all I can tell you. You may be able to get trickier still (working on multiple bits at a time, multithreading, etc.), but who knows. Start with the access patterns mentioned above...
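To make point 1 concrete, here is a minimal Python sketch, assuming the values really are just 0/1 flags. `BitArray` and its methods are hypothetical names for illustration, not from any library. It packs eight flags per byte into one contiguous `bytearray`, so 1000 flags take 125 bytes instead of 1000.

```python
class BitArray:
    """Hypothetical helper: stores 0/1 flags eight per byte in one contiguous buffer."""

    def __init__(self, n_bits: int) -> None:
        self.n_bits = n_bits
        # Round up so a partial last byte is still allocated.
        self.data = bytearray((n_bits + 7) // 8)

    def get(self, i: int) -> int:
        # Byte i // 8 holds flag i; bit i % 8 selects it within that byte.
        return (self.data[i >> 3] >> (i & 7)) & 1

    def set(self, i: int, value: int) -> None:
        mask = 1 << (i & 7)
        if value:
            self.data[i >> 3] |= mask
        else:
            # Clear only this bit; keep the result within 0..255 for the bytearray.
            self.data[i >> 3] &= ~mask & 0xFF
```

For example, `flags = BitArray(1000)` allocates 125 bytes; after `flags.set(3, 1)`, `flags.get(3)` is 1. Neighbouring flags now share bytes (and cache lines), which is where the locality win comes from.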
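The two layouts in point 2 come down to index arithmetic on one flat buffer. Below is a sketch in Python, assuming 0-based indices and the 20 agents and 255 steps from the example; the function names are hypothetical. It records which slots a nested loop would touch and checks whether consecutive accesses are adjacent (stride 1) or jump around.

```python
from typing import Callable, List

# Sizes taken from the example in point 2 (agent1..agent20, step1..step255).
N_AGENTS = 20
N_STEPS = 255


def agent_major(agent: int, step: int) -> int:
    """Layout agent1_step1, agent1_step2, ..., agent2_step1, ... (0-based)."""
    return agent * N_STEPS + step


def step_major(agent: int, step: int) -> int:
    """Layout agent1_step1, agent2_step1, ..., agent1_step2, ... (0-based)."""
    return step * N_AGENTS + agent


def access_order(index_of: Callable[[int, int], int], agents_outer: bool) -> List[int]:
    """Flat buffer indices visited by a nested agent/step loop, in visit order."""
    order = []
    if agents_outer:
        # All steps for one agent, then the next agent.
        for agent in range(N_AGENTS):
            for step in range(N_STEPS):
                order.append(index_of(agent, step))
    else:
        # All agents for one step, then the next step.
        for step in range(N_STEPS):
            for agent in range(N_AGENTS):
                order.append(index_of(agent, step))
    return order


def is_sequential(order: List[int]) -> bool:
    """True if every access is exactly one slot after the previous one."""
    return all(b - a == 1 for a, b in zip(order, order[1:]))
```

`is_sequential(access_order(agent_major, agents_outer=True))` is True, while `is_sequential(access_order(agent_major, agents_outer=False))` is False because consecutive accesses are 255 slots apart. The rule of thumb: pick the layout whose innermost index matches your innermost loop. The same arithmetic applies unchanged to a flat array in C.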
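Point 3 mentions working on multiple bits at a time. One common form: once flags are packed eight per byte, combine and count them a whole word at a time instead of bit by bit. `and_count` below is a hypothetical helper sketched in Python, assuming two equal-length packed arrays. Python integers aren't machine words, so this only shows the shape of the loop; in C you would do the same with `uint64_t` words and a popcount instruction.

```python
def and_count(a: bytes, b: bytes) -> int:
    """Count bit positions set in both packed arrays, 64 bits per iteration."""
    if len(a) != len(b):
        raise ValueError("packed arrays must have the same length")
    total = 0
    for off in range(0, len(a), 8):
        # Treat each 8-byte chunk as one 64-bit word (the last chunk may be shorter).
        wa = int.from_bytes(a[off:off + 8], "little")
        wb = int.from_bytes(b[off:off + 8], "little")
        # One AND plus one popcount handles up to 64 flags at once.
        total += bin(wa & wb).count("1")
    return total
```

The loop still walks both buffers front to back, so it keeps the sequential access pattern from points 1 and 2 while doing up to 64 flags of work per step.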