I need to fill about 6MB of memory per frame, fast algor needed

Started by
11 comments, last by TrianglesPCT 9 years, 6 months ago

I need to fill a vector of 6MB every frame, is there a fast algorithm to do so?

You may assume this is a flat type configuration, with a boolean in a std::vector,

one followed by another. I find that if I use a bitset<8> instead I can reduce the size of memory

to about 400KB, so what is the best way in your opinion?

Thanks

Jack

Advertisement

std::vector of what?

And where does the data come from which you are filling it with? Is it calculated? A constant value? Copying it? Is it some transformation of existing data?

Without knowing more details, I'd suggest pre-allocating your vectors, and using unsigned chars and bitmasks (8 'bools' per char), and copying using std::memcpy() or std::memset() (which will likely copy 4 or 8 chars at a time, depending on whether you are compiling a 32bit or 64bit executable).

But really, there's a lack of information. The best optimization is to not fill any data at all. Doing nothing is faster than doing something.

Why does it need to happen every frame? Why not every other frame? Or why not once every 100 millaseconds?

Why does the entire vector need to be updated? Why not only a subportion of it each frame?

Why is it happening every frame to begin with? Is the entire vector changing every frame, or only part of it?

Where are you getting the data you are filling it from?

What is your project, at a higher level, trying to accomplish?

std::vector of what?

And where does the data come from which you are filling it with? Is it calculated? A constant value? Copying it? Is it some transformation of existing data?

a vector of booleans

Actually, I am working on a cooperative pathfinding stuff, In every frame, because I am using a dual map approach, one map to astar and one for cooperative pathfinding, when astar finishes, it starts in fill in this pixelated cooperative pathfinding map (a vector of booleans), if the path is reserved this way, all the pixels that the bounding box of the object casting on, will be reserved.

So say an agent is 30x30 - 40x40 pixels wide, I need to preserve 255 steps in advance and I have about 20 agents, so that's about

30x30x255x20 = 5MB

THX

Okay...

I need to fill a vector of 6MB every frame, is there a fast algorithm to do so?

You may assume this is a flat type configuration, with a boolean in a std::vector,

one followed by another. I find that if I use a bitset<8> instead I can reduce the size of memory

to about 400KB, so what is the best way in your opinion?

Thanks

Jack

First of all, if you're storing bits, not bytes, the memory you need is 30x30x255x20 / 8. Which should give you ~570kb.

Second of all, you can fill that very fast, each frame. It's not really that much data. Using bitset works fine, though you may get more speed doing it by hand. Of course this depends on your frame-rate but it's still not that bad - though at 50FPS+ you are starting to take a hit on your available memory bandwidth (depending on what else you are doing in your software this could be fine or could be an issue)

Third - general memory rules apply. If you can write to memory in order rather than randomly, or if you can identify a pattern (which you can then re-organize your data), that will help. Try to find a way to keep each memory access as close to the previous memory access for best performance. If you can avoid some calculations (do you need to calculate all 255 steps each frame?) as Servant mentioned, that is even better.

Try to find a way to keep each memory access as close to the previous memory access for best performance.

I seems very interesting. Could you elaborate a bit on that?

Thanks

Jack

Try to find a way to keep each memory access as close to the previous memory access for best performance.

I seems very interesting. Could you elaborate a bit on that?

Thanks

Jack

It's a general statement.

1) Storing bits instead of bytes (if all you need is 1s and 0s) helps because all your data is much, much closer together.

2) If you calculate only 1 step and reuse the other 254 (or if you can do that even if you aren't already), you, again, can touch only the memory in that step. Otherwise, do you calculate all steps for an agent, then move to the next agent? If so, store all steps together in memory. (agent1_step1, agent1_step2, ..., agent1_step255, agent2_step1, agent2_step2, ...) Or... do you loop through all agents on step 1, then all agents on step 2, etc.? In that case, your memory should be structured the same (agent1_step1, agent2_step1, ..., agent20_step1, agent1_step2, agent2_step2, ...)

3) And so on, and so on.... the point is that even if you access memory randomly, there is some overall pattern you can take advantage of so that they are still locally similar. Even better is if you can process pixels in your agents sequentially... but it's still not totally clear what you're doing and how you're doing it so that's about all I can tell you. Maybe you can get even trickier and trickier (working on multiple bits at a time, multithreading, etc) but who knows. Start with access patterns as mentioned above...

2) If you calculate only 1 step and reuse the other 254 (or if you can do that even if you aren't already), you, again, can touch only the memory in that step. Otherwise, do you calculate all steps for an agent, then move to the next agent?

ahhh, That's a bit wrong in my thinking. What I can do, or must do is to reserve and maybe 32-255 time steps (depth) in the first frame, and don't need to calculate it during these 32-255 time periods unless the player changes strategy.

thanks

Jack

I seems very interesting. Could you elaborate a bit on that?


Standard cache and memory latency stuff. CPUs run much faster than RAM which runs much faster than the network which runs much faster than disk. Every time you access memory you haven't used recently, the CPU has to sit around doing nothing and wait for the memory. Modern CPUs do as much as they can to minimize that impact, but ultimately your code is responsible for being smart about memory usage.

Memory is accessed in chunks called "cache lines" and the CPU typically implements preloading of some kind. Ultimately what this means is that accessing memory addresses very close together is typically faster than accessing memory addresses far apart (since if they're close, you might only have to wait once instead of twice) and that if you access memory in strictly increasing and mostly contiguous blocks (e.g. iterating over an array) the CPU can figure that out and will have the memory request filled before you even need it (avoiding most waits).

Typical novice or Java-esque code tends to just have collections of objects allocated willy nilly or algorithms that don't take these facts of hardware into account and end up running _much_ slower despite being equivalent on paper (your big-O notation analysis of algorithms completely ignores facts of hardware and implementation like this).

Simply put, a hardware-cognizant O(n) algorithm can be much faster than an O(logN) algorithm for some limited range of N. A* and path-finding are one such place where naive by-the-book A* can be significantly slower than a good one.

Sean Middleditch – Game Systems Engineer – Join my team!

Here comes some irrelevant questions.

If I reserve 32 time steps in advance, and the original astar (I use detour), is longer than 32 steps, do I let the agent walk to the 32nd step

and pathfind there onward to the goal?

Thanks

Jack

This topic is closed to new replies.

Advertisement