Jump to content
  • Advertisement
Sign in to follow this  
lucky6969b

I need to fill about 6MB of memory per frame, fast algor needed

This topic is 1326 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I need to fill a vector of 6MB every frame, is there a fast algorithm to do so?

You may assume this is a flat type configuration, with a boolean in a std::vector,

one followed by another. I find that if I use a bitset<8> instead I can reduce the size of memory

to about 400KB, so what is the best way in your opinion?

Thanks

Jack

Share this post


Link to post
Share on other sites
Advertisement

std::vector of what?

 

And where does the data come from which you are filling it with? Is it calculated? A constant value? Copying it? Is it some transformation of existing data?

Share this post


Link to post
Share on other sites

std::vector of what?

 

And where does the data come from which you are filling it with? Is it calculated? A constant value? Copying it? Is it some transformation of existing data?

 

a vector of booleans

Actually, I am working on a cooperative pathfinding stuff, In every frame, because I am using a dual map approach, one map to astar and one for cooperative pathfinding, when astar finishes, it starts in fill in this pixelated cooperative pathfinding map (a vector of booleans), if the path is reserved this way, all the pixels that the bounding box of the object casting on, will be reserved.

So say an agent is 30x30 - 40x40 pixels wide, I need to preserve 255 steps in advance and I have about 20 agents, so that's about

30x30x255x20 = 5MB

THX

Edited by lucky6969b

Share this post


Link to post
Share on other sites

Okay...

 

 

I need to fill a vector of 6MB every frame, is there a fast algorithm to do so?

You may assume this is a flat type configuration, with a boolean in a std::vector,

one followed by another. I find that if I use a bitset<8> instead I can reduce the size of memory

to about 400KB, so what is the best way in your opinion?

Thanks

Jack

 

First of all, if you're storing bits, not bytes, the memory you need is 30x30x255x20 / 8. Which should give you ~570kb.

 

Second of all, you can fill that very fast, each frame. It's not really that much data. Using bitset works fine, though you may get more speed doing it by hand. Of course this depends on your frame-rate but it's still not that bad - though at 50FPS+ you are starting to take a hit on your available memory bandwidth (depending on what else you are doing in your software this could be fine or could be an issue)

 

Third - general memory rules apply. If you can write to memory in order rather than randomly, or if you can identify a pattern (which you can then re-organize your data), that will help. Try to find a way to keep each memory access as close to the previous memory access for best performance. If you can avoid some calculations (do you need to calculate all 255 steps each frame?) as Servant mentioned, that is even better.

Edited by achild

Share this post


Link to post
Share on other sites

 

Try to find a way to keep each memory access as close to the previous memory access for best performance.

 

I seems very interesting. Could you elaborate a bit on that?

Thanks

Jack

Share this post


Link to post
Share on other sites

 

 

Try to find a way to keep each memory access as close to the previous memory access for best performance.

 

I seems very interesting. Could you elaborate a bit on that?

Thanks

Jack

 

 

It's a general statement.

 

1) Storing bits instead of bytes (if all you need is 1s and 0s) helps because all your data is much, much closer together.

 

2) If you calculate only 1 step and reuse the other 254 (or if you can do that even if you aren't already), you, again, can touch only the memory in that step. Otherwise, do you calculate all steps for an agent, then move to the next agent? If so, store all steps together in memory. (agent1_step1, agent1_step2, ..., agent1_step255, agent2_step1, agent2_step2, ...) Or... do you loop through all agents on step 1, then all agents on step 2, etc.? In that case, your memory should be structured the same (agent1_step1, agent2_step1, ..., agent20_step1, agent1_step2, agent2_step2, ...)

 

3) And so on, and so on.... the point is that even if you access memory randomly, there is some overall pattern you can take advantage of so that they are still locally similar. Even better is if you can process pixels in your agents sequentially... but it's still not totally clear what you're doing and how you're doing it so that's about all I can tell you. Maybe you can get even trickier and trickier (working on multiple bits at a time, multithreading, etc) but who knows. Start with access patterns as mentioned above...

Edited by achild

Share this post


Link to post
Share on other sites

2) If you calculate only 1 step and reuse the other 254 (or if you can do that even if you aren't already), you, again, can touch only the memory in that step. Otherwise, do you calculate all steps for an agent, then move to the next agent?

 

ahhh, That's a bit wrong in my thinking. What I can do, or must do is to reserve and maybe 32-255 time steps (depth) in the first frame, and don't need to calculate it during these 32-255 time periods unless the player changes strategy.

thanks

Jack

Share this post


Link to post
Share on other sites

I seems very interesting. Could you elaborate a bit on that?


Standard cache and memory latency stuff. CPUs run much faster than RAM which runs much faster than the network which runs much faster than disk. Every time you access memory you haven't used recently, the CPU has to sit around doing nothing and wait for the memory. Modern CPUs do as much as they can to minimize that impact, but ultimately your code is responsible for being smart about memory usage.

Memory is accessed in chunks called "cache lines" and the CPU typically implements preloading of some kind. Ultimately what this means is that accessing memory addresses very close together is typically faster than accessing memory addresses far apart (since if they're close, you might only have to wait once instead of twice) and that if you access memory in strictly increasing and mostly contiguous blocks (e.g. iterating over an array) the CPU can figure that out and will have the memory request filled before you even need it (avoiding most waits).

Typical novice or Java-esque code tends to just have collections of objects allocated willy nilly or algorithms that don't take these facts of hardware into account and end up running _much_ slower despite being equivalent on paper (your big-O notation analysis of algorithms completely ignores facts of hardware and implementation like this).

Simply put, a hardware-cognizant O(n) algorithm can be much faster than an O(logN) algorithm for some limited range of N. A* and path-finding are one such place where naive by-the-book A* can be significantly slower than a good one.

Share this post


Link to post
Share on other sites

Here comes some irrelevant questions.

If I reserve 32 time steps in advance, and the original astar (I use detour), is longer than 32 steps, do I let the agent walk to the 32nd step

and pathfind there onward to the goal?

Thanks

Jack

Edited by lucky6969b

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

Participate in the game development conversation and more when you create an account on GameDev.net!

Sign me up!