What algorithm would minimize repeat loading of every member of nxn grid+neighbors?
To prevent misunderstandings, what do you mean by the following?:
Cell
Tile
Object
Neighbor
When do you unload things? Are you scrolling the loaded NxN grid around in a larger grid or something? Are you loading objects only so that you can process them once, then immediately unloading them after that?
There is an nxn grid, say:
A1 B1 C1 D1 E1
A2 B2 C2 D2 E2
A3 B3 C3 D3 E3
A4 B4 C4 D4 E4
A5 B5 C5 D5 E5
The current brute force algorithm loads A1, B1, A2 and B2 then processes A1. Next, it loads C1 and C2, and processes B1. Next it loads D1,E1,D2, and E2 while unloading A1 and A2, then processes C1. And so fourth.
In short, iterate over every member of the grid - load any unloaded neighbors, unload any loaded non-neighbors.
My current algorithm takes about 27,000 ms to cover a 512x512 grid. I'd like to get this down. Several orders of magnitude if possible...
edit: WHY does this forum delete newlines?!?!?!?!?!?
- Load the first row.
- Load the second row.
- Process the first row.
- Load the third row.
- Process the second row.
- Unload the first row.
- Load the fourth row.
- Process the third row.
(repeat)
You only load each cell once, but your memory requirements are 3*n.
If those memory requirements are too high, then instead of a full row, load as much as you can in ways that minimize the amount of overlap when you move to the next loadable region.
If you load anything less than whole rows at once, I believe you will have to load at least one cell multiple times. The larger amount of cells you can keep loaded at once, the less redundant loads you'll have to make.
Another benefit of keeping lots of cells loaded at once is that you could parallel process them.
Depending on what a cell needs its neighbors for, you may be able to unload everything EXCEPT for that data after you've processed that cell.
So, like:
- Load row 1.
- Load row 2.
- Process row 1.
- Unload the no-longer-needed portions of row 1 (i.e. only keep what row 2 will need to access).
- Load row 3.
- Process row 2.
- Fully unload the remainder of row 1.
- Partially unload row 2.
etc.
If you can do this, then you actually only need to keep 1 row (the 'next' row) + 1 cell (the current cell being processed) FULLY loaded at a given time, and can leave the minimum necessary data loaded for the rest of the cells in those 3 rows:
- Load A1, B1, A2.
- Process A1, then partially unload it.
- Load C1, B2.
- Process B1, then partially unload it.
This will continue until you reach the end of row 1, at which point row1 will be partially loaded and row2 will be fully loaded.
From then on:
- Load A3.
- Process A2, then partially unload it.
- Fully unload A1.
- Load B3.
- Process B2, then partially unload it.
- Fully unload B1.
(repeat this process for all further rows)
Doing it this way might drastically fragment memory if your objects use irregular amounts of RAM. If you're using a language that can compact the heap, that could help. If your objects all use the same amount of data, no noticeable fragmentation should occur.
You should be able to do parallel processing in this case as well, as long as you never let the fully-unload action occur on a cell that needs to be used by another active task (i.e. keep the number of tasks limited to <= the number of cells per row), and ensure that the horizontal load step of each task occurs before the next queued task begins.
What exactly do you mean by "load" if you want to load 512x512, but you also want to assume they are so large you cannot have more than a few in memory at the same time?
If you're just processing them, you'll want to have the data you actually need to process for all 512x512 tiles in a contiguous block of your data file so you can take one disk hit instead of 250,000 disk hits. As an example, game programmers often do this with height information, pathing data, and tons of other things by just throwing the data in a texture. An added bonus is then we have that data easily accessed in shaders if we want it. You can do this with any data you want.
Beyond that, make sure you're doing block reads instead of byte-by-byte or variable by variable. Just read the whole file into a char* and sscanf off of the char*, or a std::stringstream if you don't like sscanf, or whatever your language uses if you're not using c/c++.
If that doesn't get the performance you want, you should really provide more details. Optimization is very application-specific, so it's kind of hard guessing what will get you orders of magnitude performance improvements or if that is even possible without knowing what you need to do.