PolyStream

Quasi-Random Access Dynamic Array


Hi, I'm Giovanni Santostefano. I've written a paper about an idea (hopefully mine :P ) for dynamic arrays that, when they need to grow, don't suffer the usual cost of reallocating a new, larger array and copying the old data into the new space. This is very useful when you need big dynamic arrays that grow by large amounts. Strictly speaking, "array" is the wrong word, because it's a hybrid list/array data structure (it loses memory contiguity and pure random access), but in some situations I think my data structure is more convenient than canonical dynamic arrays and lists.

http://santostefanogiovanni.blogspot.com/2008/11/quasi-random-access-dynamic-array.html

I'd be glad to hear what you think.

Best Regards
Giovanni

P.S. I'll make a C++ implementation ASAP
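Roughly, the structure can be sketched like this (a simplified C++ illustration of the idea, not the actual code from the paper): fixed-size segments linked like a list, so growing just appends a segment and never moves existing elements, while indexing has to walk the chain, hence the "quasi" in quasi-random access.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-size segments linked like a list. Growth never reallocates or
// copies existing elements; it only appends a new segment.
struct Segment {
    static const std::size_t kCap = 256;  // elements per segment
    int data[kCap];
    Segment* next = nullptr;
};

class QuasiArray {
    Segment* head_ = nullptr;
    Segment* tail_ = nullptr;
    std::size_t size_ = 0;
public:
    ~QuasiArray() {
        while (head_) { Segment* n = head_->next; delete head_; head_ = n; }
    }

    void push_back(int v) {
        if (size_ % Segment::kCap == 0) {       // current segment is full
            Segment* s = new Segment;
            if (tail_) tail_->next = s; else head_ = s;
            tail_ = s;                          // no reallocation, no copy
        }
        tail_->data[size_ % Segment::kCap] = v;
        ++size_;
    }

    // "Quasi" random access: walk i/kCap segments, then index into one.
    int& operator[](std::size_t i) {
        Segment* s = head_;
        for (std::size_t b = i / Segment::kCap; b > 0; --b) s = s->next;
        return s->data[i % Segment::kCap];
    }

    std::size_t size() const { return size_; }
};
```

With 256-element segments, growing past a segment boundary allocates one 1 KB segment instead of reallocating and copying the whole array.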

Quote:
Original post by PolyStream
an idea (hopefully mine :P )


No, sorry :(

Quote:
for dynamic arrays that, when they need to grow, don't suffer the usual cost of reallocating a new, larger array and copying the old data into the new space.

This is very useful when you need big dynamic arrays that grow by large amounts.


Similar reasoning is used for tree structures as well (B-trees and others). A file system is an example of this approach: data is stored contiguously within sectors, and the sectors are linked via pointers in the MFT or in some similar way.

Another possible optimization is to keep the pointers to the segments in a separate array. Then access to an element can be made O(1).
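A minimal sketch of that variant (my own illustration; SegmentedArray is a made-up name): the segment pointers sit in one contiguous table, so element access is a division, a modulo, and two dereferences. Only the small pointer table ever reallocates; the elements themselves never move.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Segments of B elements, with all segment pointers kept in one
// contiguous table. Access is O(1); elements are never moved.
template <typename T, std::size_t B = 256>
class SegmentedArray {
    std::vector<T*> table_;   // one pointer per segment
    std::size_t size_ = 0;
public:
    ~SegmentedArray() { for (T* s : table_) delete[] s; }

    void push_back(const T& v) {
        if (size_ == table_.size() * B)
            table_.push_back(new T[B]);  // only this small table may reallocate
        table_[size_ / B][size_ % B] = v;
        ++size_;
    }

    // O(1): segment index is i / B, offset within it is i % B.
    T& operator[](std::size_t i) { return table_[i / B][i % B]; }

    std::size_t size() const { return size_; }
};
```

This is essentially the layout typical std::deque implementations use internally.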


Under Win32 (at least) there is already a solution to this (I think it was in Game Programming Gems 4 or 5): VirtualAlloc(). What you can do with it is reserve address space, which is different from committing actual memory. If the array grows beyond what has initially been committed, you just commit more as needed (up to the reserved address range). No relocation or copying needed.
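A sketch of the reserve-then-commit pattern (using the rough POSIX analog, mmap with PROT_NONE to reserve and mprotect to commit, since VirtualAlloc is Windows-only; the Win32 version would pass MEM_RESERVE and later MEM_COMMIT to VirtualAlloc):

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Reserve a large address range up front; commit pages only as needed.
// The base pointer never changes, so growth needs no relocation or copy.
struct ReservedArray {
    char* base = nullptr;
    std::size_t reserved;
    std::size_t committed = 0;

    explicit ReservedArray(std::size_t reserve_bytes) : reserved(reserve_bytes) {
        // Reserve address space without usable backing memory (PROT_NONE).
        base = static_cast<char*>(mmap(nullptr, reserved, PROT_NONE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    }
    ~ReservedArray() { if (base != MAP_FAILED) munmap(base, reserved); }

    // "Commit" more of the reserved range by making it readable/writable.
    bool commit(std::size_t bytes) {
        if (bytes <= committed) return true;
        if (bytes > reserved) return false;   // reservation exhausted
        std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        std::size_t rounded = (bytes + page - 1) / page * page;
        if (mprotect(base, rounded, PROT_READ | PROT_WRITE) != 0) return false;
        committed = rounded;
        return true;
    }
};
```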

Alex

Quote:
Original post by ak-73

Under Win32 (at least) there is already a solution to this (I think it was in Game Programming Gems 4 or 5): VirtualAlloc(). What you can do with it is reserve address space, which is different from committing actual memory. If the array grows beyond what has initially been committed, you just commit more as needed (up to the reserved address range). No relocation or copying needed.

Alex
Another method is to create a pagefile-backed memory-mapped file. You can make the file as large as you want, and then map only the amount of memory you currently need. If you need more, unmap the current view and remap a larger one. The benefit of this method is that it doesn't consume more address space than you're actually using, but the downside is that the pointer can (and likely will) change whenever you remap. It still has the advantage that you don't have to copy data.
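A rough sketch of that approach (with a POSIX temporary file standing in for the pagefile-backed section; on Windows this would be CreateFileMapping over the pagefile plus MapViewOfFile/UnmapViewOfFile): growing unmaps the current view and maps a larger one. The pages stay in the backing store, so nothing is copied, but the pointer you get back may change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sys/mman.h>
#include <unistd.h>

// File-backed array that grows by remapping a larger view. Data survives
// the remap because it lives in the backing file, not in the old view.
struct RemappableArray {
    std::FILE* backing;   // stand-in for the pagefile-backed section
    int fd;
    char* view = nullptr;
    std::size_t mapped = 0;

    RemappableArray() : backing(std::tmpfile()), fd(fileno(backing)) {}
    ~RemappableArray() {
        if (view) munmap(view, mapped);
        std::fclose(backing);
    }

    bool resize(std::size_t bytes) {
        if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) return false;
        if (view) munmap(view, mapped);        // drop the old view...
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { view = nullptr; mapped = 0; return false; }
        view = static_cast<char*>(p);          // ...pointer may have moved,
        mapped = bytes;                        // but the data was not copied
        return true;
    }
};
```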

Quote:
Original post by Extrarius

but the downside is that the pointer can (and likely will) change whenever you remap. It still has the advantage that you don't have to copy data.


Under Windows, a page fault causes 4 KB to be read from the file. If you're short on memory, this will cause considerable paging as soon as old pages get evicted. The same will happen if you Alt-Tab between applications.

Keep in mind that unless the memory-mapped portion is cached by the OS, you will hit the disk on every access.

Memcpy: 2 GB/s or more.
Disk (4 KB reads): ~10 ms seek + 5 MB/s read in the worst case, ~0 ms seek + 40 MB/s read in the best case. Writing is typically slower.

To put numbers on it: relocating a 100 MB array with memcpy takes about 50 ms, while streaming the same 100 MB through the disk takes roughly 2.5 to 20 seconds at the rates above. I simply fail to see why, in order to avoid copying data once on relocation with memcpy, you would use a method that is 100 times slower and performs a copy as well (it gets written to disk).

Quote:
The benefit of this method is that it doesn't consume more address space than you're actually using


This won't be a concern for any application that I am going to write any time soon, though, I think. :-)

Alex

Quote:
Original post by Antheus
[...]I simply fail to see why, in order to avoid copying data once on relocation with memcpy, you would use a method that is 100 times slower and performs a copy as well (it gets written to disk).
Memory-mapped files are only paged out when you're low on memory. Pagefile-backed memory-mapped files are no different from ordinary memory, except that you have more control over them.

Internally, memory-mapped files are 'sections', and 'sections' are also how your executable's code and data are put into memory from the file. If sections were paged unnecessarily, every access to a global variable would be as slow as file access. Considering every DLL and every executable is memory-mapped into several sections, it's no wonder Microsoft put a lot of effort into making them as fast as possible, and they really are fast. For reading actual disk files, memory-mapping the file and treating it as a block of data is actually faster for mostly-sequential reads than the typical approach (reading large blocks and then processing them), because Windows knows more about optimal caching (based on the hardware, etc.) than the program does, and it will do asynchronous prefetches. You can probably do better with your own async I/O, but it's a lot more complicated and I doubt the gains would be significant.

Quote:
Original post by ak-73
[...]This won't be a concern for any application that I am going to write any time soon, though, I think. :-)[...]
Really? Just yesterday I was wishing I had easy access to more address space, since I need to store an array with 2^32 entries of several bytes each. Eventually I'll get around to writing a file-based array.

Quote:
Original post by Extrarius

because Windows knows more about optimal caching (based on the hardware, etc.) than the program does, and it will do asynchronous prefetches.


Do you have any documentation on this?

To the best of my knowledge, whenever a miss happens, a memory-mapped file reads a 4 KB chunk around the target address. I don't remember exactly how it's aligned (by sector, starting at that address, or something else).

