Sign in to follow this  
beun

Do STL containers always move memory?

Recommended Posts

beun    160
Hey, This is not an important question, but if I do a push_back/reserve on a std::vector, how will it reallocate? For example, if I do a C-style realloc on a memory block, and there's enough empty space after that memory block, it doesn't copy that memory block to another location. It just changes its size (needed for realloc/free). But C++ only has new and delete, so how is it done in the STL? It's not very efficient to always allocate a new block with the new size, copy the whole data and delete the previous block when that's not needed in most of the cases. So does anyone know how that's done in the STL?

Share this post


Link to post
Share on other sites
CmpDev    100
Quote:
Original post by beun
Hey,

This is not an important question, but if I do a push_back/reserve on a std::vector, how will it reallocate?

It depends on the allocator.
Quote:
For example, if I do a C-style realloc on a memory block, and there's enough empty space after that memory block, it doesn't copy that memory block to another location. It just changes its size (needed for realloc/free).

You seem to know this is not correct, if there is not enough space its need another different block and then copies the first block into the second.
Quote:

But C++ only has new and delete, so how is it done in the STL? It's not very efficient to always allocate a new block with the new size, copy the whole data and delete the previous block when that's not needed in most of the cases. So does anyone know how that's done in the STL?
This the behaviour of the default allocator.

Share this post


Link to post
Share on other sites
Antheus    2409
Standard describes basic characteristics of each container, which affect how memory is allocated.

Each container has allocator parameter, which allows user to determine how/where memory is allocated. The role of realloc is taken over by implementation of this allocator.

Quote:
So does anyone know how that's done in the STL?


You have the source, look at it.

If should be noted that if appropriate containers are chosen for operations, then algorithms performed on them in combination with appropriate allocator will result in optimal code.

Quote:
It's not very efficient to always allocate a new block with the new size


Vector behaves similar to realloc. It will perform minimum number of allocations/moves required by a sequence of operations - but it cannot guess programmer's intention. That is algorithmic problem, using wrong one will result in sub-optimal behavior, even with realloc.

This is why it's recommended to pre-allocate certain containers to the maximum number of elements to minimize the number of redundant resizes.

Share this post


Link to post
Share on other sites
beun    160
Quote:
This the behaviour of the default allocator.

Thanks for the reply. I guess that's all I wanted to know.

Quote:
You have the source, look at it.

Well, in all the allocation functions in "vector", the following is called (MSVC 9):
this->_Alval.(de)allocate



That's the allocator (default or user-defined)
template<class _Ty,
class _Alloc>
class _Vector_val
: public _CONTAINER_BASE_AUX_ALLOC<_Alloc>
{ // base class for vector to hold allocator _Alval
protected:
_Vector_val(_Alloc _Al = _Alloc())
: _CONTAINER_BASE_AUX_ALLOC<_Alloc>(_Al), _Alval(_Al)
{ // construct allocator from _Al
}

typedef typename _Alloc::template
rebind<_Ty>::other _Alty;

_Alty _Alval; // allocator object for values
};




But I don't have the source for the default allocator.

Thanks for the replies, and as I said, this was just a little question that popped in my mind, I don't want to flame std::vector, and I won't stop using std::vector for these things.

Share this post


Link to post
Share on other sites
Antheus    2409
Quote:
Original post by beun

But I don't have the source for the default allocator.


It's std::allocator<T>, so source is present. But even without it, it calls new/delete, and nothing more.

Quote:
Thanks for the replies, and as I said, this was just a little question that popped in my mind, I don't want to flame std::vector, and I won't stop using std::vector for these things.


Allocators tend to be somewhat randomly supported in not necessarily compatible ways. As such, there might be a lack of solid information on how to implement them.

Another issue when dealing with them is that to support them in a generic manner, code becomes convoluted. For example, to handle generic string:
template < class C, class T, class A >
void foo(const std::basic_string<C, T, A> & s);
. C is char type, T is type traits, A is allocator.

Valuable read(ppt).

Share this post


Link to post
Share on other sites
Sneftel    1788
Quote:
Original post by beun
It's not very efficient to always allocate a new block with the new size, copy the whole data and delete the previous block when that's not needed in most of the cases.

Surprisingly, it is. Part of the explanation for that is to be found in the amortized analysis of resizing, the upshot of which is that it happens so seldom that even if it did take a long time, the long-term performance of std::vector would still be good. The other important point is that under the circumstances in which std::vector resizes, realloc would nearly always be forced to move the memory anyway, because the size is increasing by a fixed multiplier (often 1.5 or 2).

Share this post


Link to post
Share on other sites
implicit    504
Well, glibc does zero-copy realloc for large blocks by using memory mapping tricks. And I wouldn't be surprised if other standard libraries or replacement allocators do the same thing but I haven't looked into it. A nice consequence of this is that you can reduce the growth factor without taking much of a performance hit (if you can rely on the feature that is..).
Do any std::vector implementations use similar tricks for POD types? Obviously the standard allocator interface doesn't support this kind of thing, but presumably it could be done with some weird partial specialization hack.

Share this post


Link to post
Share on other sites
Sneftel    1788
Quote:
Original post by implicit
Well, glibc does zero-copy realloc for large blocks by using memory mapping tricks. And I wouldn't be surprised if other standard libraries or replacement allocators do the same thing but I haven't looked into it. A nice consequence of this is that you can reduce the growth factor without taking much of a performance hit (if you can rely on the feature that is..).

To quote Daffy Duck: "I know, I know, but I can only do it once." The tricky thing about using virtual memory mapping in this manner is that your local address space is still linear. So if you want to be able to grow a 3 byte buffer to 16 megabytes later on, you don't actually have to allocate 16 megabytes of RAM just yet, but you do have to reserve a 16 MB hole in the local address space for later growth, which becomes problematic if you have a whole bunch of buffers. So it's a cool trick--hell, it's almost a magic trick--, but its applicability is limited.

As for whether std::vector uses it? Unlikely. The allocator interface is badly designed and too weak for, well, pretty much anything except standard or pool allocation. EASTL's allocator stuff is much nicer. I doubt they use this trick either, though, due to the lack of virtual memory on consoles.

Share this post


Link to post
Share on other sites
implicit    504
Quote:
Original post by Sneftel
To quote Daffy Duck: "I know, I know, but I can only do it once." The tricky thing about using virtual memory mapping in this manner is that your local address space is still linear. So if you want to be able to grow a 3 byte buffer to 16 megabytes later on, you don't actually have to allocate 16 megabytes of RAM just yet, but you do have to reserve a 16 MB hole in the local address space for later growth, which becomes problematic if you have a whole bunch of buffers. So it's a cool trick--hell, it's almost a magic trick--, but its applicability is limited.
Actually glibc's realloc() uses mremap to move physical memory within the virtual address space. You still have to worry about virtual memory fragmentation, but that's no different from normal relocation.
Of course reserving a fixed-size chunk of address space has some nice advantages as well (not invalidating references for one) but that'd have to be used very carefully for memory hungry applications on 32-bit machines, though for say a 64-bit console game I'd use it all over the place.
Or don't modern consoles use virtual memory? I remember some N64 games using it at least (awesome for code, what with the N64 having little memory and fast random-access ROM memory) but perhaps you can save some silicon by cutting out that particular feature, or perhaps all that extra indirection has more of a performance impact then I would have thought.

Actually now that I think about it the STL allocators not having a concept of reallocation has some other annoying consequences in addition to lost optimization opportunities, e.g. using more memory (well duh.. but it's easy to forget that worst-case overhead is two thirds for a size-doubling vector) as well as increased fragmentation. Of course not every vector contains POD, though the really large ones tend to, but you could often write some relocation fixup code manually instead.

Share this post


Link to post
Share on other sites
iMalc    2466
The "yasli_vector" from Loki is the smartest implementation I've seen.
beun: The vector is not completely reallocated for each push_back. The underlying array it uses is only reallocated when the extra amount is allocatd last time becomes used up. The size grows by a constant multiple e.g. times 2, not by a constant amount. This is actually good enough speedwise that it covers for the fact that this even occurs at all, due to the amortised constant time taken. Don't worry about it.

Share this post


Link to post
Share on other sites
Sneftel    1788
Quote:
Original post by implicit
Actually glibc's realloc() uses mremap to move physical memory within the virtual address space. You still have to worry about virtual memory fragmentation, but that's no different from normal relocation.
Yes, that's what I mean. If you have a buffer allocated at start address 0x00040000, and another one allocated at start address 0x00050000, the most you're going to be able to grow the first buffer is to 64 KB, no matter where you put it on physical memory.
Quote:
Or don't modern consoles use virtual memory? I remember some N64 games using it at least (awesome for code, what with the N64 having little memory and fast random-access ROM memory) but perhaps you can save some silicon by cutting out that particular feature, or perhaps all that extra indirection has more of a performance impact then I would have thought.
TBH, I'm not quite sure why they don't use virtual memory. It takes up some silicon, but not much, and the dedicated circuitry avoids a significant performance impact. BTW, I only know this about the 360... I'm pretty sure it's true of the Wii, and I'm not sure about the PS3.
Quote:
Actually now that I think about it the STL allocators not having a concept of reallocation has some other annoying consequences in addition to lost optimization opportunities, e.g. using more memory (well duh.. but it's easy to forget that worst-case overhead is two thirds for a size-doubling vector) as well as increased fragmentation. Of course not every vector contains POD, though the really large ones tend to, but you could often write some relocation fixup code manually instead.
This is due for an improvement in C++0x, through rvalue references, and their use in an updated standard library. BTW, I think your math is wrong on worst-case overhead. For size doubling, it would be 50% at most. Of course, if 50% of your vector works out to a significant amount of memory, you should really reserve() early.

Share this post


Link to post
Share on other sites
SiCrane    11839
For size doubling, you can have the old vector data, then twice that in the new vector data, so the required memory is just over three times that of the old vector size at the time of resizing before the old memory is reclaimed.

Share this post


Link to post
Share on other sites
implicit    504
Quote:
Original post by Sneftel
Quote:
Original post by implicit
Actually glibc's realloc() uses mremap to move physical memory within the virtual address space. You still have to worry about virtual memory fragmentation, but that's no different from normal relocation.
Yes, that's what I mean. If you have a buffer allocated at start address 0x00040000, and another one allocated at start address 0x00050000, the most you're going to be able to grow the first buffer is to 64 KB, no matter where you put it on physical memory.
I still don't think we're talking about the same thing.
Say the logical block at 0x40000 has already filled up to 64k, then mremap merely has to allocate another piece of physical memory (which doesn't have to be contiguous with anything), then reserve a new larger virtual memory block somewhere and finally map the old physical memory pages to these new addresses together with the new pages. There's never any copying involved or anything like that. Admittedly the granularity is somewhat limited and calling into the kernel might involve all kinds of overhead including taking nasty locks and the like, but for a vector on the order of a megabyte or more it's obviously a huge win.

By the way a really neat hack is to leave the original data still in place, e.g. as a mirror of the physical memory, and thus keep pointers to the original data valid for a while longer.

Quote:
BTW, I think your math is wrong on worst-case overhead. For size doubling, it would be 50% at most. Of course, if 50% of your vector works out to a significant amount of memory, you should really reserve() early.
But you need to have the old array plus the new (twice as large) array in memory simultaneously while copying over the old data, hence 67% worst-case overhead. At least I believe the standard allocators work that way.


I'm *way* to slow for this forum..

Share this post


Link to post
Share on other sites
SiCrane    11839
Quote:
Original post by implicit
I still don't think we're talking about the same thing.
Say the logical block at 0x40000 has already filled up to 64k, then mremap merely has to allocate another piece of physical memory (which doesn't have to be contiguous with anything), then reserve a new larger virtual memory block somewhere and finally map the old physical memory pages to these new addresses together with the new pages.

How would it do that if there's already a page allocated at the virtual address of 0x00050000?

Share this post


Link to post
Share on other sites
Sneftel    1788
Quote:
Original post by implicit
I still don't think we're talking about the same thing.
Say the logical block at 0x40000 has already filled up to 64k, then mremap merely has to allocate another piece of physical memory (which doesn't have to be contiguous with anything), then reserve a new larger virtual memory block somewhere and finally map the old physical memory pages to these new addresses together with the new pages.
Okay. Let's say you grow it to 128k. Right after that, your program tries to access the (virtual) memory address 0x00050001. Does that refer to the 65538th byte of the just-resized buffer located at 0x00040000, or the 2nd byte of the other buffer located at 0x00050000?

EDIT: Also ninja'd.

Share this post


Link to post
Share on other sites
implicit    504
Quote:
Original post by SiCrane
Quote:
Original post by implicit
I still don't think we're talking about the same thing.
Say the logical block at 0x40000 has already filled up to 64k, then mremap merely has to allocate another piece of physical memory (which doesn't have to be contiguous with anything), then reserve a new larger virtual memory block somewhere and finally map the old physical memory pages to these new addresses together with the new pages.

How would it do that if there's already a page allocated at the virtual address of 0x00050000?
I'm not suggesting that you keep the original base address valid, merely that you use the MMU to speed up realloc. It still has the old semantics of possibly needing to use some other virtual memory address, the point is that you merely have to rewrite some TLB entries instead of copy actual memory around.
And, yes, this is obviously only directly applicable to POD structures.

Share this post


Link to post
Share on other sites
SiCrane    11839
Interesting. Do you have to do anything special to make sure that an allocation is able to be resized in this manner when it's allocated initially? Wouldn't that cause a lot of memory fragmentation for mostly unused physical memory pages if you did it everywhere, even in a 64-bit environment?

Share this post


Link to post
Share on other sites
implicit    504
Quote:
Original post by SiCrane
Interesting. Do you have to do anything special to make sure that an allocation is able to be resized in this manner when it's allocated initially? Wouldn't that cause a lot of memory fragmentation for mostly unused physical memory pages if you did it everywhere, even in a 64-bit environment?
Standard the Linux realloc() function does this automatically behind your back so there's no flags to worry about there, and similar hacks can be achieved through different means on most any operating system (e.g. file mapping or whatever).
I'm not sure what you mean about the fragmentation. There's no such thing as physical memory fragmentation (in the classical sense anyway but I wouldn't be surprised if lots of complex mappings slowed down allocation) and virtual memory fragmentation is just as bad as it always has been. Actually it's more like the opposite as with a faster realloc you can afford to reduce the growth factor and thus save memory and address space.
But granularity is certainly a problem, on Windows in particular it's a full 64k block, so you wouldn't want to use it until the vector have grown fairly large. But then doing a kernel transition to avoid copying a 4k page doesn't sound all that sane anyway, so 64k is probably a reasonable lower limit.

Share this post


Link to post
Share on other sites
SiCrane    11839
If you always allocate a full page for any memory allocation no matter how small, the rest of the page would be lost to internal fragmentation. So for a 4K physical memory page you'd lose 4092 bytes for a 4 byte allocation. This doesn't seem like a particularly sane behavior to have as default.

Share this post


Link to post
Share on other sites
implicit    504
Quote:
Original post by SiCrane
If you always allocate a full page for any memory allocation no matter how small, the rest of the page would be lost to internal fragmentation. So for a 4K physical memory page you'd lose 4092 bytes for a 4 byte allocation. This doesn't seem like a particularly sane behavior to have as default.
Ah. I didn't mean to suggest that glibc does it for all reallocs, only that it applies this optimization automatically once the block has grown large enough (presumably when forced to relocate a block which is at least one page long).

Share this post


Link to post
Share on other sites
mattnewport    1038
Quote:
Original post by Sneftel
TBH, I'm not quite sure why they don't use virtual memory. It takes up some silicon, but not much, and the dedicated circuitry avoids a significant performance impact. BTW, I only know this about the 360... I'm pretty sure it's true of the Wii, and I'm not sure about the PS3.


The 360 (and PS3) both have virtual addressing. They don't have 'virtual memory' in the usual OS sense in that there's no swapfile allowing you to allocate more virtual memory than you have physical memory but they do both use virtual addressing for CPU addressed memory.

Share this post


Link to post
Share on other sites
iMalc    2466
Quote:
Original post by SiCrane
If you always allocate a full page for any memory allocation no matter how small, the rest of the page would be lost to internal fragmentation. So for a 4K physical memory page you'd lose 4092 bytes for a 4 byte allocation. This doesn't seem like a particularly sane behavior to have as default.
Certainly no, not by default.
However it's a great debugging tool to put every memory allocation at the end of a page, followed by an unallocated page, as then any buffer overrun causes an immediate page-fault and viola you've just learnt how page-heap debugging works, in a nutshell. I can provide more info on using this technique on Windows, if desired.

The idea of remapping the physical memory to a new virtual address range sounds interesting though. Any details as to how this can be done, say on Windows?

Share this post


Link to post
Share on other sites
implicit    504
Quote:
Original post by iMalc
The idea of remapping the physical memory to a new virtual address range sounds interesting though. Any details as to how this can be done, say on Windows?
As far as I can tell it cannot be done easily or cleanly, though Microsoft have an excellent opportunity to add such functionality themselves to HeapReAlloc.
The most straightforward way of doing it manually is to create a temporary file, which itself can obviously be resized freely, and recreate a file mapping whenever it needs to grow. Of course this involves like five system calls for a single resize and won't use the normal swap space, plus creating an actual file in the first place involves all kinds of overhead.

My next idea was to set up a file mapping which isn't backed by an actual file by passing INVALID_HANDLE_VALUE to CreateFileMapping, unfortunately even though Microsoft separates the view of a file as part of the virtual address space from a file mapping *object* there's still no function to resize the damned thing. You can however reserve space for the largest possible buffer and specify SEC_RESERVE to avoid having it mapped to any actual memory, or eating any swap space for that matter, then all you'd have to do is create larger and larger views and use VirtualAlloc to commit more physical memory.
Sounds like a good idea, right? So I tried reserving a full four gigabytes just to see that it didn't try to eat any actual memory after all and guess what? Instant BSOD. Admittedly that's running as an administrator (on XP) but still..

Someone care to explain to me what's so heinous about this code? I've got two gigs of RAM and many more gigabytes of swap space, so it's not as if I'm running out of memory here.
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>

int main(void) {
HANDLE mapping = CreateFileMapping (
INVALID_HANDLE_VALUE,
NULL,
PAGE_READWRITE | SEC_RESERVE,
0x00000001,
0x00000000,
NULL
);

if(!mapping) {
fputs("CreateFileMapping() failed", stderr);
return 1;
}

getchar();
return 0;
}

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this