
(C++) Is there a guarantee of static array element data contiguity in RAM?


Recommended Posts

When an array with several elements is created, are the individual elements of the array guaranteed to be contiguous in memory?

Meaning, for example, with an array of 5 ints called 'intArray', represented in RAM:

| intArray[0] 0x0001ca00 | intArray[1] 0x0001ca04 | intArray[2] 0x0001ca08 | intArray[3] 0x0001ca0c | intArray[4] 0x0001ca10 |

With no breaks in between the elements. Is this contiguity always present whenever an array is instantiated (resizable arrays excluded)?
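Something like this little test is what I have in mind (the addresses it prints will of course differ from the made-up ones above):

#include <cstdio>

int main()
{
    int intArray[5];

    // Print where each element lives; the question is whether these
    // addresses are always exactly sizeof(int) apart, with no gaps.
    for (int i = 0; i < 5; ++i)
        std::printf("intArray[%d] at %p\n", i, static_cast<void*>(&intArray[i]));

    return 0;
}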

[Edited by - XenoPhyre on October 23, 2010 11:13:37 PM]

I've read somewhere that this isn't true when using resizable arrays (vectors, etc.)... Is there any truth to that?

I believe that indeed doesn't hold true for resizable arrays, perhaps because of all the copying and shuffling around that goes on when the array is resized and elements are added or removed.

Quote:
Original post by ApochPiQ
On the contrary, std::vector is guaranteed to provide contiguous storage for its elements.


Hmm, I had no idea std::vector guaranteed contiguous storage. Thanks!

Quote:
Original post by LessBread
Does the page size have any bearing on this? Just wondering...


Any bearing on what in particular? A vector is guaranteed contiguous by the standard (23.2.4 §1), as are arrays. I did notice a misconception in the EASTL, where someone seems to think vectors aren't guaranteed contiguous; it was one of the "differences" they listed between their version and the standard library version.

While physical RAM pages may not be contiguous (in a virtual memory system), that actually will have no effect on your access speed (for caching or other issues), as page boundaries are always cache aligned.
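To make the contiguity guarantee concrete, here's a small sketch (mine, not from the standard) that leans on it to hand a vector's buffer to a C-style function; fill_buffer is just a hypothetical stand-in for any API that takes a pointer and a length.

#include <cstddef>
#include <vector>

// Hypothetical C-style API that writes into a raw int buffer.
void fill_buffer(int* dest, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        dest[i] = static_cast<int>(i);
}

int main()
{
    std::vector<int> v(1024);

    // Because the elements are guaranteed to be contiguous, &v[0] points
    // to a plain array of v.size() ints and can be passed straight through.
    if (!v.empty())
        fill_buffer(&v[0], v.size());

    return 0;
}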

In the first version of the C++ standard, released in 1998, storage for std::vector wasn't guaranteed to be contiguous. However, when Technical Corrigendum 1 was released in 2003, a requirement for contiguous storage was added.

Quote:
While physical RAM pages may not be contiguous (in a virtual memory system), that actually will have no effect on your access speed (for caching or other issues)

Not quite true - only one rank of a DRAM module can be active per channel. When page frames are scattered, you might see more open/precharge going on. However, there is precious little we can do about it, except maybe use large (2 or 4 MB) pages ;)

Quote:
Original post by Washu
Quote:
Original post by LessBread
Does the page size have any bearing on this? Just wondering...


Any bearing on what in particular? A vector is guaranteed contiguous by the standard (23.2.4 §1), as are arrays. [...]

While physical RAM pages may not be contiguous (in a virtual memory system), that actually will have no effect on your access speed (for caching or other issues), as page boundaries are always cache aligned.


I was thinking about the situation where the requested memory is greater than a page size. For example, if the page size is 4096 bytes, an array of ints with more than 1024 elements spans more than one page. I understand that the standard guarantees contiguity, but with a virtual memory system, it would seem that the OS would also have to guarantee it.

The way the C++ standard defines contiguous memory is based on the addresses of the individual elements, which basically means the guarantees are only for contiguous virtual memory. The OS can give you non-contiguous physical memory and it doesn't affect C++'s guarantees.
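As a small illustration (my own sketch, not wording from the standard), that address-based definition boils down to plain pointer arithmetic over virtual addresses:

#include <cassert>
#include <vector>

int main()
{
    int a[5] = { 0, 1, 2, 3, 4 };
    std::vector<int> v(5);

    // For both the built-in array and the vector, element i sits at
    // &first_element + i. That's a statement about (virtual) addresses;
    // nothing is promised about where the pages land in physical RAM.
    for (int i = 0; i < 5; ++i)
    {
        assert(&a[i] == &a[0] + i);
        assert(&v[i] == &v[0] + i);
    }

    return 0;
}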

XenoPhyre's question was actually whether or not the elements would be contiguous in RAM, not contiguous in the application's memory space.

The real answer is no: there is no guarantee that anything will ever be laid out contiguously in physical RAM, because of memory mapping.

;)

Contiguity in physical address space is effectively meaningless, since it has no effect on retrieval time. Any given memory frame costs the same to retrieve: either the page is swapped in or not, either it's cached or not, but the lookup through the page table, and any subsequent cache or swap costs, are the same regardless of the virtual-to-physical mapping.

What counts is whether it's contiguous in virtual space or not, because that can affect locality of reference, cache prefetch, and swapping algorithms.

Quote:
Contiguity in physical address space is effectively meaningless, since it has no effect on retrieval time. Any given memory frame costs the same to retrieve

Please stop spreading misinformation.
First, note that DIMMs are organized into ranks of multiple ICs (think "sides" of the module, though that's not necessarily accurate), each with several banks consisting of a 2D matrix. For a given bank, the Nth matrix rows of all the ICs in a rank together constitute a DRAM page.
Since there may be several (e.g. 2) IA-32 page frames per DRAM page, and DRAM pages must be 'opened' before being accessed, it is clear that accessing the neighboring page frame will be cheaper than a random access.

How much of a difference does it make? With a DRAM page already open, you only pay tCAS (aka CL). If the bank is at least idle with no page open, just add tRCD. Otherwise, also add tRP for closing (precharging) the previous page. Since those timings are usually similar or equal, expect relative costs of 1:2:3 for those cases.

PS: there is nifty Adaptive Page Management logic in the memory controller that decides whether to close pages or keep them open, which might succeed in hiding some of the precharges. However, accesses to another DRAM page will still have at least twice the latency.
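To put rough numbers on that 1:2:3 estimate, here's a back-of-the-envelope sketch of my own; the DDR3-1333 CL9 figures are assumed typical timings, not measurements:

#include <cstdio>

int main()
{
    // Assumed DDR3-1333 timings: 1.5 ns clock, CL = tRCD = tRP = 9 cycles.
    const double cycle_ns = 1.5;
    const double tCAS = 9 * cycle_ns;
    const double tRCD = 9 * cycle_ns;
    const double tRP  = 9 * cycle_ns;

    std::printf("page hit   (page already open)       : %.1f ns\n", tCAS);
    std::printf("page empty (bank idle, must activate): %.1f ns\n", tRCD + tCAS);
    std::printf("page miss  (precharge, then activate): %.1f ns\n", tRP + tRCD + tCAS);

    return 0;
}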

What Jan Wassenberg said. Plus, the more you spread out your physical pages, the higher the likelihood of a cache mapping miss.

Once upon a time, one used to calculate the offsets at which mapping collisions would occur, so one could try to avoid them. I don't think anyone does that any more, but that doesn't mean the problem doesn't exist in principle. I guess the reason nobody spends time on this any more is probably the insight that, since the only addresses you can see are virtual addresses, the entire endeavour is kind of pointless from the beginning :-)

Anyway, if I remember right, the "magic number" was something between 2 and 64 MB for the previous generation of desktop CPUs, depending on model -- might be somewhat different for the current generation.
So, if you have a "typical, normal" dataset that is not much larger than this magic offset, you will have no collisions, or maybe one if you're unlucky. But if your pages are spread out on random physical memory addresses, it is not unlikely that you have a few of them.

Quote:
Original post by Jan Wassenberg
Quote:
Contiguity in physical address space is effectively meaningless, since it has no effect on retrieval time. Any given memory frame costs the same to retrieve

Please stop spreading misinformation.

You sure know your stuff, but at the end of the day that is mostly irrelevant, because we have no control over the hardware at that level. It's up to the OS memory-management programmers to try to make such optimisations for us.

From a user-space programmer's point of view, physical contiguity is effectively meaningless.

samoth: very interesting. Do you have pointers to additional information?

Quote:
You sure know your stuff, but at the end of the day that is mostly irrelevant, because we have no control over the hardware at that level. It's up to the OS memory-management programmers to try to make such optimisations for us.

heh, that would be the ostrich algorithm. Unfortunately, the OS and its programmers cannot divine your access patterns and don't know whether defragmentation would be worthwhile.
There is, however, a means of communicating that intent, even from user space: requesting the above-mentioned large pages from VirtualAlloc/mmap. That gets you 2 or 4 MB chunks of contiguous physical memory, which is a major improvement, and also lightens the load on small-page TLBs. However, the last time I checked, Linux didn't bother with defragmentation and assumes anyone who actually wants "huge pages" will grab them at boot time (boo!).
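Roughly, the request looks like this (just a sketch: alloc_large_pages is a made-up helper, error handling is minimal, Windows additionally needs the SeLockMemoryPrivilege and a size that's a multiple of GetLargePageMinimum(), and MAP_HUGETLB needs a fairly recent kernel -- otherwise fall back to mmap-ing a file on hugetlbfs):

#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Try to get 'size' bytes backed by large/huge pages; returns 0 on failure.
void* alloc_large_pages(std::size_t size)
{
#ifdef _WIN32
    // Size must be a multiple of GetLargePageMinimum(), and the process
    // needs the "Lock pages in memory" (SeLockMemoryPrivilege) right.
    return VirtualAlloc(0, size,
                        MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                        PAGE_READWRITE);
#else
    // Needs huge pages made available by the kernel (e.g. vm.nr_hugepages).
    void* p = mmap(0, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    return (p == MAP_FAILED) ? 0 : p;
#endif
}

int main()
{
    void* p = alloc_large_pages(4 * 1024 * 1024); // ask for one 2/4 MB chunk
    return p ? 0 : 1;
}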

Quote:
Original post by Jan Wassenberg
[...]
However, the last time I checked, Linux didn't bother with defragmentation and assumes anyone who actually wants "huge pages" will grab them at boot time (boo!).

That's not true: Linux provides allocations of huge chunks at any time. That feature is used all the time by hardware interfaces (video processing is a classic use, as frames from hardware codecs are buffered and tunnelled into renderers -- contiguous memory in DMA address space is a requirement). It's just not a feature of the POSIX mmap() call: you need to use ioctls. The problem is, if you're memory-constrained there is no guarantee a large chunk is available at runtime, so most drivers will grab buffers at boot and hold on to them. Windows doesn't do this because it's only a desktop OS.

Quote:
It's just not a feature of the POSIX mmap() call: you need to use ioctls.

I am going by the documentation, which says mmap of a file in a hugetlbfs has the desired effect (but ugh, what a hack).

Quote:
That's not true: Linux provides allocations of huge chunks at any time. [...]
The problem is, if you're memory-constrained there is no guarantee a large chunk is available at runtime, so most drivers will grab buffers at boot and hold on to them.
Windows doesn't do this because it's only a desktop OS.

It sounds like you are actually claiming the same thing, viz. that runtime allocation of huge pages is possible but may fail due to fragmentation. The cheap and meaningless shot at Windows notwithstanding, it will actually defragment memory at runtime rather than resorting to boot-time reservation. (However, criticism is warranted for XP x64, which appears to just trim the working set of processes until enough memory has coalesced, which can take *seconds*, no joke.)
I just saw recent work on Linux that appears to provide proper reclaiming, which sounds great.
