new[] is flawed?

37 comments, last by tivolo 10 years, 6 months ago

Hey guys, I have a strange problem and I don't know how best to word this.

At my work we have a custom memory manager. It works just fine, but the man who wrote it says that new[] cannot be used because of underlying alignment issues, he thinks. It's been a while since he wrote this code and he doesn't quite remember.

But when I was writing code the other day using new[], I found that the pointer our memory manager returned inside the overloaded new[] and the pointer I got back from the new[] expression itself were four bytes apart. Let me give you some pseudocode to better describe this.


void* operator new[]( std::size_t size ) // the overloaded array form; the size parameter must be std::size_t
{
      void* memory = m_MemoryManager->AllocateSize( size ); // Let's say the pointer was 4
      return memory; // Pointer is still 4
}

...

x* stuff = new x[ 4 ]; // x's pointer will now be 8.

The weird thing is that the offset is applied after my operator returns, somewhere between the end of that function and the line where "stuff" is assigned.

I imagine this mystery code is where the compiler actually invokes the constructor for each element of the array, but why would it alter the address?

Can someone explain like I'm five what is actually going on here?

(This is not in a multi-threaded environment, so nothing should be altering the pointer from underneath me.)

Perception is when one imagination clashes with another
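The standard actually allows this, IIRC: an array new-expression may ask the allocation function for more memory than the array itself needs (the "array allocation overhead"). Implementations commonly use that space for bookkeeping, such as stashing the length of the array, or for other memory-management needs.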

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Hmm, I think I remember what you're talking about, something about knowing how many objects to call the destructor for. The strange thing was that it wouldn't do it on some pointers, but would on others. I just wanted to make sure there wasn't a way to solve this issue, because I really like having arrays of objects. It's so nice for cache coherency!

Could be that, for example, the length of the array is needed in order to know how many objects to call the destructor on when the memory is released. But if the objects are of primitive type or of a type with no/empty destructor, then no destructor call is necessary and thus no overhead is necessary to keep the length of the array. So different types may need different bookkeeping information.

That would make sense. I'd have loved for some consistency so I could have accounted for it though...

If you are allocating 4 ints, and the pointer moves from, say, 0x04 to 0x08, doesn't this mean that your 'size' variable is also getting changed, to add 4 extra bytes?
So there's nothing you really need to take account for, since it is automatically handled.

Here's a quick test to be sure:


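#include <iostream>
#include <cstdint>
#include <cstdlib>
#include <new>

// Replace the global allocation function so every allocation is logged.
// The default operator new[] forwards to this operator new.
void* operator new( std::size_t size )
{
    std::cout << "The size 'new' is actually asking for: " << size << " bytes." << std::endl;
    void* memory = std::malloc( size );
    if ( !memory )
        throw std::bad_alloc(); // operator new must not return null

    std::cout << "Original address: " << memory << std::endl;

    return memory;
}

// Matching deallocation function: memory from malloc must go back to free.
void operator delete( void* memory ) noexcept
{
    std::free( memory );
}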

class Object
{
public:
	Object() = default;
	virtual ~Object() = default; // Virtual, so the destructor is non-trivial and the type is non-POD.
	
	int meow = 357;
};

struct POD
{
	std::uint64_t A;
	std::uint64_t B;
};

int main()
{
	const int NumObjectsToAllocate = 4;
	std::cout << "Sizeof 'Object': " << sizeof(Object) << std::endl;
	std::cout << "Allocating " << NumObjectsToAllocate
	          << " Objects should take " << (NumObjectsToAllocate * sizeof(Object))
	          << " bytes." << std::endl;
	
	Object *stuff = new Object[ NumObjectsToAllocate ];
	
	std::cout << "Resulting address: " << stuff << std::endl;
	
	delete[] stuff;
	
	std::cout << "---------------------------------------" << std::endl;
	
	const int NumOfFloatsToAllocate = 4;
	std::cout << "Sizeof 'float': " << sizeof(float) << std::endl;
	std::cout << "Allocating " << NumOfFloatsToAllocate
	          << " floats should take " << (NumOfFloatsToAllocate * sizeof(float))
	          << " bytes." << std::endl;
	
	float *floats = new float[ NumOfFloatsToAllocate ];
	
	std::cout << "Resulting address: " << floats << std::endl;
	
	delete[] floats;
	
	std::cout << "---------------------------------------" << std::endl;
	
	const int NumOfPodsToAllocate = 4;
	std::cout << "Sizeof 'Pod': " << sizeof(POD) << std::endl;
	std::cout << "Allocating " << NumOfPodsToAllocate
	          << " pods should take " << (NumOfPodsToAllocate * sizeof(POD))
	          << " bytes." << std::endl;
	
	POD *pods = new POD[ NumOfPodsToAllocate ];
	
	std::cout << "Resulting address: " << pods << std::endl;
	
	delete[] pods;
	
	return 0;
}


Results (with this compiler, evidently a 32-bit build since sizeof(Object) is 8):
Sizeof 'Object': 8
Allocating 4 Objects should take 32 bytes.
The size 'new' is actually asking for: 36 bytes.
Original address: 0x9f54008
Resulting address: 0x9f5400c
---------------------------------------
Sizeof 'float': 4
Allocating 4 floats should take 16 bytes.
The size 'new' is actually asking for: 16 bytes.
Original address: 0x9f54030
Resulting address: 0x9f54030
---------------------------------------
Sizeof 'Pod': 16
Allocating 4 pods should take 64 bytes.
The size 'new' is actually asking for: 64 bytes.
Original address: 0x9f54048
Resulting address: 0x9f54048


So it's not just changing the pointer address, but it's also adding 4 bytes to the total size allocated. There's nothing you need to take into account.

I'm purely speculating here... but since (with this specific compiler) POD structs also don't require any extra bytes, I wonder if the 4 extra bytes for non-POD types are actually a pointer to the originally allocated type's destructor?

Imagine:


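Base *objects = new Derived[10]; //All these are guaranteed to be the same type: Derived.
delete[] objects; //So they should all use the same destructor: Derived's destructor (if Base's destructor was virtual).
                  //(Note: the standard makes this delete[] undefined behavior, virtual destructor or not, because the static type Base differs from the dynamic type Derived.)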

The compiler would know that 'objects' can be treated as type 'Base', but for destruction purposes might not realize that they are actually 'Derived', so maybe the first 4 bytes (the size of a pointer-to-function on a 32-bit machine) point at the correct destructor to call.

[/end-of-amateur-speculation-from-someone-who-doesn't-know-assembly-or-the-inner-workings-of-compilers]

But regardless of what the compiler is using it for, it's taken care of for you, so there is nothing you need to manually take into account. Yes, it's allocating some extra bytes, but at the same time it's offsetting the pointer, and upon destruction it destroys the correct number of objects and frees the whole block. There's no problem unless you need something to be guaranteed to be at a specific address in memory because of some esoteric hardware architecture; in that really obscure, really unusual circumstance, that's when you use malloc() directly.

A reasonable speculation, SotL, but you forget one critical detail: if the four bytes were indeed a destructor pointer, they would (A) need to be 8 bytes on 64-bit platforms and (B) obviate the need for virtual destructors when using arrays, which is definitely not the case.

Why don't they store the housekeeping data preceding the allocation then, making sure the returned pointer is aligned? Operator delete[] could find the housekeeping information based on the pointer passed to it, via the magic of subtraction.

Seems a no-brainer to me... (maybe the scheme was designed before alignment became a major issue, and it is retained for backwards compatibility).

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley


Why don't they store the housekeeping data preceding the allocation then, making sure the returned pointer is aligned? Operator delete[] could find the housekeeping information based on the pointer passed to it, via the magic of subtraction.

Because you don't know that the address there is writable.

The OS does, though, unless you use a placement form. Is that the reason, then?

This topic is closed to new replies.
