[C++] How do I allocate a multidimensional array with new?

51 comments, last by rip-off 13 years, 7 months ago
Quote:There's really no reason to put yourself through that in C++ (under normal circumstances at least). At the very least, use a vector of vectors; the end result will be basically equivalent, and the resulting code will be much cleaner (and most likely much safer as well).

Really, allocating multi-dimensional arrays manually like that in C++ is just onerous; it's really not something you want to be doing if you can possibly avoid it (and usually you can).


Well, like I said, using one unidimensional array is the most effective way of dealing with it in my opinion, but I'm not gonna lose sleep over an array of pointers. I can trust myself to delete the pointers because I always write the delete right after I'm done writing the block of code, so I never miss a delete.
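For illustration, a minimal sketch of that flat-array approach (the map dimensions and tile value here are hypothetical):

#include <cstddef>

int main()
{
    const std::size_t width = 64, height = 32;   // hypothetical map size
    char* map = new char[width * height];        // one allocation for the whole grid

    map[5 * width + 3] = 'T';                    // element (x=3, y=5)

    delete [] map;                               // the single matching delete
}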

Also, using vectors for fixed-size arrays is overkill in my opinion, not to mention it's extra overhead you might not need, especially for huge vectors, because the multiple resize calls you'll be doing to fill each vector with elements will pass through all the new items and initialize them to 0. (Or whatever you tell it to initialize to.) The performance loss is usually not worth worrying about, but in rare cases it matters, such as when you use vectors as variable-size buffers for loading huge chunks of data. I had *significant* slowdowns in my game when I tried using that method to load the assets from my archive file.
Quote:Original post by Bearhugger
I can trust myself to delete the pointers because I always write the delete right after I'm done writing the block of code, so I never miss a delete..


That's wrong.

Q: In your example, what do you think will happen if the marked line ("THIS ONE") ...
[...]
pMultiArray = new char * [ X ];
for( i = 0; i < X; i++ )
    pMultiArray[ i ] = new char[ Y ]; // <-- THIS ONE

//
// Do your stuff...
//

for( i = 0; i < X; i++ )
    delete [] pMultiArray[ i ];
delete [] pMultiArray;

... will throw?

Quote:Also, using vectors for fixed-size arrays is overkill in my opinion,

0) Misconception of "bottleneck"
1) Misconception of modern compiler technology:

#include <vector>
#include <cstdlib>
#include <iostream>

int main() {
        const int s = 128;

        std::vector<float> foo(s), bar(s);

        // init
        for (int i=0; i<s; ++i) {
                foo[i] = rand();
                bar[i] = rand();
        }

        // add
        std::vector<float> frob(s);
        for (int i=0; i<s; ++i)
                frob[i] = foo[i] + bar[i]; // dump below

        for (int i=0; i<s; ++i)
                std::cout << frob[i];
}

g++ -S -mfpmath=sse -march=native -O3 foo.cc

cat foo.s
<snip>
.L31:
	movups	(%r12,%rax), %xmm0
	movups	(%r13,%rax), %xmm1
	addps	%xmm1, %xmm0
	movaps	%xmm0, (%rbp,%rax)
	addq	$16, %rax
	cmpq	$512, %rax
	jne	.L31
<snip>

--> Utilizes SSE code and properly unrolled the loop. Where is your overhead now?

[Edited by - phresnel on August 30, 2010 1:41:54 PM]
Quote:I can trust myself to delete the pointers because I always write the delete right after I'm done writing the block of code, so I never miss a delete..
Don't forget about exception safety.

In any case, why saddle yourself with even that minor task if you don't have to? If you don't like std::vector for some reason, you can write a simple RAII wrapper of your own that handles deallocation for you.
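A minimal sketch of such a wrapper, assuming a char buffer and pre-C++0x conventions (the class name is made up):

#include <cstddef>

class scoped_array {
    char* p_;
    scoped_array(const scoped_array&);             // non-copyable, so the Rule of
    scoped_array& operator=(const scoped_array&);  // Three is a non-issue
public:
    explicit scoped_array(std::size_t n) : p_(new char[n]) {}
    ~scoped_array() { delete [] p_; }              // deallocation happens automatically
    char& operator[](std::size_t i) { return p_[i]; }
};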
Quote:Also, using vectors for fixed-size arrays is overkill in my opinion
Who said anything about fixed-size arrays? The OP stated earlier that:
Quote:...Worldheight is variable.
So I think it's safe to assume we're dealing with variable-size arrays here.
Quote:not to mention it's extra overhead you might not need, especially for huge vectors, because the multiple resize calls you'll be doing to fill each vector with elements will pass through all the new items and initialize them to 0. (Or whatever you tell it to initialize to.) The performance loss is usually not worth worrying for, but in rare cases, it does, like when you use vectors as variable-size buffers for loading huge chunks of data. I had *significant* slowdowns to my game when I tried using that method to load the assets from my archive file into my game.
I'm not sure what method you were using, but not many people would recommend loading large amounts of data into a vector using push_back(), etc., at least not without using reserve(). Typically, by using reserve() or resize(), you can avoid repeated allocations and copies.
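For example, one way to load a blob of known size without repeated reallocations might look like this (the function and its parameters are illustrative):

#include <vector>
#include <cstdio>
#include <cstddef>

std::vector<char> load(std::FILE* f, std::size_t size)
{
    std::vector<char> data(size);             // single allocation, sized up front
    if (size)
        std::fread(&data[0], 1, size, f);     // read straight into the buffer
    return data;
}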

Also, with Visual Studio at least, an easy trap to fall into is measuring performance with iterator debugging turned on. In VS 2008 at least, iterator debugging is on even in the default release configuration, which I imagine has led quite a few people astray with respect to the standard library containers and algorithms. (Not saying that's the case here, but it does seem to happen fairly often.)

Anyway, I'm not saying that vector is the right solution in every case, but it does seem that often when people avoid it (and other parts of the standard library), it's for reasons that aren't particularly sound.
Please don't debug my code; that was a quick snippet to illustrate the idea of the multiple-pointer method. In a real program I would initialize pointers to NULL, check for new returning NULL, and yada-yada. Of course, you can set new to throw bad_alloc on allocation failure if you prefer that behavior, and, of course, in that case, you have to catch it and do cleanup and stuff. Personally though, for reasons that are beyond the scope of this topic, I absolutely hate the exception programming system, so I prefer doing it old-school style and checking return values instead.

Also, multiple push_back *is* slow when you have a large amount of bytes to store. My RPG loads its assets and data from a custom archive format I designed, and, because the buffers need to have variable sizes, I decided to use a std::vector and call push_back to store the decompressed data bytes into the vector buffer. It was quite slow. Because I called reserve, I expected it to be super fast, and to be honest I have no idea why it was slow, but for whatever reason it didn't work well and there were considerable slowdowns. I guess I could have used resize and written directly into the buffer using the [] operator (it would probably have fixed the problem), but I went straight to using new and delete because I knew that would work.

The overhead caused by vectors is that when you resize a vector (declaring a vector by passing it an initial size is the same thing), the STL does not simply allocate sufficient memory: in the case of a growing vector, it passes through all the new items to initialize them. This is unlike the new operator, which just allocates the data while leaving the garbage inside. (Unless you are in debug mode.) In a nutshell: vector = allocation + initialization, new = allocation only. So, in the example posted above, the overhead is not at the allocation where you displayed the assembler, it's at the declaration.
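A sketch of the distinction being claimed, for a scalar element type (later posts refine the terminology):

#include <vector>

int main()
{
    float* a = new float[1024];     // elements left uninitialized for scalar types
    std::vector<float> v(1024);     // elements value-initialized to 0.0f

    delete [] a;
}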

Although it will not affect the TC if it's done correctly (more on that further down), another thing I recently discovered concerning vectors is that they can potentially waste memory. As you probably know, vectors are optimized to be resizable arrays whose length changes dynamically, and in order to prevent too much reallocation, the object is programmed to increase its size by 50% every time it needs to grow. And that's great, because it's a nice compromise between the speed of an array and the power of a linked list when you need an array of unknown size. However, as far as I know, the buffer never shrinks. What this means is that if you have some huge chunk of data to load, let's say a 1MB buffer, then the vector will always have 1MB allocated even if your next loads are 300 bytes. Only the destructor seems to free that memory.

Don't believe me? See for yourself. The second cout is to illustrate the initialization process.
#include <vector>
#include <iostream>

using namespace std;

int main()
{
    int i;
    vector< int > v;
    v.resize( 1000000 );
    v.resize( 50 );
    cout << "Capacity: " << v.capacity() << endl;
    cout << "Index 23: " << v[ 23 ] << endl;
    cin >> i;
    return 0;
}

STL vectors seem to be internally wired to be resizable. In the case of the TC's question, if I understand correctly, the multi-dimensional array is sized once, when the map is loaded, then it remains constant until another map is loaded. So that is not something I'd use a vector for: I just need an array, and the only time it is resized is during loading, while the screen is black. Still, if the TC wants to use the vector method, it is imperative when repopulating the vectors to call clear on the vector of vectors instead of resize, or it will waste memory after unloading a really large map.

Of course, I know that memory is cheap for PCs and that the performance losses are unnoticeable 99% of the time. But I believe it's a bad habit to use vectors everywhere. Not saying I hate vectors; I use them a lot and I suggest people use them as well. It's just that I suggest reserving them for arrays that are constantly changing in size.

In any case, this is getting off-topic. I just wanted to present a template without boost or STL.
Quote:Original post by Bearhugger
[...] I would initialize pointers to NULL, check for new returning NULL, and yada-yada. Of course, you can set new to throw bad_alloc on allocation failure if you prefer that behavior, and, of course, in that case, you have to catch it and do cleanup and stuff. Personally though, for reasons that are beyond the scope of this topic, I absolutely hate the exception programming system so I prefer doing it old-school style and check for return values instead.

Then C++ is not for you, because by the standard, new throws an exception upon allocation failure (new returning NULL is behaviour from before any standard existed; read: MSVC 1998). In which case you will have quite a number of memory leaks in your posted snippet.

Quote:Also, multiple push_back *is* slow

That is the reason why I did *not* use push_back() in my code; push_back() is not the only way to fill or resize a vector. Please, look back at my code.


Quote:This is unlike the new operator, which just allocates the data while leaving the
garbage inside.

Please, oh please, take some more lessons on C++. new[] will call the default constructor of the passed type, and new will let you call any constructor. If you want garbage, use placement new. C++0x will change that a bit; then you can call any constructor.

Quote:new = allocation only

Nope.

Quote:So, in the example posted above, the overhead is not at the allocation where you displayed the assembler, it's at the declaration.

No, I did only display assembly from the addition-code, not from allocation.


Quote:As you probably know, vectors are optimized to be resizable arrays whose length changes dynamically, and in order to prevent too much reallocation, the object is programmed to increase its size by 50% everytime it needs to grow

But where is there overhead because of that?

Quote:And that's great because it's a nice compromise between the speed of an array and the power of a linked list when you need an array of unknown size. However, as far as I know, the buffer never shrinks. What this means is that, if you have some huge chunk of data that you store to load data, let's say a 1MB buffer, then the vector will always have 1MB allocated even if your next loads are 300 bytes. Only the destructor seems to free that memory.

You can swap an empty vector with a used one. A common C++ trick.
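A minimal sketch of that trick:

#include <vector>

int main()
{
    std::vector<char> buf(1000000);     // capacity grows to ~1 MB
    buf.resize(50);                     // size shrinks, capacity does not

    std::vector<char>(buf).swap(buf);   // shrink-to-fit: the temporary copy has tight capacity
    std::vector<char>().swap(buf);      // or release everything by swapping with an empty one
}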


edit: oops, the struck-through part was from an earlier revision of this reply

[Edited by - phresnel on August 31, 2010 1:51:05 PM]
Quote:Also, multiple push_back *is* slow when you have large amount of bytes to store.
Just to be clear, no one's disputing that adding lots of data to a vector via push_back() can be slow. But if you have to reallocate, you have to reallocate, and there's going to be cost involved whether you use vector or do it manually.
Quote:Because I called reserve, I expected it to be super fast, and to be honest I have no idea why it was slow, but for whatever reason, it didn't work well and there was considerable slowdowns. I guess I could have used resize and write directly in the buffer using the [] operator (it would probably have fixed the problem) but I went straight to using new and delete because I knew this would work.
Well, it sounds like you moved on from vector without ever really determining what was causing the slowdown. (Who knows - maybe you had iterator debugging turned on.)
Quote:As you probably know, vectors are optimized to be resizable arrays whose length changes dynamically, and in order to prevent too much reallocation, the object is programmed to increase its size by 50% everytime it needs to grow.
I believe the growth factor is implementation-defined, but yes, that's the general idea.
Quote:And that's great because it's a nice compromise between the speed of an array and the power of a linked list when you need an array of unknown size. However, as far as I know, the buffer never shrinks. What this means is that, if you have some huge chunk of data that you store to load data, let's say a 1MB buffer, then the vector will always have 1MB allocated even if your next loads are 300 bytes. Only the destructor seems to free that memory.
You can use the previously mentioned 'swap' trick if you want to free all of the memory associated with the vector.
Quote:Don't believe me? See for yourself.
The behavioral characteristics of std::vector are well known to C++ coders - I don't think anyone here needs to be convinced of how vector works :)
Quote:Of course, I know that memory is cheap for PCs and that the performance losses are unnoticable 99% of the time. But I believe it's a bad habit to use vectors everywhere. Not saying I hate vectors, I use them a lot and I suggest people to use them as well. It's just that I suggest reserving those for arrays that are constantly changing in size.
Well, I'd have to disagree with that. Even if all you have is a member array that's sized on construction and then the size never changes after that, using vector offers the following advantages:

1. No need to write an explicit destructor.
2. No need to worry about the 'Rule of Three'.

Those two things alone will simplify your code greatly wherever vector (or a similar standard container) is used, so even if the array isn't intended to change size, I'd still use vector unless you can show conclusively that doing so will incur an unacceptable run-time performance penalty.
Quote:Then C++ is not for you, because by the standard, new throws an exception upon allocation failure (new returning NULL is behaviour from before any standard existed; read: MSVC 1998). In which case you will have quite a number of memory leaks in your posted snippet.

Nope, I have zero memory leaks. The new operator can still be configured to return NULL in the options of some compilers, and if your compiler doesn't do it, you can still force it to do that using this in your global header file:
#include <new>
#define new new(std::nothrow)
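(For reference, the standard, macro-free way to get a null return on a particular allocation is the nothrow overload; a minimal sketch:)

#include <new>

int main()
{
    char* p = new (std::nothrow) char[1024]; // yields a null pointer on failure
    if (p)
        delete [] p;
}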
C++ is for everyone. :)

Second... why do I care what the standards tell me to do? There are tons of features of C++0x that I don't like and will probably ignore. Besides, Java forced us to use throws statements, and now everyone agrees it's bad practice. People will code the way they see fit for their needs, use the features they find useful, and forget what gives them more trouble than it's worth, and unless they start producing buggy software, I fail to see where the problem is.


Quote:Please, oh please, take some more lessons on C++. new[] will call the default constructor of the passed type, and new will let you call any constructor.

Allow me to respectfully return your first sentence to you. Calling
int * i = new int();
and
int * i = new int;
are NOT the same thing. The first line will allocate AND initialize the int to 0, while the second one only allocates without calling a constructor. Guess what? Allocating an array uses the second type, so I'm right: it's much less overhead because there's no initialization.


Quote:You can swap an empty vector with a used one. A common C++ trick.

I know that trick, but having to use that just to reset a vector is lame, if I may say. I find it much more straightforward to just delete the damned array and reallocate it. The vector class needs a reset method, in my opinion.


Quote:Just to be clear, no one's disputing that adding lots of data to a vector via push_back() can be slow. But if you have to reallocate, you have to reallocate, and there's going to be cost involved whether you use vector or do it manually.

And I'm not saying the opposite. If you have to constantly reallocate, then yes, use vectors, that's what they're for. It's the part where you end up with huge buffers even after a clear that annoys me. Yes, it can be swapped out if it really becomes an issue. My point is that vectors are less efficient memory-wise, and vector or not, you'd still have to manage it if it goes out of control. I'd rather just have a pointer to delete so I have full control.


Quote:Well, it sounds like you moved on from vector without ever really determining what was causing the slowdown. (Who knows - maybe you had iterator debugging turned on.)

Yeah that's the part where I wish I didn't overwrite it, but heh, who usually wants to backup faulty code?


Quote:Well, I'd have to disagree with that. Even if all you have is a member array that's sized on construction and then the size never changes after that, using vector offers the following advantages:

1. No need to write an explicit destructor.
2. No need to worry about the 'Rule of Three'.

Those two things alone will simplify your code greatly wherever vector (or a similar standard container) is used, so even if the array isn't intended to change size, I'd still use vector unless you can show conclusively that doing so will incur an unacceptable run-time performance penalty.

I don't buy the argument about saving a destructor. In a game, there's bound to be a destructor either way, because resources like images need to be freed, so what is it to add two lines to delete a pointer? If it's within the scope of a method, even better: you write the delete right after the new to make sure you don't forget, and then you insert the code in between. If you're the forgetful type and delete your data 50% of the time, then yeah, I agree, use automated objects such as STL containers. To be honest, I don't get why manual memory allocation is such a big deal. You only have one thing to remember, and it's the delete, which you can write right away. You don't have to learn to use a complicated class, to include a header, to learn to use a complicated method, etc. Personally, I do forget stuff when I write code, but it's *never* the delete.

For performance, well as I said, the run-time performance penalty is the initialization. Whether you find the tradeoff worthy is up to you. Admittedly, it won't matter 99% of the time...
Quote:Original post by Bearhugger
Allow me to respectfully return your first sentence to you. Calling
int * i = new int();
and
int * i = new int;
are NOT the same thing. The first line will allocate AND initialize the int to 0, while the second one only allocates without calling a constructor. Guess what? Allocating an array uses the second type, so I'm right: it's much less overhead because there's no initialization.

The standard is very clear about what happens -- the first line value-initializes the object while the second line default-initializes the object. It just so happens that for scalar types, this is the same as zero-initialization and no initialization, respectively. However this isn't true once you have class types with constructors, in which case one such constructor is going to be called. If it's an array, then that initialization just happens on each element. Perhaps you were referring to scalar types exclusively with your example, but it was still worth noting.
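To make the distinction concrete for the scalar case:

int main()
{
    int* a = new int;       // default-initialized: indeterminate value
    int* b = new int();     // value-initialized: zero
    int* c = new int[8];    // each element default-initialized (indeterminate)
    int* d = new int[8]();  // each element value-initialized (all zero)

    delete a; delete b;
    delete [] c; delete [] d;
}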

Quote:I don't buy the argument about saving a destructor. In a game, there's bound to be a destructor either way, because resources like images need to be freed, so what is it to add two lines to delete a pointer? If it's within the scope of a method, even better: you write the delete right after the new to make sure you don't forget, and then you insert the code in between. If you're the forgetful type and delete your data 50% of the time, then yeah, I agree, use automated objects such as STL containers. To be honest, I don't get why manual memory allocation is such a big deal. You only have one thing to remember, and it's the delete, which you can write right away. You don't have to learn to use a complicated class, to include a header, to learn to use a complicated method, etc. Personally, I do forget stuff when I write code, but it's *never* the delete.

But if your function has N points of return, you now have N deletes (and N copies of whatever other cleanup code there is) that need to be maintained. It's a lot easier to forget now, plus your code quickly gets messy. At the very least you should be utilizing std::unique_ptr to get the benefits of RAII, even if you don't use vectors. If the compiler can generate, for free, code you'd otherwise have to write anyway, why not jump on the gravy train?
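A minimal sketch of that, assuming C++0x's std::unique_ptr is available (the function and its parameter are hypothetical):

#include <memory>

void process(bool earlyOut)
{
    std::unique_ptr<char[]> buf(new char[1024]); // freed on every return path
    if (earlyOut)
        return;                                  // no leak, no duplicated cleanup
    // ... use buf[0] through buf[1023] ...
}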
Quote:Original post by the_edd
Quote:Original post by rip-off
Something like that *. Look ma, I obey the rule of three without even trying


Actually, you need to either reorder the member variables or add a custom assignment operator as copy-assigning the vector can throw after the width_ has been copy-assigned, leaving the vector2d in an inconsistent state.

</ducks>

EDIT: fully elucidated

Good call. I actually had the members in the opposite order to begin with; I changed them to match the constructor initialiser list >_<
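For reference, a minimal sketch of the kind of vector-backed wrapper under discussion (the int payload and the names are illustrative; note that data_ is declared before width_, so a throwing vector copy-assignment cannot leave width_ out of sync):

#include <vector>
#include <cstddef>

class vector2d {
    std::vector<int> data_;   // assigned first; if this throws, width_ is untouched
    std::size_t width_;
public:
    vector2d(std::size_t width, std::size_t height)
        : data_(width * height), width_(width) {}

    int& at(std::size_t x, std::size_t y)             { return data_[y * width_ + x]; }
    const int& at(std::size_t x, std::size_t y) const { return data_[y * width_ + x]; }
};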
Quote:
...the run-time performance penalty is the initialization

I'm not convinced. The implementations of std::vector I've seen are so close to new[]/delete[] that you can usually convince the compiler to emit pretty much identical code for both (for the code that counts). In any case, the OP is allocating hash_sets, so using new[] will call the constructor anyway. There is no performance benefit in choosing one or the other here. The best thing about having it wrapped in a class is that the OP can use the simple vector-backed implementation until profiling provides a reason to suspect that this code is a bottleneck.

Quote:
I know that trick, but having to use that just to reset an vector is lame, if I may say.

It certainly is lame, I agree. But the typical thing with the standard library is to provide as small an interface as possible: if it can be done with the existing interface, then don't add a function for it. Wrapping it in a small template function soothes most of that pain, and you can comment it with curses on the standard committee if that helps too.

Quote:
I don't get why manual memory allocation is such a big deal.

Because it's unnecessary boilerplate 97% of the time. You're writing some reasonably high-level code, and it's littered with boilerplate asking the compiler to do really simple stuff, stuff that the computer should be worrying about, not you. Plus, one of the nice things about using RAII consistently for all memory management is that it's easier to change existing code. Adding a new return point to a function no longer requires analysis to ensure all control paths clean up after themselves. There are benefits when interacting with exceptions, but these don't apply to you.

For me, while I appreciate that I can get as close as I need for those < 3% of times where it is necessary, I couldn't imagine the constant intellectual overhead of worrying about manual memory management where it doesn't matter. It's bad enough trying to get the right combination of value, shared_ptr, weak_ptr, unique_ptr, etc. to get the ownership semantics right; the last thing I want to do is be forced to write destructors and think about copying semantics... the compiler can and should deduce these things from the semantics of the RAII objects I'm using.

I don't have any problem if you enjoy wasting your time with such trivialities, but on a public forum I will always argue for what I think is the objectively superior style.
Quote:I don't buy the argument about saving a destructor? In a game, there's bound to be a destructor either way because resources like images need to be freed, what is it to add 2 lines to delete a pointer?
But that's the thing - with proper use of RAII, resources such as images can clean up after themselves as well, which means no (explicit) destructor. (Unrelated, but why does it take two lines to delete a pointer?)

Also, I question your statement about explicit destructors being inevitable in a games context. This is only anecdotal of course, but in the C++-based project I'm working on right now, I think I'd be hard pressed to find an explicit destructor anywhere in the code base. (There may be a couple here and there, but they're exceedingly rare.)
Quote:To be honest, I don't get why manual memory allocation is such a big deal. You only have one thing to remember and it's the delete, which you can write right away. You don't have to learn to use a complicated class, to include a header, to learn to use a complicated method, etc. Personally, I do forget stuff when I write code, but it's *never* the delete.
Well, obviously it's not *that* big a deal - people got along fine before RAII, and plenty of good games have been written in C or 'C with classes', with malloc/free and new/delete all over the place.

Personally though, I find more and more that I like it when programming is easy. It's not because of lack of skill, but rather because it allows me to think about problems in ways that are more abstract and more interesting. RAII in C++ makes programming easier, IMO, as does garbage collection in GC'd languages. (To be honest, I love coding in C# and other GC'd languages because it feels like I'm just 'solving the problem', rather than mucking around with tedious details that aren't directly relevant.)

I realize it's personal preference and no one's going to be changing anyone's mind here, but I do find it hard to understand why people will so often eschew RAII in C++, even if they don't have a solid reason for doing so that's backed up by hard data. (I'm not referring to you here necessarily, but this does seem to happen a lot.)

I figure if you know you're a competent coder, there's no need to 'prove it' repeatedly by doing all the heavy lifting yourself. Software development is hard enough as it is, so why make it any harder than it needs to be?
