Overwriting vector/string's internal buffers

Started by
16 comments, last by demonkoryu 18 years, 4 months ago
Hi, I am trying to avoid "redundant" allocations on the heap while converting strings from Unicode to ANSI and vice versa. Is it safe to directly write to const_cast<char*>( std::string.c_str() ) and &std::vector[0], after resize()ing them to the appropriate length? Or should I maybe use boost::pool_allocator, to reduce the heap allocation overhead? Or possibly alloca()? Another question, is there a method to construct a std::string from a new[] allocated array, in a way that std::string assumes ownership (and disposes of) the pointer? (I guess not)
It isn't safe or wise to manually mess with the memory of an object, and the std:: classes are no exception. If you directly write to that memory, you can ruin the object's state, and cause some very evil bugs.

If you need to control allocation for STL objects, you can use the template parameters to provide your own allocators. For instance, see this, and this for a more detailed treatment of how it works and how to implement allocators yourself.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Thank you very much! This also clears up some other issues I had. I'm finally going to build my first allocator!

_inc_sat [ApochPiQ.rating]     ; No effect. [wow]

Quote:Original post by Konfusius
Is it safe to directly write to const_cast<char*>( std::string.c_str() )

Not really. The C++ Standard does not dictate how the characters in a std::string are stored. However, I don't know of any implementation that does not store them contiguously so if you're sure your implementation is not using copy-on-write and are careful with off-by-one errors then you can probably get away with it (using data instead of c_str may be a marginally better choice) but I can't recommend it.
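
For readers on newer compilers: C++11 later required std::string to store its characters contiguously and effectively ruled out copy-on-write, so writing through &s[0] after resize() is well-defined there. A minimal sketch, assuming C++11 or later; fill_in_place is a hypothetical helper name, not something from this thread:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies `len` bytes from `src` straight into the
// string's own storage. Relies on the C++11 guarantee that std::string
// storage is contiguous and that &s[0] is writable.
std::string fill_in_place(const char* src, std::size_t len)
{
    assert(src != nullptr || len == 0);
    std::string s;
    s.resize(len);                     // size(), not just capacity(), must cover the write
    if (len != 0)
        std::memcpy(&s[0], src, len);  // no const_cast needed
    return s;
}
```

On a pre-C++11 implementation the caveats above still apply.
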
Quote:&std::vector[0], after resize()ing them to the appropriate length?

Yes. Writing into a std::vector that is appropriately sized is perfectly safe and not uncommon. It must be resized though, not just reserved.
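
The resize-versus-reserve point can be sketched roughly like this. fill_buffer is a hypothetical stand-in for any C-style API that writes into a caller-supplied buffer, and make_filled is likewise just an illustrative name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a C-style API that fills a caller-supplied buffer.
void fill_buffer(char* out, std::size_t count)
{
    std::memset(out, 'x', count);
}

std::vector<char> make_filled(std::size_t count)
{
    std::vector<char> buf;
    buf.resize(count);   // creates `count` real elements
    // buf.reserve(count) alone would leave size() == 0, and writing
    // through &buf[0] would then be undefined behaviour.
    if (!buf.empty())
        fill_buffer(&buf[0], buf.size());
    assert(buf.size() == count);
    return buf;
}
```

The empty() check matters because &buf[0] is not valid on an empty vector.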

Quote:Or should I maybe use boost::pool_allocator, to reduce the heap allocation overhead?
Or possibly alloca()?

Try them. It should be trivial to provide a new typedef of std::basic_string and modify it to use the default allocator, boost::pool_allocator or a custom allocator based on alloca (although I suspect alloca will bite you hard with std::string trying to use memory that has been implicitly deallocated).
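
A rough sketch of what such a typedef could look like. counting_allocator is a hypothetical, minimal C++11-style allocator written only for illustration (it just counts heap requests); boost::pool_allocator or any other conforming allocator would go in the same template slot:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical minimal allocator that forwards to operator new and
// counts how many times it was asked for memory.
template <typename T>
struct counting_allocator
{
    typedef T value_type;

    counting_allocator() {}
    template <typename U>
    counting_allocator(const counting_allocator<U>&) {}

    T* allocate(std::size_t n)
    {
        ++allocation_count();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }

    static std::size_t& allocation_count()
    {
        static std::size_t count = 0;
        return count;
    }
};

template <typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) { return false; }

// Swap the third template argument to try a different allocator.
typedef std::basic_string<char, std::char_traits<char>, counting_allocator<char> > counted_string;

// Returns how many allocations building a string of `length` chars triggered.
std::size_t allocations_for(std::size_t length)
{
    const std::size_t before = counting_allocator<char>::allocation_count();
    counted_string s(length, 'a');
    assert(s.size() == length);
    return counting_allocator<char>::allocation_count() - before;
}
```

Note that counted_string and std::string are distinct types, so they cannot be assigned to each other directly.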

Quote:Another question, is there a method to construct a std::string from a new[] allocated array, in a way that std::string assumes ownership (and disposes of) the pointer? (I guess not)

You guess correctly.

Also, don't forget the iterator range constructors of std::string and std::vector, which can be an efficient way of constructing a std::string from a std::vector and vice-versa.
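
For illustration, the iterator range constructors look roughly like this (the function names are made up for the example):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds a std::string from the bytes of a vector in one construction.
std::string to_string_copy(const std::vector<char>& bytes)
{
    return std::string(bytes.begin(), bytes.end());
}

// Builds a std::vector<char> from the characters of a string.
std::vector<char> to_vector_copy(const std::string& text)
{
    return std::vector<char>(text.begin(), text.end());
}

// Small self-check: converting there and back preserves the contents.
void check_round_trip()
{
    const std::string text = "stack";
    const std::vector<char> bytes = to_vector_copy(text);
    assert(bytes.size() == text.size());
    assert(to_string_copy(bytes) == text);
}
```

Each conversion still copies the characters; it just does so in a single construction.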

Enigma
Thank you very much for the response.

Quote:Original post by Enigma
Not really. The C++ Standard does not dictate how the characters in a std::string are stored. However, I don't know of any implementation that does not store them contiguously so if you're sure your implementation is not using copy-on-write and are careful with off-by-one errors then you can probably get away with it (using data instead of c_str may be a marginally better choice) but I can't recommend it.

Hmm, as I guessed. However I tried it with an ultra-checked version of STLport, and it seemed to work.
stlport::rope won't be in the game, anyway.

Quote:although I suspect alloca will bite you hard with std::string trying to use memory that has been implicitly deallocated.

I would use alloca() like this:
std::wstring& widen( const std::string& src, std::wstring& dst )
{
   const char*          source            = src.c_str();
   const int            source_length     = static_cast<int> ( src.length() );
   // First call with a NULL buffer only asks how many wide characters are needed.
   const size_t         char_count        = static_cast<size_t> ( ::MultiByteToWideChar( CP_ACP, MB_PRECOMPOSED, source, source_length, NULL, 0 ) );
   // Stack buffer; must be non-const because the second call writes into it.
   wchar_t*             buffer            = static_cast<wchar_t*>( _alloca( sizeof ( wchar_t ) * char_count ) );
   ::MultiByteToWideChar( CP_ACP, MB_PRECOMPOSED, source, source_length, buffer, static_cast<int> ( char_count ) );
   // assign() copies out of the stack buffer before the function returns.
   return dst.assign( buffer, char_count );
}

Do you see any problems with this?


Konfusius.Rate( Enigma, Konfusius.MaxRatingPower() );   //! \todo Doesn't work. Use public variables or pointer hack?


[Edited by - Konfusius on December 5, 2005 12:31:37 PM]
Using _alloca() is generally not a good idea.

Here's a quick breakdown of how it works. There are, essentially, two types of memory you can access: the stack, and the heap. The stack works like a stack of plates; you can put one on the top, and take the top one off, but that's it. The stack is a solid block of memory; only the "current location" is accessible. The current location is recorded with the stack pointer. By contrast, the heap is pretty much a free-for-all.

The stack pointer is modified when you call a function. The parameters of the function are pushed onto the stack, then popped off to read them, and the return value is handled similarly (the exact way this works out depends on the calling convention used by the function in question). This means that, when a function exits, it will move the stack pointer back to (roughly) its original location. Local variables are allocated on the stack. This is how it might look:

void foo()
{
  int X, Y, Z;
  // Stuff
}

[   ]
[ Z ] <- stack pointer during Foo()
[ Y ]
[ X ] <- stack pointer before/after Foo()
[ a ]

Stack: before Foo(), a and b are on the stack. Foo() adds X, Y, and Z.

The compiler automatically moves the stack pointer to "create" space where the local variables are stored. _alloca basically lets you do the same thing, except you tell it how far to move the pointer. The reason that _alloca memory is automatically freed is that a function returning will reset the stack pointer to its old location. This means that, essentially, the stack slots used during Foo() are bogus once the function exits.

The tricky thing here is that a stack location has a memory address just like a heap location; that's why it's possible to do things like take the address of a stack-allocated variable/object with the & operator. However, unlike the heap, a stack address will become invalid the instant the function that created the variable exits. This means that your STL code that uses _alloca will have a very high chance of referring to a bogus stack location, which will blow up your code.

Heap allocation is much safer and more reliable. There's a good reason why _alloca is nonstandard [smile]
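
To make the lifetime issue concrete, here is a small sketch; both function names are hypothetical. The broken variant is left commented out because returning the address of a local is undefined behaviour, while the second variant copies the data out before the stack frame goes away:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// BROKEN (left commented out): returns the address of a local array whose
// stack slot is reclaimed as soon as the function returns.
// const char* dangling_name()
// {
//     char local[16];
//     std::strcpy(local, "plate");
//     return local;   // dangling pointer
// }

// OK: the characters are copied into the std::string's own storage
// before the stack frame disappears.
std::string copied_name()
{
    char local[16];
    std::strcpy(local, "plate");
    assert(std::strlen(local) < sizeof(local));
    return std::string(local);
}
```

Memory obtained from _alloca behaves like the local array here: fine to read and write inside the function, invalid after it returns.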

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Thanks for your detailed explanation (although I know how the stack works [smile]).
But won't this function work reliably?

  • alloca() allocates stack memory

  • MultiByteToWideChar() writes to this location

  • wstring( const wchar_t*, size_t ) reconstructs itself by copying from this location, so the memory should be valid

  • function exits, restores stack pointer
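
The four steps above can be sketched portably as follows. This is only an illustration under stated assumptions: a fixed-size local array stands in for alloca(), and widen_ascii is a hypothetical stand-in for MultiByteToWideChar that handles plain ASCII only:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for MultiByteToWideChar: widens plain ASCII only
// and returns the number of wide characters written.
std::size_t widen_ascii(const char* src, std::size_t len, wchar_t* out)
{
    for (std::size_t i = 0; i < len; ++i)
        out[i] = static_cast<wchar_t>(static_cast<unsigned char>(src[i]));
    return len;
}

std::wstring& widen_small(const std::string& src, std::wstring& dst)
{
    wchar_t buffer[64];   // step 1: stack memory (fixed array instead of alloca)
    if (src.size() > sizeof(buffer) / sizeof(buffer[0]))
        throw std::length_error("widen_small: input too long for stack buffer");

    // step 2: the conversion routine writes into the stack buffer
    const std::size_t written = widen_ascii(src.data(), src.size(), buffer);
    assert(written == src.size());

    // step 3: assign() copies the characters into dst's own storage
    dst.assign(buffer, written);

    // step 4: returning releases the stack buffer; dst no longer refers to it
    return dst;
}
```

The important part is step 3: because the wstring copies the data, nothing refers to the stack buffer once the function has returned.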

Yes, the alloca-using code that you posted looks fine (assuming MultiByteToWideChar works in the obvious way and deals correctly with null-terminators - i.e. doesn't write one). I (and I assume ApochPiQ too) thought that since you mentioned alloca together with boost::pool_allocator you were considering writing an allocator based on alloca and using it with std::basic_string. That is where the problems we both mentioned could have occurred.

Enigma
Quote:custom allocator based on alloca

Ahhh, I skipped over that.


Thank you all!
Heh, I can read, really [wink]

Yeah, I was interpreting that as using _alloca to build an allocator. Doing it locally to construct a wstring should be perfectly fine (it's really no different than doing the same with a statically sized wchar array, for instance).

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

