demonkoryu

Overwriting vector/string's internal buffers

Hi, I am trying to avoid "redundant" heap allocations while converting strings from Unicode to ANSI and vice versa. Is it safe to write directly to const_cast<char*>( std::string.c_str() ) and &std::vector[0] after resize()ing them to the appropriate length? Or should I maybe use boost::pool_allocator to reduce the heap allocation overhead? Or possibly alloca()? Another question: is there a way to construct a std::string from a new[]-allocated array so that the std::string assumes ownership of (and disposes of) the pointer? (I guess not.)

It isn't safe or wise to manually mess with the memory of an object, and the std:: classes are no exception. If you directly write to that memory, you can ruin the object's state, and cause some very evil bugs.

If you need to control allocation for STL objects, you can use the template parameters to provide your own allocators. For instance, see this, and this for a more detailed treatment of how it works and how to implement allocators yourself.

Thank you very much! This also clears up some other issues I had. I'm finally going to build my first allocator!

_inc_sat [ApochPiQ.rating]     ; No effect. [wow]

Quote:
Original post by Konfusius
Is it safe to directly write to const_cast<char*>( std::string.c_str() )

Not really. The C++ Standard does not dictate how the characters in a std::string are stored. However, I don't know of any implementation that does not store them contiguously, so if you're sure your implementation is not using copy-on-write and you are careful with off-by-one errors, you can probably get away with it (using data instead of c_str may be a marginally better choice), but I can't recommend it.
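
For readers on newer compilers: since C++11 the standard does require std::string to store its characters contiguously, and C++17 adds a non-const data(). Writing through &s[0] after a resize() is therefore well defined there, whereas writing through const_cast on c_str() is still not something the standard blesses. A minimal sketch, assuming a C++11 compiler; write_upper and to_upper are hypothetical names, with write_upper standing in for any C-style API that fills a caller-supplied buffer:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a C-style API that writes n characters
// into a caller-supplied buffer and writes no null terminator.
void write_upper(const char* src, char* dst, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(src[i])));
}

std::string to_upper(const std::string& src)
{
    std::string dst;
    dst.resize(src.size());      // changes the size, not just the capacity
    if (!dst.empty())            // only take &dst[0] when there is something to write
        write_upper(src.data(), &dst[0], dst.size());
    assert(dst.size() == src.size());
    return dst;
}
```

The string keeps managing its own null terminator, so the API only ever touches the first size() characters.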
Quote:
&std::vector[0], after resize()ing them to the appropriate length?

Yes. Writing into a std::vector that is appropriately sized is perfectly safe and not uncommon. It must be resized though, not just reserved.
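
A minimal sketch of the resize-then-write pattern described above. The function name copy_into_vector is made up for illustration, and std::memcpy stands in for whatever API fills the buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies n bytes from src into a vector by writing through &buf[0].
std::vector<char> copy_into_vector(const char* src, std::size_t n)
{
    std::vector<char> buf;
    buf.resize(n);                     // creates n real elements; reserve(n) would not
    if (n != 0)
        std::memcpy(&buf[0], src, n);  // elements [0, n) exist and are contiguous
    assert(buf.size() == n);
    return buf;
}
```

With reserve(n) instead of resize(n) the vector's size would still be 0, so the written bytes would not count as elements and the write itself would be outside the vector's valid range.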

Quote:
Or should I maybe use boost::pool_allocator, to reduce the heap allocation overhead?
Or possibly alloca()?

Try them. It should be trivial to provide a new typedef of std::basic_string and modify it to use the default allocator, boost::pool_allocator or a custom allocator based on alloca (although I suspect alloca will bite you hard with std::string trying to use memory that has been implicitly deallocated).
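
To show what such a typedef might look like, here is a rough sketch assuming a C++11 standard library (which accepts minimal allocators). counting_allocator, allocation_count and counted_string are hypothetical names; the allocator only counts calls and forwards to operator new, so boost::pool_allocator or anything else could be swapped in at the typedef:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Shared counter so every rebound copy of the allocator updates the same number.
inline std::size_t& allocation_count()
{
    static std::size_t count = 0;
    return count;
}

// Hypothetical allocator: counts allocations, otherwise defers to operator new.
template <typename T>
struct counting_allocator
{
    typedef T value_type;

    counting_allocator() {}
    template <typename U>
    counting_allocator(const counting_allocator<U>&) {}

    T* allocate(std::size_t n)
    {
        ++allocation_count();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) { return false; }

// The "new typedef of std::basic_string": only the allocator argument changes.
typedef std::basic_string<char, std::char_traits<char>, counting_allocator<char> > counted_string;

// A string this long cannot fit in a small-string buffer, so it must allocate.
std::size_t allocations_for_long_string()
{
    const std::size_t before = allocation_count();
    counted_string s(100, 'x');
    assert(s.size() == 100);
    return allocation_count() - before;
}
```

Note that counted_string is a different type from std::string, so the two do not convert to each other implicitly.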

Quote:
Another question, is there a method to construct a std::string from a new[] allocated array, in a way that std::string assumes ownership (and disposes of) the pointer? (I guess not)

You guess correctly.

Also, don't forget the iterator range constructors of std::string and std::vector, which can be an efficient way of constructing a std::string from a std::vector and vice versa.
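
For illustration, the iterator range constructors mentioned above look like this in use (the two helper names are made up):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds a string from the characters of a vector: one allocation, one copy.
std::string string_from_vector(const std::vector<char>& v)
{
    return std::string(v.begin(), v.end());
}

// Builds a vector from the characters of a string (no null terminator included).
std::vector<char> vector_from_string(const std::string& s)
{
    std::vector<char> v(s.begin(), s.end());
    assert(v.size() == s.size());
    return v;
}
```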

Enigma

Thank you very much for the response.

Quote:
Original post by Enigma
Not really. The C++ Standard does not dictate how the characters in a std::string are stored. However, I don't know of any implementation that does not store them contiguously so if you're sure your implementation is not using copy-on-write and are careful with off-by-one errors then you can probably get away with it (using data instead of c_str may be a marginally better choice) but I can't recommend it.

Hmm, as I guessed. However I tried it with an ultra-checked version of STLport, and it seemed to work.
stlport::rope won't be in the game, anyway.

Quote:
although I suspect alloca will bite you hard with std::string trying to use memory that has been implicitly deallocated.

I would use alloca() like this:

std::wstring& widen( const std::string& src, std::wstring& dst )
{
    const char* source = src.c_str();
    const int source_length = static_cast<int>( src.length() );
    // First call: ask how many wide characters the conversion needs.
    const int char_count = ::MultiByteToWideChar( CP_ACP, MB_PRECOMPOSED, source, source_length, NULL, 0 );
    // Stack allocation: released automatically when widen() returns.
    wchar_t* buffer = static_cast<wchar_t*>( _alloca( sizeof( wchar_t ) * char_count ) );

    // Second call: convert into the stack buffer.
    ::MultiByteToWideChar( CP_ACP, MB_PRECOMPOSED, source, source_length, buffer, char_count );
    return dst.assign( buffer, static_cast<size_t>( char_count ) );
}

Do you see any problems with this?



Konfusius.Rate( Enigma, Konfusius.MaxRatingPower() ); //! \todo Doesnt' work. Use public variables or pointer hack?


[Edited by - Konfusius on December 5, 2005 12:31:37 PM]

Using _alloca() is generally not a good idea.

Here's a quick breakdown of how it works. There are, essentially, two types of memory you can access: the stack, and the heap. The stack works like a stack of plates; you can put one on the top, and take the top one off, but that's it. The stack is a solid block of memory; only the "current location" is accessible. The current location is recorded with the stack pointer. By contrast, the heap is pretty much a free-for-all.

The stack pointer is modified when you call a function. The parameters of the function are pushed onto the stack, then popped off to read them, and the return value is handled similarly (the exact way this works out depends on the calling convention used by the function in question). This means that, when a function exits, it will move the stack pointer back to (roughly) its original location. Local variables are allocated on the stack. This is how it might look:

void Foo()
{
    int X, Y, Z;
    // Stuff
}

[ ]
[ Z ] <- stack pointer during Foo()
[ Y ]
[ X ]
[ b ] <- stack pointer before/after Foo()
[ a ]

Stack: before Foo(), a and b are on the stack. Foo() adds X, Y, and Z.


The compiler automatically moves the stack pointer to "create" space where the local variables are stored. _alloca basically lets you do the same thing, except you tell it how far to move the pointer. The reason that _alloca memory is automatically freed is that a function returning will reset the stack pointer to its old location. This means that, essentially, the stack slots used during Foo() are bogus once the function exits.

The tricky thing here is that a stack location has a memory address just like a heap location; that's why it's possible to do things like take the address of a stack-allocated variable/object with the & operator. However, unlike the heap, a stack address will become invalid the instant the function that created the variable exits. This means that your STL code that uses _alloca will have a very high chance of referring to a bogus stack location, which will blow up your code.


Heap-allocation is much safer and more reliable. There's a good reason why _alloca is nonstandard [smile]

Thanks for your detailed explanation (although I know how the stack works [smile]).
But won't this function work reliably?

  • alloca() allocates stack memory

  • MultiByteToWideChar() writes to this location

  • wstring( const wchar_t*, size_t ) reconstructs itself by copying from this location, so the memory should be valid

  • function exits, restores stack pointer

Yes, the alloca-using code that you posted looks fine (assuming MultiByteToWideChar works in the obvious way and deals correctly with null terminators, i.e. doesn't write one). I (and I assume ApochPiQ too) thought that since you mentioned alloca together with boost::pool_allocator, you were considering writing an allocator based on alloca and using it with std::basic_string. That is where the problems we both mentioned could have occurred.

Enigma

Heh, I can read, really [wink]

Yeah, I was interpreting that as using _alloca to build an allocator. Doing it locally to construct a wstring should be perfectly fine (it's really no different than doing the same with a statically sized wchar array, for instance).
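
To make the comparison with a statically sized array concrete, here is a portable sketch of the same pattern. widen_chars is a hypothetical stand-in for MultiByteToWideChar that only handles ASCII, and the 256-element threshold is an arbitrary choice; the heap fallback for long inputs is an addition that the _alloca version does not have:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for MultiByteToWideChar: widens each byte as-is
// (correct for ASCII only) and writes no null terminator.
std::size_t widen_chars(const char* src, std::size_t n, wchar_t* dst)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<wchar_t>(static_cast<unsigned char>(src[i]));
    return n;
}

std::wstring& widen(const std::string& src, std::wstring& dst)
{
    const std::size_t stack_capacity = 256;    // arbitrary threshold
    if (src.size() <= stack_capacity)
    {
        wchar_t buffer[stack_capacity];        // lives until widen() returns
        const std::size_t n = widen_chars(src.data(), src.size(), buffer);
        assert(n == src.size());
        return dst.assign(buffer, n);          // copies out of the stack buffer
    }
    std::vector<wchar_t> buffer(src.size());   // heap fallback for long inputs
    const std::size_t n = widen_chars(src.data(), src.size(), &buffer[0]);
    return dst.assign(&buffer[0], n);
}
```

Either way the wstring copies the characters before the function returns, so nothing refers to the temporary buffer afterwards.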

Quote:
Original post by ApochPiQ
Heh, I can read, really [wink]

This I fail to comprehend. [crying]

Quote:
Original post by ApochPiQ
Doing it locally to construct a wstring should be perfectly fine (it's really no different than doing the same with a statically sized wchar array, for instance).


I wonder why C++ doesn't support the C feature of dynamically allocating arrays on the stack. Is this a C99-only feature, or will it be integrated into C++ some time in the future?

Quote:
Original post by Konfusius
Is this a C99-only feature, or will it be integrated into C++ some time in the future?


Yes, variable length arrays (VLAs) are C99-only, and yes, the C++ standards committee is considering incorporating C99 features in the next revision of the standard. That doesn't mean VLAs (among other features) will definitely make it in this time round, but eventually most of them will probably be absorbed, since both the C and C++ committees want to stop the two standards from diverging any further, whether by merging them into one coherent standard or by working together more closely.

A note on GCC: since it supports most of C99, those extra features (VLAs included) are available as language extensions in C++ programs. Its standard library also provides an extension allocator, array_allocator, which hands out memory from a fixed-size array supplied by the caller.

Thanks for clearing that up.

Now I have another problem... sort of.
I just whipped up this code:

template<typename T>
class scoped_buffer
{
private:
    T* ptr;
    size_t len;

    // Not default-constructible or copyable.
    scoped_buffer();
    scoped_buffer( const scoped_buffer& );
    scoped_buffer& operator=( const scoped_buffer& );

public:
    explicit scoped_buffer( size_t length )
        : ptr( reinterpret_cast<T*>( _alloca( sizeof( T ) * length ) ) ) // released as soon as the constructor returns
        , len( length )
    {}
    ~scoped_buffer()
    {}
    operator T*() const
    { return ptr; }
    T& operator[]( std::ptrdiff_t pos ) const
    { return ptr[pos]; }
};

and then spent some time debugging it (crashing all the time), until I realized that a constructor is a function and therefore the _alloca() allocated memory is already gone when scoped_buffer is constructed. Dang.

Any idea on how to make this work?
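
One possible way around this, sketched under assumptions rather than offered as the definitive answer: since _alloca memory belongs to the frame of whichever function calls it, a class cannot hide the call. What a class can do is embed a fixed-size array in the object itself, which lands on the stack whenever the object is a local variable, and fall back to the heap for larger requests. small_buffer below is a hypothetical name, similar in spirit to small-buffer classes such as STLSoft's auto_buffer but not a copy of any library's implementation:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: up to N elements live inside the object (on the stack
// when the object is a local); larger requests go to the heap.
// Intended for simple types such as char or wchar_t.
template <typename T, std::size_t N = 256>
class small_buffer
{
public:
    explicit small_buffer(std::size_t length)
        : ptr_(length <= N ? local_ : new T[length])
        , len_(length)
    {}

    ~small_buffer()
    {
        if (ptr_ != local_)
            delete[] ptr_;       // only the heap fallback needs freeing
    }

    T* data() { return ptr_; }
    std::size_t size() const { return len_; }
    bool on_heap() const { return ptr_ != local_; }

    T& operator[](std::size_t pos)
    {
        assert(pos < len_);
        return ptr_[pos];
    }

private:
    small_buffer(const small_buffer&);             // non-copyable
    small_buffer& operator=(const small_buffer&);

    T local_[N];
    T* ptr_;
    std::size_t len_;
};
```

The trade-off is that every instance reserves N elements of stack space whether or not they are used, so N should stay modest.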

Hmm, I'm wondering why the documented default template parameters of auto_buffer aren't defined?

I had to edit auto_buffer.hpp from

    template< ss_typename_param_k T
            , ss_typename_param_k A
            , ss_size_t SPACE = 256
            >
    class auto_buffer

to

    template< ss_typename_param_k T
            , ss_typename_param_k A = ss_typename_param_k std::allocator<T>
            , ss_size_t SPACE = 256
            >
    class auto_buffer

I also tried

    ss_typename_param_k A = ss_typename_param_k allocator_selector<T>::allocator_type

but allocator_selector.hpp isn't distributed with STLSoft...
