Sign in to follow this  

an efficient c++ string

This topic is 4396 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hello, I am looking for an efficient c++ string-like object. As I don't know in advance how big my string will be, I need it to be autogrowable but *without reallocation*. It should also be fast when accessing at any position, inserting at the beginning (which would be rare) and inserting at the end (most of the case). I guess it would be made of linked blocks. A plus would be its ability to (also) return iovecs (to avoid useless copies). I thought ropes could be a nice candidate, but it looks a bit complex and I am not sure I can extract iovecs from its internal structure. Thanks you

Share this post


Link to post
Share on other sites
Quote:
Original post by fabien007
It should also be fast when accessing at any position, inserting at the beginning (which would be rare) and inserting at the end (most of the case).

With these requirements, you might be best off using a std::vector< char > or simply std::string.

Quote:
I guess it would be made of linked blocks. A plus would be its ability to (also) return iovecs (to avoid useless copies).

std::vector<...> stores data contiguously, and has a size() member. std::string's data() member returns a pointer to the string data, and also has a size() member.

Linked data structures are generally not good for fast random access, and if most of your insertions occur at the end, std::vector< char > or std::string should be more than efficient, although I have no idea what your exact needs are.

Share this post


Link to post
Share on other sites
Quote:
Original post by fabien007
It should also be fast when accessing at any position, inserting at the beginning (which would be rare) and inserting at the end (most of the case).
If only need to insert at the beginning and the end then it's not too hard to write a 'bidirectional' deque container.
It should be fairly easy to write considering that you only need to handle characters, i.e. no tricky placement news and exception safety.
Quote:
Original post by stylin
Quote:
Original post by fabien007
It should also be fast when accessing at any position, inserting at the beginning (which would be rare) and inserting at the end (most of the case).

With these requirements, you might be best off using a std::vector< char > or simply std::string.
The OP wanted a reallocation free version.

Does the code need to be portable to machines without virtual memory?
One option might be to simply reserve a large contiguous portion of the address space and commit pages as needed. A better solution would be to allocate everything in native pages and reallocate by shuffling them around with the MMU.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Yes the big point is to avoid reallocation. Concerning the portability to non MMU systems, the code is supposed to be as much platform dependent as possible. Potential target systems for this part would be WinNT, Linux 2.6 on x86, HP-UX, Irix, SunOS and even TPF (for the enthusiasts ;)).
So even if I think doynax your idea is excellent, I'm not sure implementing such "on-the-fly memory schuffling" is very transparent, and when it comes to HP or Irix this would really be playing Russian roulette.

So do we agree the 'bidirectional' deque container (the std::deque) would be the best path to follow?

The only drawback I see in this solution would be the difficulty (if not the impossibility) to provide a (universal) iovec interface. The point is to avoid useless copies, ie copying the contents of the deque container into a contiguous memory zone.

Share this post


Link to post
Share on other sites
You *cannot* avoid reallocation and also have contiguous storage - the initial block simply can't be guaranteed to fall in a region of memory where the space after it will necessarily be available later; otherwise you have effectively already reserved that space. But I don't see why you would need to avoid reallocation in a context where plain allocation (and insertion into a linked list of blocks) is OK.

It sounds though like contiguous storage may not be required for you. If std::string really doesn't do it for you, look for a rope implementation. AFAIK ropes normally are implemented with "linked blocks" of some sort.

And what is an "iovec"?

Share this post


Link to post
Share on other sites
Quote:
Original post by Zahlman
You *cannot* avoid reallocation and also have contiguous storage - the initial block simply can't be guaranteed to fall in a region of memory where the space after it will necessarily be available later; otherwise you have effectively already reserved that space. But I don't see why you would need to avoid reallocation in a context where plain allocation (and insertion into a linked list of blocks) is OK.
Of course you can, if you've got virtual memory. It's just one of many seemingly impossible MMU tricks (copy-on-write, memory mapped files, circular buffers, etc..).
Quote:
Original post by Zahlman
It sounds though like contiguous storage may not be required for you. If std::string really doesn't do it for you, look for a rope implementation. AFAIK ropes normally are implemented with "linked blocks" of some sort.
I've only read about SGI's rope implementation, and never used it myself. It did appear to have an awful lot of overhead however, so if performance is critical there may be faster (portable) solutions available. Such as a custom bidirectional deque.

Share this post


Link to post
Share on other sites
Quote:
Original post by Zahlman
You *cannot* avoid reallocation and also have contiguous storage


I would like a contiguous storage only when required. Internally linked list or any kind of stucture glueing together contiguous parts of memory would be used, and a toString or c_str method would be called to copy the scattered parts into one final contiguous storage.
This temporary method would be used until applications would access the different part of memory via iovecs (thus without any copy).

Quote:
Original post by Zahlman
And what is an "iovec"?

An iovec is a simple low level structure, basically:

typedef struct _iovec {
unsigned long iov_len;
char* iov_base;
} iovec;

Then it can be implemented in arrays, linked lists or whatever you like. It can be seen as the representation of a single contiguous memory zone.


Quote:
Original post by doynax
Of course you can, if you've got virtual memory. It's just one of many seemingly impossible MMU tricks (copy-on-write, memory mapped files, circular buffers, etc..).

Do you know any good tutorial to learn those "MMU tricks"?

Share this post


Link to post
Share on other sites

This topic is 4396 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this