Getting Rid of "char*"

Started by
38 comments, last by Brain 9 years, 3 months ago

Let's pretend it is an issue. How do you think you would solve it?

This question is a very triggering, but at least familiar, proxy for a more fundamental issue. However, solving this problem will also solve the real issue I'm having. The real issue is that there is no simple and obvious way to encapsulate what are effectively statically allocated arrays, where the static size may change throughout the code.

So, if you want to allocate some static memory, whose size is determined at compile time, and have an interface to it through class methods, you must jump through hoops and avoid major caveats. A simple example of this is the humble string class.

I personally would use a stack-based allocator. That way I could use it with string, vector, pretty much any container within reason.
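One possible reading of "stack-based allocator" is a memory resource that hands out pieces of a caller-supplied buffer. Here is a minimal sketch using the C++17 `std::pmr` facilities, which may not be what the poster had in mind. The function name `build_in_buffer` and the sizes are hypothetical and only there for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// Hypothetical helper: fills a string and a vector whose storage comes only
// from the caller-supplied buffer, then reports how many elements were stored.
inline std::size_t build_in_buffer(std::byte* buffer, std::size_t size) {
    assert(buffer != nullptr);

    // null_memory_resource() as the upstream means running out of buffer
    // throws std::bad_alloc instead of silently falling back to the heap.
    std::pmr::monotonic_buffer_resource arena(
        buffer, size, std::pmr::null_memory_resource());

    std::pmr::string text(&arena);
    text.reserve(64);              // larger than any small-string buffer, so this uses the arena
    text += "hello, ";
    text += "arena";

    std::pmr::vector<int> numbers(&arena);
    numbers.reserve(8);
    for (int i = 0; i < 8; ++i) {
        numbers.push_back(i);
    }

    return text.size() + numbers.size();
}
```

The same resource type serves every buffer size, because the size is a runtime value rather than a template parameter. The containers must not outlive the buffer they draw from.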

You might want something akin to a string_view: a light object which refers to a portion of a string, but does not own the storage. It might just be a wrapper around a pair of char pointers, or a char pointer and a length. Most of the time a function can simply return a string_view (by value) instead of a string (by reference), and the string_view doesn't care how the storage is managed. The specific template instantiation of a string knows how to construct a string_view from its stack or heap-based storage, and that can be hidden inside the function, thereby shielding consumers of the function from any concern about the string's specific template type.

This works particularly well if you're treating strings as mostly immutable. Appending to a string, for example, would actually entail creating a new string of appropriate size, copying the contents of the first, and then adding on the contents being appended. Otherwise, a string_view isn't going to provide the consumer a sufficient interface to perform an append operation, because knowledge of the string storage has been hidden. You could instead make your string_view hold a polymorphic reference to the original string instead of just a couple of char pointers, so that it could forward a request to append onto the string that actually knows about its own storage, but using polymorphism for that purpose seems suspect to me.
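As a rough illustration of the idea, here is a sketch of a fixed-capacity string that exposes a `std::string_view` (C++17), so consumers never see the capacity parameter. `FixedString` and `count_spaces` are made-up names for this example, not anything from the thread, and truncating on overflow is just one possible policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical fixed-capacity string: storage lives inside the object.
template <std::size_t Capacity>
class FixedString {
    static_assert(Capacity > 0, "capacity must be non-zero");

public:
    FixedString() = default;
    explicit FixedString(std::string_view text) { assign(text); }

    // Copies at most Capacity characters; longer input is truncated.
    void assign(std::string_view text) {
        length_ = text.copy(data_, Capacity);
    }

    // Non-owning view; valid only while this object is alive and unchanged.
    std::string_view view() const {
        assert(length_ <= Capacity);
        return std::string_view(data_, length_);
    }

private:
    char data_[Capacity] = {};
    std::size_t length_ = 0;
};

// A consumer written once, with no knowledge of any Capacity value.
inline std::size_t count_spaces(std::string_view text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;
}
```

`count_spaces` is compiled once and works for `FixedString<16>`, `FixedString<64>`, a `std::string`, or a string literal alike. It cannot append, which is the limitation described above.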

"We should have a great fewer disputes in the world if words were taken for what they are, the signs of our ideas only, and not for things themselves." - John Locke

Let's pretend it is an issue. How do you think you would solve it?


Okay. If the data is statically allocated on the stack, what do you want to happen when you return it from a function? Should it be copied into a newly stack-allocated copy in the calling function?

You might want something akin to a string_view: a light object which refers to a portion of a string, but does not own the storage. It might just be a wrapper around a pair of char pointers, or a char pointer and a length. Most of the time a function can simply return a string_view (by value) instead of a string (by reference), and the string_view doesn't care how the storage is managed. The specific template instantiation of a string knows how to construct a string_view from its stack or heap-based storage, and that can be hidden inside the function, thereby shielding consumers of the function from any concern about the string's specific template type.

This works particularly well if you're treating strings as mostly immutable. Appending to a string, for example, would actually entail creating a new string of appropriate size, copying the contents of the first, and then adding on the contents being appended. Otherwise, a string_view isn't going to provide the consumer a sufficient interface to perform an append operation, because knowledge of the string storage has been hidden. You could instead make your string_view hold a polymorphic reference to the original string instead of just a couple of char pointers, so that it could forward a request to append onto the string that actually knows about its own storage, but using polymorphism for that purpose seems suspect to me.

I had considered something similar to this, although I will admit to not knowing that string_view was a thing. The idea being that you would make the memory elsewhere, and then attach a 'handler' class of some kind to it by passing in a pointer to the external memory during construction, and possibly the size. However, this just doesn't seem like the most elegant solution.

Having said that, I have used this solution in multiple projects before, for example by attaching an 'editor' class to some memory and then using that to interact with the memory. It is very nice and works quite well, and easily avoids the issue of instruction cache pollution. However, I'm hoping to find a different solution this time.

Okay. If the data is statically allocated on the stack, what do you want to happen when you return it from a function? Should it be copied into a newly stack-allocated copy in the calling function?

Generally, one would return a reference to MyString, but it depends on context.

I personally would use a stack-based allocator. That way I could use it with string, vector, pretty much any container within reason.

This seems to be the most reasonable thing to do. The only issue is that if you use many different static sizes, you will run into more instruction cache misses. In the extreme it might actually slow things down! I was thinking perhaps you could move the code that is exactly the same across different allocators into a separate base class and inherit from that. I'm not sure if that would help, however, as I'm not super versed in how instructions are stored in memory.
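The base-class idea can be sketched roughly as follows: all size-independent logic sits in a non-template base that is compiled once, and the template only contributes the storage. `StringBase` and `InlineString` are hypothetical names. Whether this measurably reduces code size or instruction cache misses depends on the compiler and on inlining, so treat it as something to profile, not a guaranteed win.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Non-template base: every member function here exists once in the binary,
// regardless of how many capacities are instantiated.
class StringBase {
public:
    std::string_view view() const { return {data_, length_}; }
    std::size_t capacity() const { return capacity_; }

    // Appends as much of text as fits; returns false if it was truncated.
    bool append(std::string_view text) {
        assert(length_ <= capacity_);
        const std::size_t room = capacity_ - length_;
        const std::size_t copied = text.copy(data_ + length_, room);
        length_ += copied;
        return copied == text.size();
    }

protected:
    StringBase(char* data, std::size_t capacity)
        : data_(data), capacity_(capacity) {}

    // A copy would keep pointing at the source object's storage, so forbid it.
    StringBase(const StringBase&) = delete;
    StringBase& operator=(const StringBase&) = delete;

private:
    char* data_;
    std::size_t capacity_;
    std::size_t length_ = 0;
};

// Thin template: only the storage and a trivial constructor depend on Capacity.
template <std::size_t Capacity>
class InlineString : public StringBase {
public:
    InlineString() : StringBase(storage_, Capacity) {}

private:
    char storage_[Capacity];
};
```

The cost is an extra pointer and capacity field per object, and copying needs more care than with a plain array member. In exchange, functions can take a `StringBase&` and work with every capacity without being templates themselves.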

So, in summary, I think we have 3 different solutions:
1) Use char* and forget about C++ for this.

2) Use a class that is constructed with a pointer to the externally allocated memory, and its size.

3) Use a static allocator.

The only problem with 3, which is my current favorite, is instruction cache pollution if you go overboard, since we might be creating a completely different set of instructions for each memory size, and that's not cool. This may be completely unavoidable with this solution, but I'm all ears if someone knows of a way to reduce this side effect.

I think perhaps 2 could also be used very happily. It would effectively be like wrapping the string manipulation functions into a class. Although not very OOP, it is probably the fastest and most problem-free solution.
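For what option 2 might look like, here is a minimal sketch of an 'editor' that is handed external memory and its size at construction. `BufferEditor` is a hypothetical name. The sketch assumes the buffer already holds a NUL-terminated string and outlives the editor, and it keeps the buffer NUL-terminated so plain char* code can still read it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical non-owning editor over a caller-provided char buffer.
class BufferEditor {
public:
    // Assumption: buffer points to size bytes containing a NUL-terminated string.
    BufferEditor(char* buffer, std::size_t size)
        : buffer_(buffer), size_(size), length_(0) {
        assert(buffer != nullptr && size > 0);
        length_ = std::strlen(buffer);
        assert(length_ < size);
    }

    // Appends text if it fits (including the terminator); otherwise leaves
    // the buffer untouched and returns false.
    bool append(const char* text) {
        const std::size_t extra = std::strlen(text);
        if (length_ + extra + 1 > size_) {
            return false;
        }
        std::memcpy(buffer_ + length_, text, extra + 1);  // copies the NUL too
        length_ += extra;
        return true;
    }

    std::size_t length() const { return length_; }

private:
    char* buffer_;
    std::size_t size_;
    std::size_t length_;
};
```

Because the size is a runtime value, there is a single copy of `append` no matter how many different array sizes the program uses. The trade-off is that the compiler can no longer check the size for you.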

My point is that if the data is statically stack allocated in a function, when the function returns that data is lost.

So at that point you'd have to copy it elsewhere to access it outside the function which stack allocated it.

Maybe you should reconsider using char*. I think const char* is the best solution in many situations. Having a function that accepts a const char* means you can almost always use that function without tricks; having a function that wants std::string or whatever means you always have to convert your data to a string object before you can call it. Say you load a file into memory, then you need to process it in some way. If you use plain char*, you can just use the file in memory as is. If you use strings, you have to convert a "natural" char* into a string before using it.

If you want to do some sort of "pre-allocated size" then you should use std::string.

Seriously. You're using a string. Use the string the standard library provides for you. If you're worried about allocations then pass the size you want into the constructor, or use the resize function.

If you want a character array, then use std::vector or std::array. Not surprisingly, std::array is a template parameterized on the size you want the array to be, because it is intended to be a replacement for the old type[size] array in C. So you're back to where you started.

Interfaces won't solve your problem because you have to return a pointer, which means you're going to be allocating your special string class on the heap ANYWAY.

In short - if I understand you correctly - you can't have what you're asking for. You cannot have an array allocated on the stack of a function and return that array because the array will be deallocated when the function exits (hello undefined behavior!). Between std::vector, std::array, and std::string you can do everything you're asking without rewriting a bunch of code.

Have you profiled your code to prove that std::string is too slow? Have you tried pre-allocating the string? Have you tried taking advantage of move semantics in C++11? (Assuming your compiler supports them) If your compiler is not C++11 compliant, have you tried passing your storage to your function as a parameter by reference?
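On the pre-allocation point, one detail worth spelling out: for a std::string, `reserve` is the call that sets aside capacity without changing the length, while `resize` (or the size-taking constructor) actually creates that many characters. A small sketch follows; `appends_without_reallocating` is a made-up name for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Reserves room for count characters up front, appends them one by one,
// and reports whether the string kept using the same storage throughout.
inline bool appends_without_reallocating(std::size_t count) {
    std::string s;
    s.reserve(count);                 // capacity >= count, size() is still 0
    assert(s.empty());

    const char* before = s.data();
    for (std::size_t i = 0; i < count; ++i) {
        s.push_back('x');             // never exceeds the reserved capacity
    }
    return s.data() == before && s.size() == count;
}
```

With the capacity reserved, the appends happen in place, so there is a single allocation at most instead of a series of regrowths.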

My point is that if the data is statically stack allocated in a function, when the function returns that data is lost.

So at that point you'd have to copy it elsewhere to access it outside the function which stack allocated it.

While it's a bit of a misnomer, what I mean when I say stack allocated is simply that the array has a static size, and so would be easily stack allocatable. Being easily stack allocatable is important for cache locality. Let me give you a quick example:

struct MyStruct {
    int a;
    char b;
    float c[ 128 ];
};

vs.

struct MyStruct {
    int a;
    char b;
    float* c;
};

Where 'c' must now point to some other area of memory.

EDIT: I'm now realizing that I made a mistake and should have never called it 'stack allocated' as that's not quite what I meant. What I did mean is that it should be contiguous in memory with the object that created it, in other words, it must have fixed size.
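To make the layout difference concrete, here is a small sketch that checks whether the array data sits inside the object itself. The struct names `InlineBuffer` and `PointerBuffer` and the helper `lies_inside` are hypothetical; they just give the two variants above distinct names so both can exist in one program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct InlineBuffer {
    int a;
    char b;
    float c[128];   // the array is part of the object
};

struct PointerBuffer {
    int a;
    char b;
    float* c;       // the array lives somewhere else
};

// The inline variant must be at least as large as its array.
static_assert(sizeof(InlineBuffer) >= 128 * sizeof(float));

// True if address falls within the object_size bytes starting at object.
inline bool lies_inside(const void* object, std::size_t object_size,
                        const void* address) {
    const auto begin = reinterpret_cast<std::uintptr_t>(object);
    const auto p = reinterpret_cast<std::uintptr_t>(address);
    return p >= begin && p < begin + object_size;
}
```

For an `InlineBuffer`, `lies_inside(&obj, sizeof obj, obj.c)` holds, so reading `a`, `b` and the start of `c` touches adjacent memory. For a `PointerBuffer`, it does not, and reaching the data costs an extra pointer hop.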

If you want to do some sort of "pre-allocated size" then you should use std::string.

Seriously. You're using a string. Use the string the standard library provides for you. If you're worried about allocations then pass the size you want into the constructor, or use the resize function.

If you want a character array, then use std::vector or std::array. Not surprisingly, std::array is a template parameterized on the size you want the array to be, because it is intended to be a replacement for the old type[size] array in C. So you're back to where you started.

Interfaces won't solve your problem because you have to return a pointer, which means you're going to be allocating your special string class on the heap ANYWAY.

In short - if I understand you correctly - you can't have what you're asking for. You cannot have an array allocated on the stack of a function and return that array because the array will be deallocated when the function exits (hello undefined behavior!). Between std::vector, std::array, and std::string you can do everything you're asking without rewriting a bunch of code.

Have you profiled your code to prove that std::string is too slow? Have you tried pre-allocating the string? Have you tried taking advantage of move semantics in C++11? (Assuming your compiler supports them) If your compiler is not C++11 compliant, have you tried passing your storage to your function as a parameter by reference?

I'm sorry, I misspoke when I said stack allocated and it caused a lot of rightful confusion. Also, as stated earlier, the whole string thing is a proxy problem, and so the reasoning behind rewriting it is undefined heh.

std::array is pretty much exactly the type of thing I am going to use. However, the current issue I'm having with it is simply that I am potentially creating a completely different set of instructions for each size of 'std::array', am I not?

