Handling string literals

In my own implementation of String, one of its constructors accepts a 'const char*', which is used to fill a buffer owned by the String object itself. However, if that 'const char*' points to a literal, it will live for the duration of the application (see below), and we don't actually have to copy it into a buffer (assuming we never try to write to it).

 

One thing that would be neat would be for the String's constructor to detect at compile time whether the given const char* is a literal or not. To test this, I wrote the following function:

// Constant version
template <std::size_t Size>
void String(const char (&string)[Size])
{
    // Do something
}

String("Test"); // Works, literal string
String(argv[0]); // Fails, non-literal string

Cool, but now if we add a second function:

// Dynamic version
void String(const char* string)
{
    // Do something else
}

It now selects the dynamic version both times. I'll admit I'm pretty perplexed as to why that happens, since I thought the array-to-pointer conversion was seen as less desirable than passing an array by reference. However, even if I do get this working, it raises a few issues. Strings now need to know whether or not they're in charge of deallocation, which greatly increases complexity. It also gets tricky when dynamically linked libraries are involved, where the string literals may not in fact live for the duration of the application.
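
On why the pointer overload wins: array-to-pointer conversion has Exact Match rank in overload resolution, so for a literal both overloads match equally well, and a non-template function is preferred over a template specialization when nothing else separates them. The sketch below illustrates this. `Picked`, `Classify`, and `ClassifyArrayOnly` are hypothetical names used only for this demonstration, standing in for the two String overloads above.

```cpp
#include <cassert>
#include <cstddef>

enum class Picked { Array, Pointer };

// Hypothetical stand-in for the "constant version" overload.
template <std::size_t Size>
Picked Classify(const char (&)[Size]) { return Picked::Array; }

// Hypothetical stand-in for the "dynamic version" overload.
Picked Classify(const char*) { return Picked::Pointer; }

// The same template, but with no competing pointer overload.
template <std::size_t Size>
Picked ClassifyArrayOnly(const char (&)[Size]) { return Picked::Array; }

void DemoOverloads()
{
    // Both overloads are exact matches for a literal, so the non-template one is chosen.
    assert(Classify("Test") == Picked::Pointer);

    // Without a competitor, the array-reference template binds to the literal.
    assert(ClassifyArrayOnly("Test") == Picked::Array);
}
```

So the array-reference overload is not a worse match. It loses only on the template versus non-template tie-break.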

 

So I have two questions here:

1) How do I do this?

2) Should I be doing this?

 

Thanks

I'd argue for issue 2, don't bother.

As you start to touch on, you'll have two styles: the one that points to a static read-only memory address and the one that can be modified and needs memory management. You don't want to violate the single responsibility principle, so you'll end up adding yet another layer for that behavior. Then you'll run into cases where, for some reason, the system doesn't detect it right, so string literals go undetected and get cloned anyway.

Don't reinvent yet another string class.


For interest's sake, I'd like to know if this is possible :)

 

But yeah, similar to Frob's sentiment, I made the choice to not even have a string class (or use std::string) in my engine whatsoever. Having a string class encourages people to use it, and you don't need to use it 99% of the time ;)

The only place string manipulation is useful is in the GUI, which often requires a string-builder class, but not usually a basic string - the GUI also wouldn't often use string-literals.


Don't reinvent yet another string class.

 

I knew someone would say that ;)

 

Edit:

 

I've discovered how to do this, it's a bit ugly though.

#include <cstddef>
#include <iostream>

namespace Implementation
{
	using Preferred = int;
	using Fallback = char;
	
	template <std::size_t Size>
	void String(Preferred, const char (&string)[Size])
	{
		std::cout << "Constant string" << std::endl;
	}
	
	void String(Fallback, const char* string)
	{
		std::cout << "Non-constant string" << std::endl;
	}
}

template <typename T>
void String(const T& string)
{
	// Passing '0' (an int) gives preference to the template version when it is viable,
	// because int->int is an exact match while int->char requires a conversion.
	Implementation::String(0, string);
}

int main(int argc, char* argv[]) 
{
	String("Test"); // Detects constant string
	String(argv[0]); // Detects non-constant string
}

You can find a demo of it here.

I think I'm going to avoid this though. It's a neat trick, but there may be ways to fool it that I haven't thought of, plus the issues mentioned above.

Edited by Salty Boyscouts

With very limited use, there is a stupid template trick (I read about it years ago but don't see it immediately on Google). You pass a buffer to a template accepting a reference to a character array of a template-value size, which identifies whether it was a fixed buffer. That is followed by a second step, a preprocessor double-replacement with the stringizing operator #, where you replace x with #x and see if it gets wrapped in extra quotes. There are more details but I forget them; I only remember it as a trick in case I ever need it.

The first part detects both string literal constants and static arrays, but only when they are used directly so the compiler knows the type; it won't work on pointers to the arrays. The second part only works on direct string literals, by stringizing them, and not on pointers to string literals; if the literal does not appear exactly in that location it won't work. If both pass, you have a string literal used directly. An indirectly passed string literal fails substitution, and since SFINAE rules apply, a fallback is used that reports it as not being an immediately used string literal.

Both are of limited usefulness. It is a lot of work for something not typically needed, and it easily generates false negatives when mixed with pointers or when the compiler doesn't have the full underlying type immediately available.

No.

char tempBuffer[255];
strcpy(tempBuffer, "This is not a literal anymore");
foo(tempBuffer);
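
To make the objection concrete, here is a minimal sketch that reuses the Preferred/Fallback dispatch posted earlier in the thread, changed to return a flag instead of printing. The names `Sketch`, `LooksLikeLiteral`, and `BufferIsMisdetected` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace Sketch
{
    // Preferred overload: chosen for anything that is still an array of char.
    template <std::size_t Size>
    bool LooksLikeLiteral(int, const char (&)[Size]) { return true; }

    // Fallback overload: chosen for plain pointers.
    inline bool LooksLikeLiteral(char, const char*) { return false; }
}

template <typename T>
bool LooksLikeLiteral(const T& string)
{
    return Sketch::LooksLikeLiteral(0, string);
}

bool BufferIsMisdetected()
{
    char tempBuffer[255];
    std::strcpy(tempBuffer, "This is not a literal anymore");

    // tempBuffer has type char[255], which binds to const char (&)[255],
    // so the dispatch reports a stack buffer as if it were a literal.
    return LooksLikeLiteral(tempBuffer);
}
```

The check only sees "array of char with a known size", so a local buffer is a false positive, while a literal reached through a `const char*` is a false negative.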

If I'm generally using a lot of strings and I want to avoid duplicates (which it kind of seems like you're going for?) I tend to reach for a string class that uses an internal string table for pooling.

It tends to be slightly more expensive when you add in a new string (because it has to copy the string data and potentially adjust the table) but duplicate strings only incur the cost of a hash and some comparisons. And then passing the strings around and "copying" them basically is as expensive as a reference-counted smart pointer.
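
A rough sketch of the pooling idea, assuming a `std::unordered_set` as the table. `StringPool` and `Intern` are made-up names, and the reference counting mentioned above is left out to keep it short, so entries are never released here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical string table: each distinct string is stored exactly once.
class StringPool
{
public:
    // Returns a pointer to the pooled copy. A new string costs a copy and an
    // insertion; a duplicate costs only a hash and some comparisons.
    const std::string* Intern(const std::string& text)
    {
        return &*table_.insert(text).first;
    }

    std::size_t Size() const { return table_.size(); }

private:
    // Pointers to unordered_set elements stay valid when the table rehashes.
    std::unordered_set<std::string> table_;
};
```

Handles returned by `Intern` can then be compared and copied as plain pointers, for example `pool.Intern("hello") == pool.Intern("hello")`.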


If you want to get into OS specific functionality, you could always query the OS for the address range your program is in, then test the pointer against that range.  Basically if the pointer points to your executable image, don't allocate and/or deallocate.

Best to use a user-defined literal these days to ensure that only actual literals (and only the ones you intend) get converted specially to your string type.
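
A minimal sketch of what that could look like. The `StaticString` type and the `_static` suffix are hypothetical names chosen for illustration, not part of any library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning wrapper: it never copies or frees the characters.
struct StaticString
{
    const char* data;
    std::size_t size;
};

// Literal operator: the compiler calls this for "..."_static and supplies
// the length itself (excluding the terminating null character).
constexpr StaticString operator""_static(const char* text, std::size_t length)
{
    return StaticString{text, length};
}
```

With this, `"Test"_static` yields a `StaticString` of size 4, while `argv[0]_static` does not compile, so the opt-in is explicit at the call site rather than inferred from the argument type.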

Also, use a non-owning string_view type everywhere you can. You only need an owning string when you're storing the string "long term" which you do less often than you may think. Also, string_view helps you avoid excessive copying/allocation/ref-counting.
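
As an illustration of the non-owning idea, here is a hand-rolled stand-in that works without library support. `StringView` and `CountSpaces` are hypothetical names for this sketch; a real string_view type offers far more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical minimal view: a pointer plus a length, with no ownership.
struct StringView
{
    const char* data;
    std::size_t size;

    StringView(const char* text) : data(text), size(std::strlen(text)) {}
    StringView(const std::string& text) : data(text.data()), size(text.size()) {}
};

// Takes a view, so neither literals nor std::string arguments are copied.
inline std::size_t CountSpaces(StringView text)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size; ++i)
    {
        if (text.data[i] == ' ')
            ++count;
    }
    return count;
}
```

The view must not outlive the characters it points at, which is why an owning string is still needed for long-term storage.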


Don't reinvent yet another string class.

Now, let him do it. Writing my own string class taught me a lot about C++ programming early on. It's a good exercise.


I don't believe you can tell whether a string was given as a literal or as a variable, but it would be trivial if the following were legal C++.

constexpr bool is_literal_string(const char* p)
{
    return p[0] == p[0];
}

bool is_literal_string(const char*)
{
    return false;
}

If you have a string literal, the compiler is able to use the constexpr version, would pick it, and the function returns true. If you pass in a variable then it's unable to run the constexpr version, so it generates a call to the non-constexpr version, which always returns false. (And would likely be optimized away.)

 

The problem with the code above is that you can't overload a function with an identical (but constexpr) version. I figure allowing that would make function overload resolution even more complex than it already is.
