Deyja

Escape sequences in string literals


In C++, the result of (std::string("\xFA") == std::string("·")) is true. Somewhere along the way, the \xFA is converted to the actual character ·. I'm not sure whether the compiler or the preprocessor does this, but I suspect it is the preprocessor. Since the preprocessor would have to expand such an escape for character literals, it makes sense that it would handle them in strings too.

Now then, here's the rub: I have to explicitly disable the expansion of other escape sequences, such as \n and \". The preprocessor must simply ignore these so that AngelScript can still parse the string literal. What I need is a complete list of all the escape sequences AngelScript already parses. Some of them I don't have to worry about: \t, for example, should work fine in AS as either \t or the actual ASCII code for tab.
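The hex-escape part of that comparison can be checked directly. The following is a minimal sketch, not taken from the thread: the helper names byteAt and hexEscapeLiteral are made up for illustration. It only shows that the literal "\xFA" ends up as a single byte with value 0xFA in the compiled program; whether that byte also compares equal to "·" depends on the encoding the compiler applies to the source file, so that part is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the byte at position i of s as a value in 0..255.
unsigned byteAt(const std::string& s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}

// The string literal "\xFA" contains exactly one byte, 0xFA.
// The escape is already resolved by the time the program runs.
std::string hexEscapeLiteral() {
    return std::string("\xFA");
}
```

Calling byteAt(hexEscapeLiteral(), 0) yields 250 regardless of whether plain char is signed on the platform, because of the cast through unsigned char.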

Character literals are handled by the preprocessor, and escape sequences work inside them.

My guess is that escape sequences in strings are converted by the C++ compiler, not the preprocessor. This is because C++ compilers normally only accept ASCII characters in source code, i.e. values below 128 (at least I should think so). If the preprocessor did the conversion, it would be possible to insert illegal characters into strings that the C++ compiler wouldn't accept.

The preprocessor on the other hand takes care of character literals, as they are converted to a numeric constant.

In AngelScript the following escape sequences are currently handled (as can be seen in the script manual [wink]):


sequence  value  description
\0        0      null character
\\        92     back-slash
\"        34     double quotation mark
\n        10     new line feed
\r        13     carriage return
\xFF      0xFF   FF should be exchanged for the hexadecimal number representing the byte value wanted


FYI: The code that handles escape sequences in strings is this:


int asCBuilder::RegisterConstantString(const char *cstr, int len)
{
    asCArray<char> str;
    str.Allocate(len, false);

    for( int n = 0; n < len; n++ )
    {
        if( cstr[n] == '\\' )
        {
            ++n;
            if( n == len ) return -1;

            if( cstr[n] == '"' )
                str.PushLast('"');
            else if( cstr[n] == 'n' )
                str.PushLast('\n');
            else if( cstr[n] == 'r' )
                str.PushLast('\r');
            else if( cstr[n] == '0' )
                str.PushLast('\0');
            else if( cstr[n] == '\\' )
                str.PushLast('\\');
            else if( cstr[n] == 'x' || cstr[n] == 'X' )
            {
                ++n;
                if( n == len ) break;

                int val = 0;
                if( cstr[n] >= '0' && cstr[n] <= '9' )
                    val = cstr[n] - '0';
                else if( cstr[n] >= 'a' && cstr[n] <= 'f' )
                    val = cstr[n] - 'a' + 10;
                else if( cstr[n] >= 'A' && cstr[n] <= 'F' )
                    val = cstr[n] - 'A' + 10;
                else
                    continue;

                ++n;
                if( n == len )
                {
                    str.PushLast((char)val);
                    break;
                }

                if( cstr[n] >= '0' && cstr[n] <= '9' )
                    val = val*16 + cstr[n] - '0';
                else if( cstr[n] >= 'a' && cstr[n] <= 'f' )
                    val = val*16 + cstr[n] - 'a' + 10;
                else if( cstr[n] >= 'A' && cstr[n] <= 'F' )
                    val = val*16 + cstr[n] - 'A' + 10;
                else
                {
                    str.PushLast((char)val);
                    continue;
                }

                str.PushLast((char)val);
            }
            else
                continue;
        }
        else
            str.PushLast(cstr[n]);
    }

    return module->AddConstantString(str.AddressOf(), str.GetLength());
}
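For experimenting outside the library, the same set of escapes can be decoded with a small standalone function. This is only a sketch: decodeEscapes and hexValue are hypothetical names, it works on std::string rather than asCArray, and it is not the AngelScript implementation. It handles the sequences from the table above, drops unknown escapes, and ignores a lone trailing backslash instead of returning an error. One deliberate difference from the listing: a character that follows a single hex digit and is not itself a hex digit is kept in the output rather than skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of a hex digit, or -1 if c is not one.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical standalone decoder for \0, \\, \", \n, \r and \xFF.
std::string decodeEscapes(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '\\') { out += in[i]; continue; }
        if (++i == in.size()) break;          // lone trailing backslash: ignored
        switch (in[i]) {
        case '"':  out += '"';  break;
        case 'n':  out += '\n'; break;
        case 'r':  out += '\r'; break;
        case '0':  out += '\0'; break;
        case '\\': out += '\\'; break;
        case 'x': case 'X': {
            int val = 0, digits = 0;
            // Consume at most two hex digits after the 'x'.
            while (digits < 2 && i + 1 < in.size() && hexValue(in[i + 1]) >= 0) {
                val = val * 16 + hexValue(in[++i]);
                ++digits;
            }
            if (digits > 0) out += static_cast<char>(val);
            break;
        }
        default: break;                       // unknown escape: dropped
        }
    }
    return out;
}
```

For example, passing the six characters a, \, x, 4, 1, b returns the three-character string "aAb".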



In my opinion the preprocessor shouldn't try to convert any escape sequences inside string constants. Otherwise you'd have to disable conversion of escape sequences such as \x0A (= \n), \x22 (= \"), and \x5C (= \\) as well.
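The equivalences mentioned here can be confirmed at compile time. The sketch below assumes an ASCII-compatible execution character set; the function name hexAndNamedEscapesMatch is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Assumes an ASCII-compatible execution character set.
static_assert('\x0A' == '\n', "\\x0A is the same character as \\n");
static_assert('\x22' == '"',  "\\x22 is the same character as \\\"");
static_assert('\x5C' == '\\', "\\x5C is the same character as \\\\");

// The same holds inside string literals: both spellings produce identical bytes.
bool hexAndNamedEscapesMatch() {
    return std::string("\x0A\x22\x5C") == std::string("\n\"\\");
}
```

A preprocessor that expanded \x escapes inside strings would therefore be able to produce a raw newline, quote, or backslash in the output, which is exactly what the later string parser needs to see escaped.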

Okay. I should have tested with the latest version. In 1.10.x it seems that \x## isn't being converted; I was going to suggest that it was. As of right now, the only escape the preprocessor will handle inside string literals is \", and only so that it doesn't think that quote terminates the string.

Character literals will support all those escapes. Additionally, they support 'literal escapes', which AngelScript might not. Whereas escapes such as \n must be converted to an entirely different ASCII code, ones such as \" do not; \" is handled by default code that simply removes the slash. Therefore, you could have a character literal such as '\y', and its value would be 'y'. I also support \t, as it is usually more recognizable in code than the actual tab character, which is displayed at different widths in different editors and may be only a single space wide depending on where it is in the line.

I'm beginning to carry on, but one more thing: like C++, multi-character character literals will compile, but only the first character is used. 'youallsuck' would equal 'y'.

Just for comparison.

static char* parseEscapeSequence(char* start, char* end, Lexem& out)
{
    if (start == end) return start;
    if (*start != '\\') return start; //Why was this called?
    ++start;
    if (start == end) return start;
    if (out.type == STRING) //Ignore the escape sequence!
    {   //Don't need to worry about hex-escapes.
        out.value += '\\';
        out.value += *start;
        return ++start;
    }
    else //must be a character literal.
    {
        //Non-literal escapes
        if (*start == 'n') { out.value += '\n'; return ++start; }
        if (*start == 't') { out.value += '\t'; return ++start; }
        if (*start == 'r') { out.value += '\r'; return ++start; }
        if (*start == '0')
        {
            out.value += '\0';
            //out.value.resize(out.value.size()+1);
            //out.value[out.value.size()-1] = '\0';
            return ++start;
        }
        if (*start == 'x') return parseHexEscape(start,end,out);

        //Literal escape - Just get rid of the slash.
        out.value += *start;
        return ++start;
    }
}
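The listing calls parseHexEscape, which is not shown in the thread. The following is a guess at what such a helper could look like, not the poster's actual code. The Lexem struct, the LexemType enum, and the CHARACTER enumerator are assumptions made so the sketch compiles on its own; only the field names type and value and the STRING constant appear in the listing. It expects start to point at the 'x', reads at most two hex digits, and returns the position just past what it consumed.

```cpp
#include <cassert>
#include <string>

// Assumed definitions; the thread only shows out.type, out.value and STRING.
enum LexemType { STRING, CHARACTER };
struct Lexem { LexemType type; std::string value; };

// Hypothetical sketch of the missing helper.
static char* parseHexEscape(char* start, char* end, Lexem& out)
{
    if (start == end || *start != 'x') return start;
    ++start; // skip the 'x'

    int val = 0;
    int digits = 0;
    while (start != end && digits < 2)
    {
        char c = *start;
        int d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else break; // not a hex digit: stop without consuming it
        val = val * 16 + d;
        ++start;
        ++digits;
    }

    // An 'x' with no digits after it contributes nothing in this sketch.
    if (digits > 0) out.value += static_cast<char>(val);
    return start;
}
```

Given the buffer xFAz, this appends one byte with value 0xFA to out.value and returns a pointer to the z.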
