Temporary string literals

20 comments, last by Alberth 5 years, 6 months ago

Consider the following code:


#include <string>

std::string function()
{
  std::string test("This is some very long string...");
  
  // ... use test
  
  return test;
}

I know that string literals assigned to pointers (const char* psz = "some string";) are stored in the .rodata section of an exe file, and that string literals used to initialize an array end up wherever the array is stored: on the stack if the array is on the stack, or on the heap if the containing object is on the heap. This can be seen here.

What about the code above? For long strings, std::string stores the chars on the heap. But before they get to the heap, is the string "This is some very..." stored somewhere? Just by intuition I would say no. If it were stored somewhere, let's say .rodata, it would just clutter the exe file unnecessarily, so it doesn't make a lot of sense. Am I right?
I know that it might be implementation-defined, but I'm asking only about x86 systems.


Of course the string has to be stored somewhere in your executable file, likely in a readonly data section, as you conjectured. The heap is a purely runtime thing and will only be populated during execution of your program.

That being said, depending on what you're doing with the string and your optimization options, it's possible that the string doesn't actually end up on the heap (e.g. all functions you call on the string are getting inlined and you only do read-only accesses).
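The split between the literal (stored in the executable) and the std::string's own buffer (filled at run time) can be observed directly. A minimal sketch; kLiteral and both function names are made up for illustration:

```cpp
#include <string>

// Hypothetical name: the literal itself lives in static storage,
// typically a read-only data section of the executable.
static const char* const kLiteral = "This is some very long string...";

// The std::string holds the same characters as the literal...
bool contents_match() {
    std::string test(kLiteral);
    return test == kLiteral;
}

// ...but in its own buffer, which is filled by a copy at run time.
bool string_owns_a_copy() {
    std::string test(kLiteral);
    return test.data() != kLiteral;  // different address than the literal
}
```

Both functions should return true: the constructor copies the characters out of the literal's storage, it never points at it.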

2 hours ago, rnlf_in_space said:

Of course the string has to be stored somewhere in your executable file, likely in a readonly data section, as you conjectured. The heap is a purely runtime thing and will only be populated during execution of your program.

Ok, so if I make something like this:


void function()
{
  std::string test1("Some very long test string 1 ...");
  std::string test2("Some very long test string 2 ...");
  std::string test3("Some very long test string 3 ...");
  // ... more string
  std::string test10000("Some very long test string 10000 ...");
}

Then I will end up with an exe file "bloated" with 10000 long strings that are not used anywhere besides this one function. That looks like overkill to me; surely there must be something that takes care of this?

If the strings get used (and not optimized out), of course they need to be stored somewhere. Where else would they be stored if not in the executable? You could store them in a data file, but then you have to load them at runtime.

I don't know how to clarify that any better... if the strings are used OF COURSE they need to be stored somewhere, how else could you use them? Maybe an analogy: If you move to a different apartment and you want to take your books with you, you have to store them in a box. Are the boxes "bloated" just because you want to take your 10000 books with you? Your new apartment is the running process, the boxes are the executable file, the books are the strings.

If you don't want the bloat, find a way to code your program without using the strings.
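For the data file alternative mentioned above, a rough sketch of loading the strings at run time could look like this. The function name load_strings and the one-string-per-line file layout are assumptions made for illustration only:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads one string per line from `path` (hypothetical file layout).
// Returns an empty vector if the file cannot be opened.
std::vector<std::string> load_strings(const std::string& path) {
    std::vector<std::string> result;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        result.push_back(line);
    }
    return result;
}
```

The strings are then no longer in the exe, but the data file has to be shipped with it, so the total size stays roughly the same.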

1 hour ago, rnlf_in_space said:

If the strings get used (and not optimized out), of course they need to be stored somewhere. Where else would they be stored if not in the executable? You could store them in a data file, but then you have to load them at runtime.

It makes sense. One other thing that comes to my mind:


void function()
{
  char text1[] = "Some very long string 1 ...";
  char text2[] = "Some very long string 2 ...";
  // ... more char arrays
  char text10000[] = "Some very long string 10000 ...";
  
  std::string test1(text1);
  std::string test2(text2);
  // ... more string
  std::string test10000(text10000);
}

This way we would not increase the size of .rodata; instead, the chars would be stored directly on the stack and then on the heap once we assign them to std::string.
However, this solution would increase the .text segment (code segment), since that is where the chars would be stored now. So I guess in the end we can't avoid increasing the exe size.

No, it's the same, really. Going back to the moving apartments analogy, if you want to sort the books into your shelves (the stack in your code) in the new apartment, you first have to get them there somehow.

.text and .rodata both end up in the exe (and again, this would put the strings into the read only data section, not the code section).

The only way you get something onto the stack or heap is to put it there using code (which the compiler generates for you; you don't see it directly in C++). And that generated code needs to copy it from somewhere. This "somewhere" is your exe file. I cannot ask you to hand me a certain book from your bookshelves if you didn't bring that book from your old apartment. And that means you need to have had that book in one of the boxes while moving.

The exe file must contain all the data required to execute your program (unless you load the data from a different file, but then again you need to ship that file and it doesn't make a big difference in the overall size of things). If your program contains code that relies on a certain string being present, then that string must somehow get to the place where you execute the program.
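That the array is filled by generated code on every call, from data that stays in the exe, can be seen in a small sketch (the function name capitalized_copy is made up for illustration):

```cpp
#include <string>

// Each call re-creates `text` on the stack from the bytes stored in the
// executable, so the modification below never carries over to the next call.
std::string capitalized_copy() {
    char text[] = "some very long string";  // copied into the array at run time
    text[0] = 'S';                          // changes only this call's copy
    return std::string(text);
}
```

Calling it repeatedly returns "Some very long string" each time, because every call starts again from the original bytes.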

1 hour ago, rnlf_in_space said:

.text and .rodata both end up in the exe (and again, this would put the strings into the read only data section, not the code section).

I would have to disagree on this one. If you take a look at the link from the beginning of the topic, it says this:

Quote

If we do the same for char[]:
char s[] = "abc";
we obtain:
17: c7 45 f0 61 62 63 00 movl $0x636261,-0x10(%rbp)
so it gets stored in the stack (relative to %rbp), and we can of course modify it.

So the chars "abc" end up directly in the .text segment, and I guess the same goes for "char text1[] = "Some very...";" above and the other arrays.

This is because $0x636261 is not an address; it's the chars themselves, stored along with the rest of the code in the .text segment.

So it's still stored inside the exe, just in this case as part of a movl instruction. That works for short strings (8 characters incl. terminator at most). This is the piece of code that copies the string onto the stack!
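The immediate in the quoted movl can be reproduced by hand. A small sketch, assuming the little-endian byte order that x86 uses; pack_little_endian is a hypothetical helper, not something the compiler provides:

```cpp
#include <cstdint>

// Packs up to four characters of `s` (stopping at the terminator) into a
// 32-bit value, first character in the lowest byte. This is the layout a
// little-endian store such as the quoted movl writes to memory.
std::uint32_t pack_little_endian(const char* s) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4 && s[i] != '\0'; ++i) {
        value |= static_cast<std::uint32_t>(static_cast<unsigned char>(s[i]))
                 << (8 * i);
    }
    return value;
}
```

With this helper, pack_little_endian("abc") gives 0x00636261, the value in the disassembly; the uppercase "ABC" would give 0x00434241 instead.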

Yes, 0x00636261 is a 32-bit value that, stored in little-endian byte order, is equal to the string "abc\0". So yes, for short strings it's part of the code itself. But that doesn't work with longer strings. When you have a longer string, it will copy it from where it is stored (.rodata or .text doesn't really matter) onto the stack first. See it here: https://gcc.godbolt.org/z/KBHEBm

The .LCn: are the labels where the compiler puts the string data, all those movdqa xmm0 ... movaps [rsp] are the unrolled memcpy, where the compiler added the code to move the string from the (in this case) .text section onto the stack. The first line, sub rsp, 104 is where the compiler makes space for the string on the stack.

The stack is completely empty when the program starts running. The only way to get something onto the stack is during runtime using code! The same goes for the heap. You cannot store something on the stack or heap directly so that it's just there when your program starts running.

@ryt I think you're chasing a non-existent objective. There is no universal "string behavior for x86 systems". It depends at least on the compiler (and compiler version) and the library (and library version), and likely on other things as well.

If you want to enforce some behavior, build your own string class. Otherwise, just assume the string designers were sane people and made wise choices, and spend your time on more useful things than text storage inside string objects.

@Alberth, I think what they're really trying to do is to have constant string data in a program without having it in any file they're shipping. I'm a bit lost as to how to explain that that's not possible.

