Ectara

Global and read only variable storage

Can anyone describe how global data and read-only data are stored in an executable, and how they are stored and accessed in memory? I'm currently implementing it in my VM, but I feel as if I am going about this the wrong way.

Depends on OS.

A simple generic version is something like this:
- The exe follows some file format (PE on Windows, ELF on Linux) and consists of several sections, such as code and data. The OS memory-maps this file into the process's address space.

- The OS usually has a virtual memory manager that provides pages (often 4 KiB). While they appear as contiguous memory, each page is actually backed by physical RAM, the swap file, or a memory-mapped file.

Each page has a set of attributes that determine whether it is read-only or executable, whether it may be swapped out (or currently is), and similar. There might be other tricks, such as pages shared between processes that load the same DLL. It depends on how the OS does it, which memory architecture/model one runs on, and so on.

The on-disk executable format corresponds to how data is mapped into virtual memory (hence the various alignment requirements, the .reloc section, and similar).
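To make the alignment point concrete, here is a minimal C sketch of how a loader for a custom format might assign each section an in-memory offset so that every section starts on a page boundary. The 4 KiB alignment, the names `align_up` and `layout_sections`, and the assumption that section sizes come from a section table are illustrative only; this is not any particular OS's loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round value up to the next multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t value, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical layout step: give each section an offset in the mapped image,
   starting every section on an alignment boundary. Returns the total image size. */
static uint32_t layout_sections(const uint32_t *mem_sizes, uint32_t *offsets,
                                size_t count, uint32_t align) {
    uint32_t next = 0;
    for (size_t i = 0; i < count; i++) {
        offsets[i] = next;
        next = align_up(next + mem_sizes[i], align);
    }
    return next;
}
```

With sizes of 0x1234, 0x10 and 0x2000 bytes and a 0x1000 alignment, the sections land at offsets 0x0, 0x2000 and 0x3000, so the gaps between them are the padding the format has to account for.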

What follows from the above is that global and/or read-only data can make use of these mechanisms. Read-only data would be marked as such; globals might not be read-only but might be shared (for DLLs). All of this is mostly optional and serves as an optimization. It's up to the compiler to organize these things, but it doesn't strictly have to, especially now that memory is no longer a primary bottleneck.

Read overviews on exe formats, virtual memory and memory mapping relevant to your OS. Most of the stuff relies on tricky details.

I'm trying to make a little bytecode compiler and VM, and I just use the runtime stack. Variables appear on the stack in the order they're encountered in the script, and their position is an offset up or down from the current scope's position. This also helps with recursive functions: you can just push another copy of the stack frame on top of the stack, and the instructions manipulate that copy instead. When a function reaches "return", it pops its own stack frame back off, and the script is back in the calling function's data again. Globals, since they're encountered first, are typically referenced from the bottom of the stack.

It does sound like such a honky-tonk way of doing it, but as long as you pop off what you push on, you can't really muck up the stack's order. This means variables can live on the stack, and at compile time you can be sure of their position relative to some marker. The marker may be at index 5 or 50, but a particular local variable is always, say, +2 away from that 5 or 50.
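A minimal C sketch of the stack layout described above, assuming a hypothetical `Vm` struct with 32-bit slots and a fixed stack size: globals are pushed first and addressed from slot 0, while each call saves the caller's frame base and addresses its locals relative to the new base (the "marker").

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 256

/* Hypothetical VM stack: globals occupy the lowest slots. */
typedef struct {
    int32_t slots[STACK_SLOTS];
    int sp;  /* index of the next free slot */
    int fp;  /* base of the current frame (the "marker") */
} Vm;

static void push(Vm *vm, int32_t v) {
    assert(vm->sp < STACK_SLOTS);
    vm->slots[vm->sp++] = v;
}

/* Globals are addressed from the bottom of the stack. */
static int32_t *global_slot(Vm *vm, int index) {
    assert(index >= 0 && index < STACK_SLOTS);
    return &vm->slots[index];
}

/* Locals are addressed relative to the current frame base. */
static int32_t *local_slot(Vm *vm, int offset) {
    assert(vm->fp + offset >= 0 && vm->fp + offset < vm->sp);
    return &vm->slots[vm->fp + offset];
}

/* Call: save the caller's frame base, start a new frame, reserve locals. */
static void enter_frame(Vm *vm, int num_locals) {
    push(vm, vm->fp);
    vm->fp = vm->sp;
    for (int i = 0; i < num_locals; i++)
        push(vm, 0);
}

/* Return: drop this frame's locals and restore the caller's frame base. */
static void leave_frame(Vm *vm) {
    vm->sp = vm->fp;
    vm->fp = vm->slots[--vm->sp];
}
```

Because a local is always `fp + offset`, the compiler can emit the same offset for it no matter how deep the recursion goes; only `fp` changes at run time.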

Perhaps I was not quite specific enough. I know how an OS handles this: the different file formats, relocation, stack frames, and so on. My executables are position independent and are stored in an MSB-first (big-endian) format, which is read in and byte-swapped on load, like all of my other file formats. Currently, global data is stored after the code block in the file, and is read in and placed on the main thread's stack before the initial stack frame. I'm beginning to think that read-only data should be stored with the global data, or something to that effect, since reading from the separate read-only data block in memory is becoming tedious.
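Since the file is MSB-first, a loader can decode it with shifts, which gives the same result regardless of the host's byte order. A minimal sketch, assuming the data is a run of 32-bit words; `read_be32` and `decode_be32_block` are hypothetical helper names. Decoding into a separate buffer leaves the loaded file image untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 32-bit value stored most significant byte first. */
static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Convert count big-endian words from src into host-order words in dst. */
static void decode_be32_block(uint32_t *dst, const unsigned char *src, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = read_be32(src + 4 * i);
}
```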

You still did not specify the operating system, but since you are talking about “executables” I will assume you are interested in “exe” files on Windows.

Whatever global data you declare will be stored inside the .exe file.
If you initialize these values to 0, they will not actually contribute to the executable file size. When the chunks (or sections) of the .exe are loaded into memory, each chunk starts on a page-aligned boundary, and its section header records both how many bytes are stored in the file (SizeOfRawData) and how large the chunk is in memory (VirtualSize); the loader fills the difference with 0s. This is where all of the globals set to 0 will live. With Microsoft's toolchain this chunk is typically named “.data”, but compilers are not required to use this name for chunks containing your globals, statics, strings, etc.
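The same zero-fill rule is easy to adopt in a custom loader. In this minimal sketch, `SectionInfo` and `load_section` are hypothetical; the two sizes play the roles of `SizeOfRawData` and `VirtualSize` in a PE section header, but this is not the Windows loader itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical section record for a custom executable format. */
typedef struct {
    uint32_t raw_size;      /* bytes actually stored in the file */
    uint32_t virtual_size;  /* bytes the section occupies once loaded */
} SectionInfo;

/* Copy the stored bytes, then zero the rest. Zero-initialized globals live
   in that zeroed tail and take no space in the file. */
static void load_section(unsigned char *dst, const unsigned char *file_bytes,
                         const SectionInfo *s) {
    assert(dst != NULL && s != NULL);
    /* Never copy more than the in-memory size (file data may be padded). */
    uint32_t stored = s->raw_size < s->virtual_size ? s->raw_size : s->virtual_size;
    if (stored > 0)
        memcpy(dst, file_bytes, stored);
    memset(dst + stored, 0, s->virtual_size - stored);
}
```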

Uninitialized data will typically go into a chunk called .bss, although some linkers (Microsoft's, for example) merge it into the zero-filled tail of .data.


Each chunk has a set of flags that determine readability, writability, and executability. Since these flags apply to the whole chunk, read-only data cannot be in the same chunk as read-write data. Read-write globals therefore go to .data (or .bss), while read-only globals, string literals, and other constants go to a separate read-only chunk, typically .rdata on Windows (.rodata in ELF files).


So chunks are basically loaded as-is from the executable file into sequential pages in RAM. If you view a loaded .exe in RAM you will first see its PE header, then usually its actual code (.text), then read-only data (.rdata, which with Microsoft's linker usually also holds the import tables), then read-write globals (.data), then embedded resources (.rsrc), and then relocations (.reloc). Note that the sections can be in any order, but this order is common.
This is how all of the data is laid out in RAM. Globals will all be relatively close to each other, whether const or not, and also relatively close to the actual executable code.

Also, const or not, globals initialized to 0 do not contribute to the actual .exe size.


L. Spiro

This is not for a specific OS; this is a bytecode VM with my own executable format. Also, this doesn't really cover read-only data, which would be stored in a .text chunk or something similar.

I've scrapped the whole read-only block, since I don't know of a good way to implement it. Read-only data is simply treated as global data; I just need to make sure nothing overwrites it.

Reasons for abandoning the read-only block are as follows:
- This would add a lot of complexity. To maintain position independence, it would require not only a read-only block but also a block of offsets into the read-only block.
- The read-only data is stored in MSB order and must be converted and stored, just like the global data. I'd prefer not to overwrite the block in the executable's or library's memory space with the swapped data, which means storing it in a separate buffer, just like the global data. So why not put it in the same buffer?
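For comparison, the offset-table design from the first point could look like this minimal sketch. Bytecode refers to a constant by index, and the index is resolved through a table of offsets relative to wherever the block was loaded, so no absolute addresses need patching. `RoBlock` and `ro_constant` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read-only block: loaded data plus a table of offsets into it. */
typedef struct {
    const unsigned char *base;  /* wherever the block ended up in memory */
    const uint32_t *offsets;    /* offsets[i] = start of constant i, relative to base */
    size_t count;               /* number of constants */
    size_t size;                /* total bytes at base */
} RoBlock;

/* Resolve a constant index to its address in the current load location. */
static const unsigned char *ro_constant(const RoBlock *ro, size_t index) {
    assert(index < ro->count);
    assert(ro->offsets[index] < ro->size);
    return ro->base + ro->offsets[index];
}
```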

If someone has a good reason to refute this, I'd make another attempt at implementing it.
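The approach settled on here (read-only data appended to the same buffer as the globals, with stores refused past a boundary) might be sketched as follows. This assumes both blocks have already been converted to host-order 32-bit words; `DataSegment`, `segment_init`, and `segment_store` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical data segment: writable globals first, then read-only data. */
typedef struct {
    uint32_t *words;
    size_t ro_start;  /* first read-only word; everything before it is writable */
    size_t count;     /* total number of words */
} DataSegment;

static bool segment_init(DataSegment *seg,
                         const uint32_t *globals, size_t num_globals,
                         const uint32_t *rodata, size_t num_rodata) {
    assert(seg != NULL);
    size_t total = num_globals + num_rodata;
    seg->words = calloc(total ? total : 1, sizeof *seg->words);
    if (seg->words == NULL)
        return false;
    for (size_t i = 0; i < num_globals; i++)
        seg->words[i] = globals[i];
    for (size_t i = 0; i < num_rodata; i++)
        seg->words[num_globals + i] = rodata[i];
    seg->ro_start = num_globals;
    seg->count = total;
    return true;
}

/* Stores are allowed only below the read-only boundary. A false return means
   the index was read-only or out of range, and the VM should raise a fault. */
static bool segment_store(DataSegment *seg, size_t index, uint32_t value) {
    if (index >= seg->ro_start)
        return false;
    seg->words[index] = value;
    return true;
}
```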

You could always load the read-only data into a special memory page/set of pages and then disable write permissions. The OS will then fault if your program tries to write to the read-only data blocks.
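A minimal POSIX sketch of this suggestion: copy the data into anonymous pages obtained with `mmap`, then remove write access with `mprotect`, so any later write raises SIGSEGV. It assumes a POSIX host; on Windows the corresponding calls are `VirtualAlloc` and `VirtualProtect`. `map_readonly_copy` is a hypothetical name.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy len bytes into freshly mapped pages, then make them read-only.
   Returns NULL on failure; *mapped_len receives the page-rounded size. */
static void *map_readonly_copy(const void *data, size_t len, size_t *mapped_len) {
    assert(mapped_len != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return NULL;
    size_t size = (len + (size_t)page - 1) / (size_t)page * (size_t)page;
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, data, len);
    if (mprotect(p, size, PROT_READ) != 0) {  /* from here on, writes fault */
        munmap(p, size);
        return NULL;
    }
    *mapped_len = size;
    return p;
}
```

The mapping is released with `munmap(p, *mapped_len)` when the module is unloaded.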

That would achieve read-only status; however, it still has the drawbacks mentioned above, and is now OS-specific. I could also check every dereference in the VM and raise a segfault if the address is invalid for any reason.
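The software alternative mentioned here (checking every VM dereference) could be sketched with a small region table: each region maps a range of VM addresses to host memory with read/write permissions, and any access that falls outside a region or lacks the needed permission is rejected so the VM can raise its own fault. `Region`, `vm_translate`, `vm_load32`, and `vm_store32` are hypothetical names, and values are kept in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { PERM_READ = 1, PERM_WRITE = 2 };

/* Hypothetical region: VM addresses [start, start + size) map to host memory. */
typedef struct {
    uint32_t start;
    uint32_t size;
    unsigned perms;
    unsigned char *host;
} Region;

/* Return host memory for [addr, addr + len), or NULL if the access is out of
   bounds or not permitted. Written to avoid overflow in addr + len. */
static unsigned char *vm_translate(const Region *regions, size_t n,
                                   uint32_t addr, uint32_t len, unsigned need) {
    assert(regions != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const Region *r = &regions[i];
        if (addr >= r->start && len <= r->size && addr - r->start <= r->size - len)
            return (r->perms & need) == need ? r->host + (addr - r->start) : NULL;
    }
    return NULL;
}

static bool vm_load32(const Region *regions, size_t n, uint32_t addr, uint32_t *out) {
    unsigned char *p = vm_translate(regions, n, addr, 4, PERM_READ);
    if (p == NULL)
        return false;  /* VM raises a fault */
    memcpy(out, p, sizeof *out);
    return true;
}

static bool vm_store32(const Region *regions, size_t n, uint32_t addr, uint32_t value) {
    unsigned char *p = vm_translate(regions, n, addr, 4, PERM_WRITE);
    if (p == NULL)
        return false;  /* includes writes to read-only data */
    memcpy(p, &value, sizeof value);
    return true;
}
```

This keeps the protection portable and independent of the OS, at the cost of a lookup on every load and store.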
