What really happens to variables during compilation?

Started by
4 comments, last by Markie 22 years, 3 months ago
Hi,

I have programmed in BASIC and a little C before, and I also have a tiny understanding of assembly, but one thing still eludes me: what exactly happens, in depth, when I compile and execute a C program with a variable such as int a=5;?

I know the 5 is stored in "a", but where does the "a" go? I know that C compilers basically translate C programs into assembly, which is assembled by an internal assembler, which creates machine code... 1's and 0's. But where does "a", my variable, go? I know it's stored at some memory address, but which one, who or what decides on that address, and how is it found again? Does the C compiler or the integrated assembler just decide that the first variables go here, the second ones there, and so on? Or is it the operating system that finally decides what happens physically to my variable "a"? Are variables kept as variables even in machine code, or are they turned into fixed numbers (memory addresses), which seems more likely?

I'd like to know in depth what exactly happens to a C line such as "int a=5" during compilation and execution (under Windows). I really couldn't find that piece of information on the web... :-( I hope it's not a beginner's question and someone really knows the answer!

Greetings,
Mark
It's a little hazy, and my information is probably out of date, but I'm fairly certain this is how 16-bit programs work(ed):

The 'a' disappears. The compiled program has _no idea_ what you called anything; the variable is stored on disk as a relative offset from the base load point of the program. The loader loads the program into memory (go figure), then races through the machine/assembly code and translates all the relative offsets into hard offsets, now that it knows the address the program is actually loaded at. Then it turns execution over to the program. For .com's it jumps to 0x100 and cuts loose. I don't think I ever learned how .exe's work, so all of that may only apply to .com files.
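
The fix-up step described above can be sketched as a toy in C. This is only an illustration under made-up assumptions: the "image" is a plain array of words, and the relocation table (a list of the slots that hold relative offsets) and every name here are hypothetical, not a real executable format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy loader fix-up: every slot listed in reloc[] holds an offset
   relative to the start of the image. Once the load address is known,
   add it to those slots so they become absolute addresses.
   Slots not listed in reloc[] are plain data and are left alone. */
void apply_relocations(uint32_t image[], size_t image_words,
                       const size_t reloc[], size_t reloc_count,
                       uint32_t load_base)
{
    for (size_t i = 0; i < reloc_count; i++) {
        assert(reloc[i] < image_words);   /* table must point inside the image */
        image[reloc[i]] += load_base;
    }
}
```

For example, with an image of {0x10, 5, 0x20, 7}, a relocation table of {0, 2} and a load base of 0x400000, slots 0 and 2 become 0x400010 and 0x400020, while the plain values 5 and 7 stay as they are.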

Magmai Kai Holmlor

"Oh, like you've never written buggy code" - Lee

"What I see is a system that _could_ do anything - but currently does nothing!" - Anonymous CEO
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
in source code, variables do not exist - names exist. during compilation, the compiler picks a memory scheme and uses it to organize the variables in the program.
different sorts of compilers make different sorts of programs, which use different schemes.
when you have int a = 5;
the compiler will first figure out where to put the variable relative to an entry point meaningful to the processor or system the program's designed to run on.

'a' is called an identifier. identifiers are used by compilers so the compilers can translate programs into other forms, like machine code. when the program is compiled, 'a' is replaced by a value, like an offset.

programs can be managed in different ways. some processors have a register that can point to memory, which can be used to access variables through offsets (an offset from the base location: base + offset = location).
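
The base + offset idea can be tried out directly in C. The sketch below is only an illustration: the struct stands in for a block of variables laid out by the compiler, and the names vars and read_int_at are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block of variables; the compiler decides the layout. */
struct vars {
    int a;
    int b;
};

/* Read the int stored at base + offset. This mirrors how generated code
   reaches a variable through a base register plus a fixed offset; the
   name of the variable never appears, only the number. */
int read_int_at(const void *base, size_t offset)
{
    int value;
    assert(base != NULL);
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}
```

Given struct vars v = {5, 9};, calling read_int_at(&v, offsetof(struct vars, b)) fetches b without ever using its name. offsetof comes from <stddef.h> and reports the offset the compiler chose for that member.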

to really understand a variable, you need to understand how a program works in relation to how it is processed. a program can be generalized as an array of characters, where each character represents the value of a particular memory location.

The first character is the entry point, and is an instruction. After that, it's impossible to know what any subsequent characters are without a standard instruction length (i'm not aware of any sort of standard). it's possible to have static variables scattered among the instructions throughout a program, and it's possible to have them all chunked in a specific area.

all variables exist dynamically, meaning the program is loaded at an arbitrary point in memory, so that all static variables and instructions exist only relative to the entry point, and nothing else. temporary variables are dynamic in that they are created at an arbitrary point on the stack, and are only relative to a base pointer. when they get destroyed, new temporary variables can occupy their location.

Some variables used by the OS exist in STANDARD locations.

does this make sense? to sum up, a variable is a NAME, used by COMPILERS to reference a memory location (or a series of them). the computer references these locations using numbers.
---
i'm not an expert at assembly, but logically, everything i've said is accurate, if vague.

in msvc++, look at the disassembly, you'll learn a lot.

try opening a .exe in text mode... you'll be able to see all the instructions (as raw bytes), and depending on the memory scheme, you may see all the static variables too. again, depending on the scheme, you may or may not see temporary variable space; usually you won't, since the stack is used.

you should whip out a piece of paper and a pencil, and try to write your own assembly, and invent a memory scheme - you'll have a ball.
the program can request memory from the operating system, so OSs like win2k can manage memory. (i have not studied win2k though.)
Okay, lemme try =)

To really understand this, you should try learning assembly (I say this 'cause you seem to have an interest in how things work and assembly will give you an in-depth view).

Where your variable 'a' resides depends on how and where you declare it. The simplest is the global variable. All global variables are listed in the program's data area. I believe Windows allocates them on the fly. In the old days of DOS we didn't have this luxury. They actually took up space in the .exe. I'm not sure about static vars, but I believe they are actually placed in the code wherever you declared them. I've never disassembled a Windows .exe, so I can't be TOO sure of how these var types work.
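
What the C language itself guarantees here is the lifetime of each kind of variable, which is easy to observe. A minimal sketch follows (the function names are made up for illustration, and where exactly each variable ends up in the executable is left to the compiler and linker):

```c
/* Global: static storage duration, exists for the whole run of the program. */
int global_a = 5;

/* 'calls' also has static storage duration: there is one copy, it is
   initialised once, and it keeps its value between calls. */
int count_calls(void)
{
    static int calls = 0;
    calls++;
    return calls;
}

/* 'local' has automatic storage duration: a fresh copy is created on
   every call (typically on the stack) and is gone when the call returns. */
int local_plus_one(void)
{
    int local = 5;
    local++;
    return local;
}
```

Calling count_calls() three times returns 1, 2, 3, because the same memory is reused, while local_plus_one() returns 6 every time, because its variable starts over on each call.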

Usually your variables are declared within a function or method. To understand local variables, you need to understand on a low level what's happening when you call a function. In C++ you can simply say:

int Class::YadaYada (long x, long y);

Here we are passing two 32-bit integers to a method and returning an int (32 bits with a 32-bit Windows compiler, 16 bits with the old 16-bit DOS compilers). The low-level view of this isn't so simple. In memory, your program has a stack structure that it uses as temporary storage (a lil' less temporary than the registers in the CPU). This is a LIFO structure (Last In, First Out), meaning that you can push something on top of the "stack" and pull something off the top. When this method is called, you push our two 32-bit integer parameters onto the stack. We call the method (meaning the CPU jumps to its memory address) and the newly called method reads the top of the stack to see what parameters were passed. All variables declared within the method are given space on the stack (more than likely in the order that they were declared); the returned integer value typically comes back in a CPU register instead. When the function terminates, these values are popped off the stack and are lost forever (therefore losing "scope").
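
The push/call/pop sequence can be imitated with an explicit stack in C. This is a toy model only: the struct and function names are invented, and a real x86 stack grows downward and also holds the return address, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 16

/* Toy stack: 'top' counts how many words are currently in use. */
struct toy_stack {
    long words[STACK_WORDS];
    size_t top;
};

void push(struct toy_stack *s, long value)
{
    assert(s->top < STACK_WORDS);   /* guard against overflow */
    s->words[s->top++] = value;
}

long pop(struct toy_stack *s)
{
    assert(s->top > 0);             /* guard against underflow */
    return s->words[--s->top];
}

/* Hypothetical callee: finds its parameters near the top of the stack,
   uses one extra slot as a local variable, and releases that slot
   before it returns. */
long add_via_stack(struct toy_stack *s)
{
    long y = s->words[s->top - 1];  /* pushed last, so it is on top */
    long x = s->words[s->top - 2];
    push(s, x + y);                 /* space for a local variable */
    return pop(s);                  /* the local is gone after this */
}

/* Caller: push the parameters, call, then remove the parameters again. */
long call_add(struct toy_stack *s, long x, long y)
{
    push(s, x);
    push(s, y);
    long result = add_via_stack(s);
    pop(s);
    pop(s);
    return result;
}
```

Starting from an empty stack (struct toy_stack s = {{0}, 0};), call_add(&s, 2, 3) returns 5 and leaves the stack empty again, so the next call can reuse exactly the same slots.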


- Jay


Many of the truths we cling to depend greatly on our own point of view

Get Tranced!
Quit screwin' around! - Brock Samson
Thanx a lot guys!

I wasn't sure about it before, but now I know for sure: this is probably not such a trivial
question and was posted to the right forum... :-)
I will look into assembly, as it really does interest me what the heck is really going on, but
it's way too dry and demotivating to start off with by itself. I still need to master C++ and
Windows programming (tired of those mute, text-based C beginner examples in DOS screens)
and have yet to write my first own Windows game (or any Windows .exe, actually).

Mark

- Once you start getting into modern game programming and how computers work, the more you'll
find you seem to know less and less every day, as your knowledge and horizon start to expand...

