Understanding what happens under the hood

I am trying to understand completely what actually happens, from loading a C++ application into Visual C++, through compiling the program, to running the program. I know this is huge and covers a very wide range. I have a lot of it down, I'm just not sure about some of what happens. For example, when I load in the program, are all the program elements assigned memory locations at this point? Once I compile the program, are all elements assigned memory addresses then? Or are all elements assigned memory locations once I click RUN from the menu? If anyone knows any really decent resources on this I would appreciate the URL. Or if you want to have a go at explaining it, I would appreciate that also.

I don't know what you mean by "elements", but here's a simplified description.

When you load your program's source into VC++, your program does not yet exist in a computer-understandable form. It's just a bunch of text, nothing more.

When you press the compile button, the compiler goes over all the text and then converts it into assembly code, which is still just text, but in a very primitive form that can be easily translated into machine code. Finally, an assembler converts the assembly code into a bunch of numbers that the CPU can interpret, and those are stored in an EXE file.

When you press Run, the EXE file is loaded into memory, and your CPU starts executing the instructions it contains.
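
To make that concrete, here is a minimal sketch you can drive by hand from a Visual C++ command prompt instead of the IDE buttons (the file name is just an example; /c compiles without linking, and /FA additionally writes out an assembly listing so you can see the intermediate text form):

// hello.cpp - a minimal program to push through the whole pipeline by hand.
// From a Visual C++ command prompt (the IDE's Compile/Build buttons do the
// same work behind the scenes):
//
//   cl /c /FA hello.cpp     -> produces hello.obj (machine code) and
//                              hello.asm (the assembly listing, still text)
//   link hello.obj          -> produces hello.exe, the file that is loaded
//                              into memory when you press Run
#include <iostream>

int main()
{
    std::cout << "Hello, world" << std::endl;
    return 0;
}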

What I am trying to understand is this: when a program is loaded into something like MS Visual Studio 6, for example, the program is stored in memory. Now, does each element of the program (by element I mean a variable, a function, etc.) get assigned a memory address at this stage, or is the program just one big blob sitting at a memory address?

Does each element get split out into an individual address at compile time then, or is it at run time?

In a nutshell, I am trying to understand what happens with regard to memory locations at the point when the program is loaded into MS Visual Studio 6, at the point after the program has been compiled, and at the point once Run has been selected.

Help appreciated.

Unfortunately there's so much stuff that happens that it takes you two semester-long courses at a university to understand it (actually, probably two more if you want to really, really understand it). If you really want to know, buy a book on compilers, such as the Dragon Book.

What you load in VS is not a program, it's only the source code, so the concept of a memory location doesn't apply yet. In the output of the compiler there will still be symbols instead of memory addresses: a global variable a will still show up as "a" in the .obj file the compiler generates. The compiler will convert local variables into stack locations or registers, which are not absolute memory locations. The .obj files generated by the compiler are combined by the linker, and the linker decides at what locations to place the variables. Those locations then show up as such in the machine code.

When the program is loaded, the machine code is copied verbatim into the "code segment", and the addresses of global variables refer to space in the separate "data segment"; the addresses are relative to the start of the segment. The operating system decides where the segments get placed in physical or virtual memory.
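
If you want to see this for yourself, here is a hypothetical little experiment (the file name, variable name, and the mangled symbol mentioned in the comments are only examples); dumpbin ships with Visual C++:

// globals.cpp - compile with:   cl /c globals.cpp
// then inspect the object file: dumpbin /symbols globals.obj
//
// The symbol table still refers to the global by name (something like
// "?a@@3HA" in C++-mangled form); no absolute address has been chosen yet.
// The linker picks a location in the EXE, and the OS loader decides where
// that ends up in the process's address space.
int a = 42;          // global: ends up in the initialized data section

int addOne(int x)    // x: a stack location or register, chosen by the compiler
{
    return x + a;
}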

Okay, I'm getting the loading part. So when I choose Load and the program name in MS Visual Studio, none of the variables or functions are assigned memory locations, because all that is loaded is, in effect, an ASCII file.

When I choose Compile, the ASCII text is converted into assembly language and then into machine code, which is what the processor understands.

Then when I choose Run, the processor executes the machine code instructions.

Objects created on the stack or in static storage are allocated memory locations at compile-time.

Objects created on the heap are allocated memory storage at run-time.

Am I on the right track here? Thanks for the help so far

quote:
Original post by Sir_Spritely
Objects created on the stack or in static storage are allocated memory locations at compile-time.


Not exactly. I suppose you could sort of say that things in static storage are allocated at compile time, in a way. But objects on the stack are, naturally, created on the stack, at runtime.
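
A tiny sketch of the distinction (the names here are made up purely for illustration):

int g_counter = 0;          // static storage: the linker reserves space for this
                            // in the EXE's data section, so in a sense its
                            // location is settled before the program ever runs

void example()
{
    int local = 7;          // stack: the storage exists only while this call
                            // is on the stack, created and destroyed at run time

    int* onHeap = new int(local); // heap: an instruction executed at run time
                                  // asks the heap manager for memory
    delete onHeap;                // and it must be handed back explicitly
}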

I think I am getting confused about the actual meaning of compile-time and run-time.

I have compile-time as when I press Compile from the menu, which is when the ASCII is converted into machine code. So when I create elements on the stack I must know their quantity, lifetime and type. Otherwise what? What happens, and can someone give some example code and an explanation?

By run-time I presume you mean once I have pressed Run on the menu and the first machine-code instruction has been executed by the processor; is this correct? In which case, elements created on the heap at/during run-time are created dynamically. So the compiler knows nothing about these elements? No memory is allocated by the compiler at compile-time because they don't exist yet?

So for elements on the stack, where the compiler knows about them (their quantity, lifetime, size and so on), are memory locations assigned at compile-time? Or are memory locations for objects created on the stack still assigned at run-time? In which case dynamic objects created on the heap can only be assigned memory locations at run-time, so there is literally no difference between dynamic object creation and normal object creation, because both are created and assigned memory locations at run-time, right?

Any more help on this is really appreciated.

[edited by - Sir_Spritely on March 7, 2004 10:00:39 AM]

Compile time is when the compiler transforms source code files into object files. Link time is when object files are linked together into an executable file. Most of the time these happen one after the other and thus both fall under compile time.

Run time is when the program runs, regardless of whether it's launched from within an IDE (e.g. Visual C++), a shortcut, Windows Explorer, the Start menu's Run box, a command prompt, or what have you. Run time is when the program is running, doing whatever it's programmed to do.

A compiler doesn't allocate memory - at least it's not helpful to think of it that way, in my opinion. A compiler supplies instructions, and some of those instructions allocate memory when they are executed.
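
For example (a rough sketch only; the instructions mentioned in the comments are roughly what an unoptimized x86 build might emit, and will differ between compilers and settings):

void makeRoom()
{
    int scratch[100];             // the compiler emits something like
                                  //   sub esp, 400
                                  // an instruction that runs every time the
                                  // function is called, moving the stack pointer

    int* dynamic = new int[100];  // the compiler emits a call to operator new[],
                                  // which asks the heap manager for memory while
                                  // the program is running
    scratch[0] = dynamic[0] = 0;  // touch both so the example isn't optimized away
    delete[] dynamic;
}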

quote:

So for elements on the stack, where the compiler knows about them (their quantity, lifetime, size and so on), are memory locations assigned at compile-time? Or are memory locations for objects created on the stack still assigned at run-time? In which case dynamic objects created on the heap can only be assigned memory locations at run-time, so there is literally no difference between dynamic object creation and normal object creation, because both are created and assigned memory locations at run-time, right?



The operating system sets up the stack as a reserved section of virtual memory in the address space of the process. Every thread has its own stack; the default stack size on Windows is 1 MB. Before the thread begins execution, two pages of the reserved stack memory are committed. The compiler deals with instructions for manipulating the stack, not with assigning memory at run time. An object created on the heap belongs to the process; an object created on the stack belongs to the thread that owns that stack.

As a thread executes, it pushes values onto the stack and pops them off when they are no longer needed, in order to keep the stack properly balanced. This is important because it allows functions to return to where you coded them to. The arguments to a function are pushed onto the stack, then the function is called. The call instruction also pushes the function's return address onto the stack. Then space is made on the stack to accommodate the local variables. The word "made" suggests allocation, but that's not the case; it's just a limitation of speech. What happens is that the pointer to the top of the stack is adjusted so those stack entries can be used. A stack is really just an array (with the default 1 MB reservation and 4-byte entries, an array of 262,144 DWORDs) with a pointer that indexes where the latest value is stored.

For ints and other basic data types placed on the stack, no explicit clean-up code is needed. The compiler supplies the instructions that move the stack pointer back to where it was before the function was called, thereby cleaning up those local variables. The next function call will overwrite the locations where they used to be, as if they had never been there. If a local variable is used to hold a pointer to dynamically allocated memory - an object, a chunk, or what have you - that memory has to be freed before exiting the function, because otherwise that pointer gets lost (it gets overwritten with something else) and the memory it points to can no longer be freed. In C++ the compiler often supplies the code to call an object's destructor before the function returns, but there are a variety of particulars that you should study for yourself.
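
Here is a small sketch of those last points; the function and names are made up for illustration:

#include <vector>

int sumFirstN(int n)              // n and the return address end up on the stack
{                                 // (or in registers, depending on the convention)
    int total = 0;                // local variable: part of this call's stack frame

    int* scratch = new int[n];    // heap memory, reachable only via a stack pointer
    for (int i = 0; i < n; ++i)
    {
        scratch[i] = i + 1;
        total += scratch[i];
    }
    delete[] scratch;             // must happen before the frame disappears, or the
                                  // only pointer to that memory is lost (a leak)

    std::vector<int> v(n, 1);     // alternative: the compiler inserts the call to
                                  // v's destructor when the function returns, and
                                  // the destructor frees the memory it manages
    total += static_cast<int>(v.size());

    return total;                 // the stack pointer moves back; the next call
}                                 // simply reuses the space these locals occupied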



[edited by - lessbread on March 9, 2004 3:58:17 AM]

Like someone said, you are opening a can of worms.
Anyway, on Windows this is all the work the OS does just to get your compiled EXE running (quoted from the Inside Windows book; a minimal CreateProcess sketch follows the list below):
A Win32 process is created when an application calls one of the process creation functions, such as CreateProcess, CreateProcessAsUser, or CreateProcessWithLogonW. Creating a Win32 process consists of several stages carried out in three parts of the operating system: the Win32 client-side library Kernel32.dll, the Windows 2000 executive, and the Win32 subsystem process (Csrss). Because of the multiple environment subsystem architecture of Windows 2000, creating a Windows 2000 executive process object (which other subsystems can use) is separated from the work involved in creating a Win32 process. So, although the following description of the flow of the Win32 CreateProcess function is complicated, keep in mind that part of the work is specific to the semantics added by the Win32 subsystem as opposed to the core work needed to create a Windows 2000 executive process object.

The following list summarizes the main stages of creating a process with the Win32 CreateProcess function. The operations performed in each stage are described in detail in the subsequent sections.


1. Open the image file (.exe) to be executed inside the process.

2. Create the Windows 2000 executive process object.

3. Create the initial thread (stack, context, and Windows 2000 executive thread object).

4. Notify the Win32 subsystem of the new process so that it can set up for the new process and thread.

5. Start execution of the initial thread (unless the CREATE_SUSPENDED flag was specified).

6. In the context of the new process and thread, complete the initialization of the address space (such as loading required DLLs) and begin execution of the program.
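
To connect that list to actual code, here is a minimal, illustrative use of CreateProcess (error handling kept to a bare minimum; "notepad.exe" is just a stand-in for any EXE you might launch):

#include <windows.h>
#include <iostream>

int main()
{
    STARTUPINFOA si = { sizeof(si) };   // cb must be set; the rest may stay zeroed
    PROCESS_INFORMATION pi;

    char cmdLine[] = "notepad.exe";     // CreateProcess may write to this buffer,
                                        // so don't pass a string literal

    // This one call kicks off all six stages listed above.
    if (!CreateProcessA(NULL, cmdLine, NULL, NULL, FALSE,
                        0, NULL, NULL, &si, &pi))
    {
        std::cout << "CreateProcess failed, error " << GetLastError() << std::endl;
        return 1;
    }

    WaitForSingleObject(pi.hProcess, INFINITE);  // wait for the child to exit
    CloseHandle(pi.hThread);                     // always close both handles
    CloseHandle(pi.hProcess);
    return 0;
}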

And here is what I got from a memory management book:
Memory Allocation
When the kernel allocates physical memory for a process, it sets up the allocated memory so that the first address (i.e., the lowest address) is a multiple of 64KB. In other words, processes are aligned in physical memory on a 64KB boundary. The size of the address space reserved for a process is a multiple of the native processor's page size. On a Pentium, an application would be given a plot of real estate in physical memory that is a multiple of 4KB. The Pentium does provide facilities for larger page sizes (i.e., 4MB), but everyone in their right mind sticks to 4KB page sizes (MMURTL, Linux, Windows, etc.).

One of the fringe benefits of being a user process is that each task is constructed with its own heap. Figure 2.22 displays one of the possible memory layouts for a user process. The stack grows down from the highest address, and the heap grows up toward the stack.

The exact organization of a program's code, data, stack, and heap sections is a function of the development tools used to build the program. Linkers, in particular, can decide where to place an application's components. The linker will normally process object files in the order in which they appear on its command line. For each object file, the linker will embed program sections into the executable as it encounters them. The /DO linker option can be used to alter this behavior so the program's sections are arranged in the Microsoft Default Order.
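
As a quick illustration, the following sketch queries the page size and allocation granularity and prints the address of a stack variable next to a heap allocation; on a typical Windows run the stack address is the higher of the two, matching the layout described above (exact numbers vary from machine to machine):

#include <windows.h>
#include <iostream>

int main()
{
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    std::cout << "Page size:              " << info.dwPageSize              << " bytes\n";
    std::cout << "Allocation granularity: " << info.dwAllocationGranularity << " bytes\n";

    int onStack = 0;
    int* onHeap = new int(0);
    std::cout << "Stack variable at " << static_cast<const void*>(&onStack) << "\n";
    std::cout << "Heap variable at  " << static_cast<const void*>(onHeap)   << "\n";

    delete onHeap;
    return 0;
}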




If God played dice, He'd win.
—Ian Stewart, Does God Play Dice? The Mathematics of Chaos

I think you're getting a little confused between compile time and run time, and between the stack and the heap.

The stack holds data that is not explicitly dynamically allocated using new or malloc. However, this does not mean that it is created at compile-time. For example, consider the function below.


int factorial(int operand)
{
    if (operand <= 1)    // <= 1 also guards against 0 or negative input,
    {                    // which would otherwise recurse forever
        return 1;
    }
    return operand * factorial(operand - 1);
}



Say you ask a user to input an integer num and then call factorial(num). There is no way for the compiler to know how many times factorial(int) will be run...

and each call to factorial results in the addition of an element (the activation record, which includes the initialized local variables, the return address, etc.) to the call stack. So the stack, too, is built at runtime.

The difference essentially is that when you create a variable in a function, at the end of the function the variable goes out of scope and is automatically destroyed.

However, when you allocate memory explicitly, the new memory does not come from the call stack but from a separate structure called the heap. When you leave the function's scope, the stack is cleaned up, but the memory you allocated is not. So....

   
struct useless_struct
{
    int *value;
};

void assign_to_struct(int i, useless_struct &inst)
{
    // inst.value = i;           // i will be cleaned off the stack,
                                 // leaving a dangling ptr

    inst.value = new int(i);     // this works OK
}


So essentially both the stack and the heap are runtime components of variable size. The only things of fixed size are the code (the function definitions), the struct and class sizes, and the globals....

Hope that helps...
BTW I know this is a lot to digest at once, which is why I highly recommend a book, because I'm sure a PhD can explain this much better than I can.

EDIT: Stupid me... I forgot the ampersand in the function definition.

[edited by - psamty10 on March 9, 2004 2:21:52 PM]

[edited by - psamty10 on March 9, 2004 2:22:29 PM]

quote:
Original post by psamty10
    
struct useless_struct
{
    int *value;
};

void assign_to_struct(int i, useless_struct &inst)
{
    // inst.value = i;           // i will be cleaned off the stack,
                                 // leaving a dangling ptr

    inst.value = new int(i);     // this works OK
}


EDIT: Stupid me... I forgot the ampersand in the function definition.



Thanks for the explanation. I was wondering about this myself too.

Oh, and you forgot an ampersand in inst.value = i too; it should be &i to take the address of i.





--{You fight like a dairy farmer!}
