Where is the fundamental source code for C++?

8 comments, last by kop0113 10 years, 6 months ago

By that I mean that C++ is a wrapper, in the sense that the keywords you use are not the full source.

For example:

'for' and 'while' loops are keywords, so where do we go after linking, interpreting and compiling? Do the DLLs contain the actual source for, say, 'for', and what would that be?

Another example: I remember reading once that the original 'Hello world' program had about 100 lines of code, with the creators actually falling off their seats with laughter at how complex it once was, hence they wrapped most of it up.

I am trying to understand the path from the keywords we use in C++ to a final machine code executable.

MessageBox is another example. What would be the full source code to actually draw a dialog box on the screen, and where is this source? Again, I take it that's where the DLLs come in.


There is no source code for "C++" itself; there is source code for the various C++ compilers (g++ and Clang are open source).

A C++ compiler converts the C++ code to assembly and/or machine code for the target platform.

Things like MessageBox are not part of C++. They are functions or classes provided by the OS (or by a higher-level wrapper library that in turn uses some equivalent function provided by the OS).


Do we mere mortals have access to these MS classes?


By that I mean that C++ is a wrapper, in the sense that the keywords you use are not the full source.

I'm not sure what definition of "source" you are using, and I suspect the one you have isn't particularly interesting or helpful in general. C++ is the "full" source code for anything that understands C++.

Even machine code isn't the "full" source, as modern CPUs decode that into their own internal microcode.


'for' and 'while' loops are keywords, so where do we go after linking, interpreting and compiling? Do the DLLs contain the actual source for, say, 'for', and what would that be?

There is no "source" for the for loop. The compiler understands what for is supposed to do, and it knows how to generate instructions that achieve the same job, such as a comparison instruction at the end of the loop body, with a branch instruction (like a goto) back to the start of the loop body.

For example:

// Early stuff
for(int i = 0 ; i < N ; ++i) {
    // Loop stuff...
}
// Later stuff...

Could become the following pseudo-instruction code:

    // instructions for "early stuff"
    set register a to 0
    jump to label "test"
start:
    // instructions for "loop stuff"
    increment register a
test:
    set register b to a minus N
    jump if register b less than 0 to label "start"
    // instructions for "later stuff"

Forgive any errors, I don't usually have to consider stuff at this level.

It depends on the instruction set involved, and even for a given instruction set there might be multiple ways of doing it. For instance, one optimisation technique is loop unrolling. If you have a simple for loop that does something three times, the compiler might decide to include three copies of the loop body instructions in the resulting executable.
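To make that concrete, here is a minimal sketch of the same idea written in C++ itself, with goto standing in for the branch instructions. The function names are made up for illustration, and a real compiler is free to emit something quite different:

```cpp
#include <cassert>

// The loop as you would normally write it.
int count_with_for(int n) {
    int iterations = 0;
    for (int i = 0; i < n; ++i) {
        ++iterations;              // "loop stuff"
    }
    return iterations;
}

// The same loop lowered by hand into labels and jumps.
int count_with_goto(int n) {
    int iterations = 0;
    int i = 0;                     // the initialiser runs exactly once
    goto test;                     // check the condition before the first pass
body:
    ++iterations;                  // "loop stuff"
    ++i;                           // the increment step
test:
    if (i < n) goto body;          // compare, then branch back to the body
    return iterations;
}

// What unrolling a fixed three-iteration loop amounts to:
// three copies of the body and no comparison or branch at all.
int count_unrolled_three() {
    int iterations = 0;
    ++iterations;
    ++iterations;
    ++iterations;
    return iterations;
}

// Sanity check: the hand-lowered version must agree with the for loop.
void check_equivalent(int n) {
    assert(count_with_for(n) == count_with_goto(n));
}
```

Calling check_equivalent with a few values of n (including 0) is an easy way to convince yourself that the for keyword adds nothing beyond an initialiser, a test and a jump.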


Another example: I remember reading once that the original 'Hello world' program had about 100 lines of code, with the creators actually falling off their seats with laughter at how complex it once was, hence they wrapped most of it up.

If you compile a simple C++ program, such as "hello world", the majority of the code will be doing things such as initialising the runtime or handling any exceptions, and so on. Only a small amount of the code will be directly related to what you wrote.

As your programs become bigger, this setup and teardown code becomes an increasingly minuscule fraction of the final executable.
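One way to see that start-up code at work is a global object: something has to run its constructor before main is entered. The sketch below is only an illustration of that; startup_log and Announcer are hypothetical names, not anything from a real runtime:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log. The function-local static means it is created on
// first use, so it is safe to call during static initialisation.
std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

struct Announcer {
    Announcer() { startup_log().push_back("global constructed"); }
};

// You never call this constructor yourself. The runtime's start-up
// code does it before it hands control to main().
Announcer g_announcer;

// By the time any code reachable from main() runs, the entry is there.
void check_startup() {
    assert(!startup_log().empty());
}
```

If you call check_startup() as the first line of main, it passes, even though nothing in your own code ever constructed g_announcer explicitly.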


I am trying to understand the path from the keywords we use in C++ to a final machine code executable.

Modern compilers are generally too complex to try and trace like this, without becoming an expert in the subject. However, you can certainly think through the kind of transformations a naive compiler would make to transform simple code to machine code.


MessageBox is another example. What would be the full source code to actually draw a dialog box on the screen, and where is this source? Again, I take it that's where the DLLs come in.

MessageBox is different. The source code is probably in C or C++, and is owned by Microsoft. I believe that Microsoft does give the Windows source code to educational institutions. The compiled code is in a DLL on your system.

Generally if you ask your compiler nicely it can give you the assembly code for the C++ code you give it. For example on MSVC you can use the /FA family of switches and on gcc you can use -S. Asking MSVC very nicely to compile

void bar(void) {
  for (int i = 0; i < 10; i++) {
    foo();
  }
}

gives this assembly listing in release:

?bar@@YAXXZ PROC					; bar, COMDAT

; 5    : void bar(void) {

	push	esi

; 6    :   for (int i = 0; i < 10; i++) {

	mov	esi, 10					; 0000000aH
$LL3@bar:

; 7    :     foo();

	call	?foo@@YAXXZ				; foo
	dec	esi
	jne	SHORT $LL3@bar
	pop	esi

; 8    :   }
; 9    : }

	ret	0

(where very nicely includes using the /FAs switch and liberal sprinkling of __declspec(noinline) to force the compiler to not inline the appropriate functions out of existence).
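If you want to try this yourself, something like the following file should do as a starting point. It is a sketch, not the exact file used above: the NOINLINE macro, the g_calls counter and check_bar are assumptions added so that the file is self-contained and foo() has an observable effect.

```cpp
#include <cassert>

// __declspec(noinline) is the MSVC spelling; GCC and Clang use an attribute.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Hypothetical side effect, volatile so the call cannot be optimised away.
static volatile int g_calls = 0;

NOINLINE void foo() {
    g_calls = g_calls + 1;
}

NOINLINE void bar() {
    for (int i = 0; i < 10; i++) {
        foo();
    }
}

// Runs bar() once and confirms the loop body executed ten times.
void check_bar() {
    g_calls = 0;
    bar();
    assert(g_calls == 10);
}
```

With MSVC, compiling with something like cl /O2 /FAs /c file.cpp should leave a listing next to the object file; with GCC, g++ -O2 -S file.cpp writes file.s. The exact instructions you get will depend on compiler version and flags.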

Do we mere mortals have access to these MS classes?

Generally no, and you wouldn't need to either: MSDN has the details you want for these API calls. Reading the MSDN docs has a bit of a learning curve, though, and you need to read all of it, especially the Remarks section, as that often contains the information about error codes and success values.


Wow, you guys are good. It never ceases to impress me when I read comments from people who know the ins and outs of programming, especially regarding C/C++ and assembly. How long did it take you to get to these levels? Do you guys work in the industry? I assume you must.

By that I mean that C++ is a wrapper, in the sense that the keywords you use are not the full source.

For example:

'for' and 'while' loops are keywords, so where do we go after linking, interpreting and compiling? Do the DLLs contain the actual source for, say, 'for', and what would that be?

I'm guessing you read, or someone told you, that anything followed by parentheses () is a function, so for() and while() have to be functions. You also know about the compiling and linking stages, so the definitions for 'for' and 'while' have to be somewhere, in an obj file.

Well, these are not functions. They're keywords. As SimonForsman and rip-off said, the compiler directly converts these keywords into assembly language based on the expressions to be evaluated.
There is no function syntax where ";" can be inside the parentheses, yet the syntax is for( int i=0; i<count; ++i ), and that's because for is not a function.
You could read a compiler's source code, though, to understand how they are translated into assembly. But only Clang & GCC are open source, so MSVC's source code is out of the question unless you're a Microsoft employee with access to that code.

MessageBox is another example. What would be the full source code to actually draw a dialog box on the screen, and where is this source? Again, I take it that's where the DLLs come in.

MessageBox is a function. It is forward declared somewhere in Windows' headers (Winuser.h IIRC, you can look that up), and it is defined externally, in a DLL called user32.dll.

You have to link against user32.lib so the linker knows MessageBox is defined externally (in user32.dll); otherwise you will get build errors saying there's a missing external reference to MessageBoxA/MessageBoxW (A for ANSI, W for wide-character Unicode).

The linker sees user32.lib, and creates the executable with a reference to user32.dll's MessageBoxA/W. When the exe is executed, the OS will see that reference and load the DLL and the function in question so that it can be used by your application.
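As a rough sketch of what that looks like from your side of the fence: show_message below is a hypothetical wrapper, the #pragma comment line is an MSVC-specific way of asking the linker for user32.lib, and the non-Windows branch exists only so the file still compiles elsewhere.

```cpp
#include <cassert>
#include <cstdio>

#ifdef _WIN32
#include <windows.h>                 // pulls in the MessageBoxA/MessageBoxW declarations
#pragma comment(lib, "user32.lib")   // MSVC-specific: link against the import library
#endif

// Hypothetical wrapper. Returns true if a native dialog was shown.
bool show_message(const char* text, const char* title) {
    assert(text != nullptr && title != nullptr);
#ifdef _WIN32
    // Only the declaration is visible here. The body lives in user32.dll
    // and is resolved by the OS loader when the program starts.
    MessageBoxA(nullptr, text, title, MB_OK);
    return true;
#else
    // Assumed fallback for other platforms: just print the message.
    std::printf("[%s] %s\n", title, text);
    return false;
#endif
}
```

Nothing in this file draws a window. All your executable carries is a reference that says "MessageBoxA, from user32.dll", which is why there is no dialog-drawing source for you to find in your own project.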

You can't see the source code for MessageBox (though you can see the assembly), because you would need to be an MS employee with access to the Windows source.

You can, however, look at the source of Wine's implementation (a sort-of-emulator for running Windows programs on Linux), which will probably be very similar (or not).

I hope this helps you understand the compiling & linkage process.

Cheers

Wow, you guys are good. It never ceases to impress me when I read comments from people who know the ins and outs of programming, especially regarding C/C++ and assembly. How long did it take you to get to these levels? Do you guys work in the industry? I assume you must.

The subject of compilers and OS programming is something hundreds of people spend their lives on. So, while you could certainly learn a lot from trying to do any of that yourself (and why not?), it's not a beginner topic. The fundamentals behind all of this are something everyone who wants to become a better programmer really does need to know.

The school I'm at didn't teach me anything about compilers, but it taught a lot about OSes and a lot of other things (in fact, almost everything except compilers). I already knew most of it, though, and the advanced topics on compilers are sometimes necessary to know. IDEALLY no one has to know anything about the underlying system, the compiler, the toolset around the compiler or the build chain. You could write the code you felt like writing, the compiler would take care of everything, the IDE would create perfect build packages for every OS, and the sentient compiler would make the best possible optimizations. Unfortunately, in reality there is a lot more to it, especially algorithms, utilizing caches well, concurrency and memory management.

Good question though

Also, the sentient compiler obviously tells you where you went wrong, fixes your bugs and finishes some of your modules at night.

If you are interested in seeing how much of the userland stuff in Windows (e.g. MessageBox) could be implemented, have a browse through the Wine source code.

http://source.winehq.org/git/wine.git/tree/HEAD:/dlls

As I recall, MessageBox is implemented in user32.dll so you might want to start there.

Also, for some completely useless information: foreach in C++ (pre-C++11) is not implemented as a keyword (unlike if, while, etc.) but in a variety of ways (macro, template, etc.) by projects like Boost and Qt. This shows that the language is pretty darn powerful.
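For a feel of how that can work, here is a deliberately simplified sketch in pre-C++11 style. FOREACH, for_each_in and Summer are made-up names, and this is not how Boost or Qt actually implement theirs (those go to much greater lengths to avoid spelling out the container type):

```cpp
#include <cassert>
#include <vector>

// Hypothetical macro flavour: expands to an ordinary iterator-based for loop.
// The container type has to be spelled out, and must not contain commas.
#define FOREACH(ContainerType, it, container)                        \
    for (ContainerType::const_iterator it = (container).begin();    \
         it != (container).end(); ++it)

// Hypothetical template flavour: calls f on every element and, like
// std::for_each, returns the function object so its state can be read back.
template <typename Container, typename Function>
Function for_each_in(const Container& c, Function f) {
    for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it) {
        f(*it);
    }
    return f;
}

// A small function object, since lambdas did not exist before C++11.
struct Summer {
    int total;
    Summer() : total(0) {}
    void operator()(int value) { total += value; }
};

int sum_with_macro(const std::vector<int>& values) {
    int total = 0;
    FOREACH(std::vector<int>, it, values) {
        total += *it;
    }
    return total;
}

int sum_with_template(const std::vector<int>& values) {
    return for_each_in(values, Summer()).total;
}

// Both flavours should agree with each other.
void check_foreach(const std::vector<int>& values) {
    assert(sum_with_macro(values) == sum_with_template(values));
}
```

Either way, the "foreach" is built out of pieces the language already has, which is exactly why it did not need to be a keyword.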

