I was having difficulties figuring out how to properly compile and link a C++ program I was working on, so I did some research on how compiling actually works. I felt so informed that I thought I should post to my blog about it, and then I was so proud of my blog post that I thought I should spam this forum with a link to it. So I present to you an introductory look at just how C and C++ programs are compiled, from the basics of preprocessing to the difference between static and dynamic libraries. I hope you find it informative and maybe even useful for your own programming: http://dunoid.org/index.php/2016/03/02/the-c-compilation-process/
How C++ Programs Are Compiled (A Brief Look)
Your post is highly incomplete, inaccurate, misleading at times, and generally inadequate for covering the compilation process.
You don't talk at all about how compilation actually works. Instead you gloss over it and (incorrectly) state that compilers emit assembly language. While assembly can be an artifact of compilation, most compilers actually emit binary object files in some well-defined format.
Assemblers are also a fair bit more complex than you give them credit for, by the way. Most assemblers today do some hefty transforms on the code on its way from text to binary. It is also hardly true that "nobody" writes raw assembly, largely because the conveniences offered by a 21st century assembler go far beyond a simple remapping of mnemonics to bit patterns.
Linking is again just glossed over in your post, which is a shame, because the complexities of symbol resolution, COMDAT folding, LTCG, etc. etc. are all well worth proper exploration.
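To give a flavor of the symbol resolution part, here is a minimal sketch of the bookkeeping involved. Everything in it is an assumption made for illustration: ToyObject, LinkResult and resolveSymbols are made-up names, and a real linker also deals with sections, relocations, libraries, weak symbols, COMDAT folding and LTCG, none of which are modeled here.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of an object file:
// the symbols it defines and the symbols it only references.
struct ToyObject {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> referenced;
};

struct LinkResult {
    std::map<std::string, std::string> definedIn; // symbol -> first object defining it
    std::set<std::string> duplicates;             // defined by more than one object
    std::set<std::string> unresolved;             // referenced but defined nowhere
};

// Two passes: collect every definition first, then check every
// reference against the table built in the first pass.
LinkResult resolveSymbols(const std::vector<ToyObject>& objects) {
    LinkResult result;
    for (const ToyObject& obj : objects) {
        for (const std::string& sym : obj.defined) {
            const bool inserted = result.definedIn.emplace(sym, obj.name).second;
            if (!inserted) {
                result.duplicates.insert(sym);
            }
        }
    }
    for (const ToyObject& obj : objects) {
        for (const std::string& sym : obj.referenced) {
            if (result.definedIn.count(sym) == 0) {
                result.unresolved.insert(sym);
            }
        }
    }
    return result;
}
```

In this toy model, an entry in unresolved plays the role of an "undefined reference" error, and an entry in duplicates plays the role of a "multiple definition" error.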
You also seem to have favored a giant dump of your executable instead of explaining how the executable image format (on your platform of choice) actually works, etc.
I commend your quest for personal enlightenment, but please take care when advertising your journey as an authority.
Please don't spam blog links into the General Programming forum, unless you are actively looking for a discussion on them. In this case please start a discussion instead of just dropping links and then the microphone.
...
I sincerely apologize! I guess I was just too excited to really consider the point of this forum before posting. Blogging is new to me, so I don't really know how to advertise myself appropriately. I'll take care to only link when it provides a meaningful benefit in the future.
Your post is highly incomplete, inaccurate, misleading at times, and generally inadequate for covering the compilation process.
....
Well, this is embarrassing for me. I always have the problem of making assumptions too quickly without really investigating. For example, I mistakenly assumed assemblers had a simple job, since I had made a RISC assembler once. But of course, what I programmed has no reflection on what a professional organization would make. And since my blog also serves as a sort of resume for me, it's doubly important I have accurate information. I'll keep researching and update my article.
Again, sorry and thanks.
As far as advertising yourself, I recommend sticking the link in your profile quotes section (I know there is a name for it, but I am writing this from my phone and can't get to it).
You can always start a developer journal on this site, even if all you want to do there is post excerpts that link to your real blog. That's perfectly fine ("spamming" the forums isn't though).
That being addressed now, we don't need to keep piling on OP's rep now, do we?
A brief suggestion. I would not use the basic information of compilers as a portion of a resume. The topic is actually more complex than what your blog is covering. Here's a veeeeeery basic explanation. A lot of details are skipped.
In reality, compilers are usually divided up into several stages. Take GCC: the gcc command is really a driver that runs a series of separate programs (preprocessor and compiler, assembler, linker) and can connect them with pipes.
First stage is the lexer, which tokenizes your code. Tokens are usually described with regular expressions. The context-free grammar comes in at the next stage, where the parser works on those tokens, and tokenizing first makes that stage easier to manage. By "context free", the easiest way to understand it is that the human meaning is stripped out, and all we care about is the structure. The tokens themselves are short strings (keywords, names, numbers, operators) with rules about how they are formed. You can look this up at some point. Most compiler or formal-language textbooks will cover this topic. Here's a link: https://en.wikipedia.org/wiki/Lexical_analysis
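A minimal sketch of what tokenizing means, under some loud assumptions: the names TokenKind, Token and tokenize are made up for this example, keywords are not told apart from other words, and "++" is the only multi-character operator it knows. A real C++ lexer handles far more (comments, string literals, every operator, preprocessor interaction).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokenKind { Word, Number, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits source text into tokens; whitespace only separates tokens.
std::vector<Token> tokenize(const std::string& source) {
    // Small wrappers keep the unsigned char casts <cctype> needs in one place.
    auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    auto isDigit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    auto isWordChar = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    };

    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        if (isSpace(source[i])) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        TokenKind kind;
        if (isDigit(source[i])) {
            kind = TokenKind::Number;
            while (i < source.size() && isDigit(source[i])) ++i;
        } else if (isWordChar(source[i])) {
            kind = TokenKind::Word;
            while (i < source.size() && isWordChar(source[i])) ++i;
        } else {
            kind = TokenKind::Symbol;
            // "++" is the only two-character operator this sketch recognizes.
            i += (source.compare(i, 2, "++") == 0) ? 2 : 1;
        }
        tokens.push_back({kind, source.substr(start, i - start)});
    }
    return tokens;
}
```

Calling tokenize("++a;") gives three tokens: the symbol "++", the word "a", and the symbol ";". The later stages never look at raw characters again, only at this list.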
The lexer is usually combined with a parser, also called the syntactic analyzer. This is an error checking stage: it makes sure that your code follows the grammatical rules by analyzing the string of tokens.
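Here is a small sketch of that grammar-checking idea, using a recursive-descent checker for a tiny arithmetic grammar rather than for C++ itself. The grammar, the SyntaxChecker class and isValidExpression are all assumptions made up for this example, and to stay short it reads characters directly instead of taking tokens from a separate lexer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Checks text against this toy grammar (not C++):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class SyntaxChecker {
public:
    explicit SyntaxChecker(const std::string& text) : text_(text), pos_(0) {}

    // True only if the whole input is one valid expression.
    bool accepts() { return parseExpr() && atEnd(); }

private:
    void skipSpaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) {
            ++pos_;
        }
    }
    bool atEnd() {
        skipSpaces();
        return pos_ == text_.size();
    }
    // Consumes c if it is the next non-space character.
    bool match(char c) {
        skipSpaces();
        if (pos_ < text_.size() && text_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }
    bool parseExpr() {
        if (!parseTerm()) return false;
        while (match('+') || match('-')) {
            if (!parseTerm()) return false;
        }
        return true;
    }
    bool parseTerm() {
        if (!parseFactor()) return false;
        while (match('*') || match('/')) {
            if (!parseFactor()) return false;
        }
        return true;
    }
    bool parseFactor() {
        if (match('(')) {
            return parseExpr() && match(')');
        }
        skipSpaces();
        const std::size_t start = pos_;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
            ++pos_;
        }
        return pos_ > start; // a number needs at least one digit
    }

    std::string text_;
    std::size_t pos_;
};

bool isValidExpression(const std::string& text) {
    return SyntaxChecker(text).accepts();
}
```

With this sketch, isValidExpression("1 + 2 * (3 - 4)") is true, while "1 + * 2" and "(1 + 2" are rejected. A real parser would also report where the error is and build a tree as it goes, which this one skips.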
Somewhere along those lines, you have an optimizer. The structure the parser builds out of your tokens is basically an abstract syntax tree. The optimizer will look for defined patterns in it, and will make attempts at improving the code where it can.
Example:
int a = 0;
for (int i = 0; i < 30; ++i)
{
    ++a;
}
The optimizer will see this for loop. Notice that there's absolutely no need for it, as it's just iterating and adding. This is an absolutely redundant piece of code. When the optimizer makes its edit, the code will look like this.
int a = 29
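As a separate illustration of an optimizer "looking for defined patterns", here is a minimal sketch of one classic pattern, constant folding, on a tiny expression tree. The Expr type and the helpers constant, variable, binary and fold are made up for this example. Real compilers typically run passes like this on an intermediate representation rather than directly on the syntax tree, and they have to worry about overflow, which this sketch ignores.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression tree node, just enough for this example.
struct Expr {
    enum class Kind { Constant, Variable, Add, Mul };
    Kind kind = Kind::Constant;
    int value = 0;               // used when kind == Constant
    std::string name;            // used when kind == Variable
    std::unique_ptr<Expr> lhs;   // used by Add and Mul
    std::unique_ptr<Expr> rhs;   // used by Add and Mul
};

std::unique_ptr<Expr> constant(int value) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Constant;
    e->value = value;
    return e;
}

std::unique_ptr<Expr> variable(const std::string& name) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Variable;
    e->name = name;
    return e;
}

std::unique_ptr<Expr> binary(Expr::Kind kind, std::unique_ptr<Expr> lhs,
                             std::unique_ptr<Expr> rhs) {
    auto e = std::make_unique<Expr>();
    e->kind = kind;
    e->lhs = std::move(lhs);
    e->rhs = std::move(rhs);
    return e;
}

// The pattern: an Add or Mul whose operands are both constants is
// replaced by a single constant. Children are folded first so that
// nested constant subtrees collapse from the bottom up.
std::unique_ptr<Expr> fold(std::unique_ptr<Expr> e) {
    if (e->kind != Expr::Kind::Add && e->kind != Expr::Kind::Mul) {
        return e; // leaves have nothing to fold
    }
    e->lhs = fold(std::move(e->lhs));
    e->rhs = fold(std::move(e->rhs));
    if (e->lhs->kind == Expr::Kind::Constant &&
        e->rhs->kind == Expr::Kind::Constant) {
        const int l = e->lhs->value;
        const int r = e->rhs->value;
        // Integer overflow is deliberately ignored in this sketch.
        return constant(e->kind == Expr::Kind::Add ? l + r : l * r);
    }
    return e;
}
```

Folding (2 + 3) * 4 yields the single constant 20, while x + (2 * 3) becomes x + 6, because the variable stops the outer addition from being folded.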
Once the code passes the syntax check and gets optimized, it's passed to the assembly stage (code generation). This stage reads the optimized representation of your code and produces the assembly code. This assembly code is implementation specific: different compilers will do different things, and will produce different assembly code for their targeted devices. This may seem redundant, but it's an important stage for how things get linked, among other things. I'm not going into it because it gets complex. REAL COMPLEX. I barely understand it.
The assembly code is then converted by a set of rules into a binary encoding. Most commonly that is machine code for a real processor, but scripting languages and virtual machines often use their own bytecode instead. Hex is just a convenient way of writing those same bytes down. That... is also pretty specific to the targeted system.
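A toy sketch of that "set of rules", assuming a completely made-up instruction set: three mnemonics (LOAD, ADD, STORE), each followed by one operand from 0 to 255, with every instruction encoded as two bytes. The functions assemble and toHex are hypothetical names for this example. Real assemblers also handle labels, relocations, macros, and variable-length encodings.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Encodes each "MNEMONIC operand" line as two bytes: opcode, then operand.
// The opcode values are invented for this example.
std::vector<std::uint8_t> assemble(const std::string& source) {
    static const std::map<std::string, std::uint8_t> opcodes = {
        {"LOAD", 0x01}, {"ADD", 0x02}, {"STORE", 0x03}
    };
    std::vector<std::uint8_t> bytes;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string mnemonic;
        if (!(fields >> mnemonic)) {
            continue; // blank line
        }
        int operand = 0;
        const auto it = opcodes.find(mnemonic);
        if (it == opcodes.end() || !(fields >> operand) ||
            operand < 0 || operand > 255) {
            throw std::invalid_argument("cannot assemble line: " + line);
        }
        bytes.push_back(it->second);
        bytes.push_back(static_cast<std::uint8_t>(operand));
    }
    return bytes;
}

// Formats bytes as space-separated two-digit hex, e.g. "01 05".
std::string toHex(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::uint8_t b : bytes) {
        if (!out.empty()) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Assembling the three lines "LOAD 5", "ADD 7", "STORE 16" gives six bytes, and toHex prints them as "01 05 02 07 03 10". The bytes are the actual output; the hex string is only a readable view of them.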
Example:
int a = 0;
for (int i = 0; i < 30; ++i)
{
    ++a;
}
The optimizer will see this for loop. Notice that there's absolutely no need for it, as it's just iterating and adding. This is an absolutely redundant piece of code. When the optimizer makes its edit, the code will look like this.
int a = 29
[OCD]
Just needed to correct this -- a would be equal to 30 in the given example, not 29.
[/OCD]
Probably not?
When i = 29, a = 29. At the end of that loop, i is incremented to 30, and breaks the for loop. The for loop will only tick when i < 30, it may not be equal to thirty.
Probably not?
When i = 29, a = 29. At the end of that loop, i is incremented to 30, and breaks the for loop. The for loop will only tick when i < 30, it may not be equal to thirty.
for (int i = 0; i < K; ++i)
{
    // This code is executed K times
    // (OCD disclaimer 1: normal cases where K >= 0 && K < INT_MAX)
    // (OCD disclaimer 2: and where the optimizer didn't optimize out the loop.)
    // (OCD disclaimer 3: and where 'i' and 'K' are not modified in any other way than what is seen here.)
    // (OCD disclaimer 4: and where the processor, RAM, etc. do not have hardware defects.)
    // (OCD disclaimer 5: and where the thread/process/computer is not shut down abruptly.)
}
(OCD edit: forgot the "int")(OCD edit 2: moved the first disclaimer to line up nicer with the others)
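If anyone wants to settle the count by running it, here is a minimal sketch. The function name countIterations is made up for this example. It simply wraps the loop from the thread and returns the final value of a.

```cpp
// Counts how many times the body of `for (int i = 0; i < k; ++i)` runs
// by incrementing a counter once per iteration.
int countIterations(int k) {
    int a = 0;
    for (int i = 0; i < k; ++i) {
        ++a;
    }
    return a;
}
```

The body runs for i = 0, 1, ..., 29, which is thirty values, so countIterations(30) returns 30. On the last pass i is 29 and a goes from 29 to 30 before the condition fails.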