compiler output & executable basics

This topic is 4748 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

(1) When a compiler finishes compiling, an executable is generated. Does this file contain hex or binary? And why, when I open the executable in Notepad, do I find ASCII symbols and text characters instead of numbers? How do 'hex editors' view data differently than Notepad - isn't it all just raw data in the end?
(2) When a user runs a program, the OS copies the file from hard drive into RAM. What process, then, actually pushes the program (line by line) through the CPU over the bus system?
(3) What are the possible damages if I made a file in Notepad, wrote a bunch of scribble in it, saved it with a .exe extension, and then tried to run it?
(4) And can anybody explain to me the exact processes going on under the hood of .NET during compilation (breakdown of build, link and compile stages)?
Thanks

Guest Anonymous Poster
(1)when a compiler finishes compiling, an executable is generated. does this file contain hex or binary? and why, when i open the executable in Notepad, do I find ASCII symbols and text characters instead of numbers? how do 'hex editors' view data differently than Notepad - isn't it all just raw data in the end?

-Hex and binary are two numbering systems that describe the same thing. For example, the number 4 is 4h in hex and 0100b in binary. So to answer your question: both, although physically it's stored as 1's and 0's. When you open a binary in Notepad, Notepad tries to interpret the bytes as ASCII text. For example, let's say part of the executable contains the byte 41h. To Notepad this is an 'A', but to the CPU the same byte might be an instruction, or part of one (on 32-bit x86, for instance, 41h on its own encodes inc ecx).
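
To make the "same value, different notation" point concrete, here is a minimal Python sketch. The function name describe_byte is made up for this example; it just writes one byte value in several notations, including the character a text editor would show for it.

```python
def describe_byte(value):
    """Return the same byte value written in several notations."""
    return {
        "decimal": str(value),
        "binary": format(value, "08b"),  # eight binary digits
        "hex": format(value, "02X"),     # two hex digits
        "ascii": chr(value),             # what a text editor would display
    }

info = describe_byte(0x41)
print(info)
```

Nothing about the stored byte changes between these views; only the way it is printed does.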

(2)when a user runs a program, the OS copies the file from hard drive into RAM. what process, then, actually pushes the program (line by line) through the CPU over the bus system?

-I really have no actual answer, but I'm sure the OS is what does this.

(3)what are the possible damages if I made a file in Notepad, wrote a bunch of scribble in it, saved it with a .exe extension, and then tried to run it?

-I've done this before. Damage? Most likely none.

(4)and can anybody explain to me the exact processes going on under the hood of .NET during compilation (breakdown of build, link and compile stages)?

-Don't remember, sorry.

1) The file contains binary data. An exe file is, put simply, a long list of instructions for the CPU to execute (plus some headers and data).
You can view the .exe data however you like: as hexadecimal, as ASCII, etc., but it remains binary data. Notepad is a text editor, so it interprets the bytes as text. For example, if it finds the byte 0x6A, it displays the ASCII character 'j'. Reading machine code this way makes very little sense, so what you get is rubbish. A hex editor is just a flexible editor that lets you view the data the way you want. By default it shows the hexadecimal value of each byte, which makes much more sense than reading it as ASCII.

3) None. An .exe file, in addition to the opcodes, contains a bunch of other stuff that is required in order to run, such as headers. A file full of scribble won't even run. Try messing with .com files instead: they are just raw instructions with no header, so you could actually get one to run and destroy your PC :)
(edit: I'm joking here, it's really hard to do something harmful this way :P)

2. The OS creates a process and loads your EXE into RAM. Then it tells the processor to start running at the program's entry point (typically startup code that ends up calling your main function). Once there, the processor executes each instruction in the order it appears in RAM. Some instructions tell the processor to move to another location in memory (this happens when you have loops, if statements, or calls to other functions). This is a very cursory explanation. One could go on all day about how the processor, operating system, and machine code work.

4. I don't know ALL the details, but the compiler first runs a lexical analysis of your code and then parses it. This lets the compiler check for syntax errors and turn the text into something more useful. Then that code is translated into MSIL (Microsoft Intermediate Language), a kind of byte-code, and stored in an assembly (an .exe or .dll file that contains the MSIL plus metadata). When you run the program, the runtime's just-in-time (JIT) compiler translates the MSIL into native machine code, method by method as each one is first called, and that native code is executed just like any other program.

Quote:
Original post by helmslar
(1)when a compiler finishes compiling, an executable is generated. does this file contain hex or binary? and why, when i open the executable in Notepad, do I find ASCII symbols and text characters instead of numbers? how do 'hex editors' view data differently than Notepad - isn't it all just raw data in the end?


The file contains raw data, which can be considered as a sequence of bits, or as a sequence of groups of (fill in your favourite number here) bits (except that there will always be a multiple of 8 bits on any architecture you're ever likely to see, because 8 bits make a byte), or in a variety of other ways.

Hex editors and text editors (like Notepad) - and all other programs - impose an interpretation upon "raw data" (although with most programs, most raw data will be rejected). Notepad's interpretation, basically, is to treat every byte as an ASCII character, and display the resulting sequence of ASCII. A hex editor will treat every byte as two sets of four bits, and let each of those be a hex digit. Then for each set (called a "nybble" - really!) it will output one of the symbols in "0123456789ABCDEF" according to the values of the bits. Other programs might interpret it as raw bitmap data. You might not even *see* anything; Winamp imposes an interpretation on data as well (it looks at the first few bytes to figure out the format, then does appropriate translation into sound values which are fed to the sound hardware - ultimately the data is translated into a sequence of numbers that specify where to move a vibrating element in the speaker at each moment in time, and that creates a sound wave.)

So you will *always* see "ASCII symbols and text characters" when you open a file in Notepad. You could write a program, if you like, which opens a file, reads it byte by byte, and outputs the human-readable value of each byte (or pair or quad of bytes). And you could ask it to "read" an executable; you might even be able to tell it to read itself, depending on whether your OS will allow it. An executable is just another file.
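
As a rough illustration of such a program, here is a small Python sketch. The names byte_to_hex and hex_dump are invented for this example; it splits each byte into its two nybbles, maps each to a hex digit, and shows the "Notepad view" of the same bytes alongside.

```python
HEX_DIGITS = "0123456789ABCDEF"

def byte_to_hex(b):
    """Split a byte into its high and low nybble and map each to a hex digit."""
    return HEX_DIGITS[b >> 4] + HEX_DIGITS[b & 0x0F]

def hex_dump(data, width=16):
    """Format bytes as lines of: offset, hex values, printable-ASCII view."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(byte_to_hex(b) for b in chunk)
        # Show printable ASCII as-is and everything else as '.'
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return lines

# A few arbitrary bytes standing in for the start of a file.
sample = b"MZ\x90\x00Hello!\n"
for line in hex_dump(sample):
    print(line)
```

To dump a real file, read it in binary mode (for example open(path, "rb").read()) and pass the resulting bytes to hex_dump; an executable works the same as any other file.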

Quote:

(2)when a user runs a program, the OS copies the file from hard drive into RAM. what process, then, actually pushes the program (line by line) through the CPU over the bus system?


The CPU is constantly "running a program" - i.e. the OS. From the time that the computer boots, the processor is maintaining an "instruction pointer" which points to the place in memory where the current instruction is. Normally, this will advance a few bytes in memory after that instruction is executed. Some instructions will change the pointer value to something else. Yes, that means that all your lovely ifs and whiles and fors are implemented with the dread GOTO under the hood. The reason for using the cleaner, "structured programming" constructs is the level of abstraction you get.

Anyway. This process runs automatically - current is always flowing, and on every clock cycle, the processor will move to the next stage of its operation. (At some point, clock cycles might have actually mapped 1:1 with instruction executions. Nowadays things are a lot more complicated, but it's a decent model for understanding things...)

So all the OS really needs to do is (a) run the instructions to copy the file to some specific block of RAM; and then (b) set the instruction pointer to point at the beginning of that block. Then the processor starts working on that instead. (Again, this is very oversimplified, considering things like threading and virtual memory, but it's the basic concept).
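
The instruction-pointer idea can be sketched with a toy machine in Python. The instruction names (LOAD, ADD, JUMP_IF_LESS, HALT) and the single acc register are invented for this example and do not correspond to any real processor.

```python
def run(program, max_steps=1000):
    """Fetch-execute loop for a made-up machine with one register, acc."""
    ip = 0    # instruction pointer: index of the current instruction
    acc = 0   # single accumulator register
    for _ in range(max_steps):
        op, arg = program[ip]   # fetch the instruction the pointer refers to
        ip += 1                 # by default, advance to the next instruction
        if op == "LOAD":
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "JUMP_IF_LESS":
            limit, target = arg
            if acc < limit:
                ip = target     # a branch simply overwrites the pointer
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"unknown instruction: {op}")
    raise RuntimeError("step limit reached")

# Roughly: acc = 0; do { acc += 3 } while (acc < 10)
program = [
    ("LOAD", 0),
    ("ADD", 3),
    ("JUMP_IF_LESS", (10, 1)),
    ("HALT", None),
]
result = run(program)
print(result)
```

The loop in the "program" exists only as a conditional jump back to index 1, which is the GOTO-under-the-hood point made above.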

Google "computer organization" for more information. You should be able to find some university course notes for an introductory class.

Quote:

(3)what are the possible damages if I made a file in Notepad, wrote a bunch of scribble in it, saved it with a .exe extension, and then tried to run it?


It *could* do anything the OS will allow, which means the damages are dependent on the security flaws of the OS (and the privileges with which you run the program). On a hypothetical multiple-user OS which is completely secure, you would only ever be able to trash the current user's stuff.

However, it is *extremely* unlikely that it will do anything at all, because as noted, .exes require some header information and you probably won't end up with a valid header for your "executable data". (the .exe file contains machine code for the processor, but also a few bytes at the beginning that the OS uses for various management functions.) Thus your OS will refuse to load it.
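
As a small illustration of the header point: Windows .exe files begin with the two bytes "MZ", so even the most basic check rejects typical scribble. The sketch below is Python and the function name has_mz_signature is made up; a real loader validates far more than these two bytes.

```python
def has_mz_signature(data):
    """Check only the first two bytes; a real loader validates far more."""
    return data[:2] == b"MZ"

scribble = b"asdf jkl; qwerty"        # the kind of thing Notepad would save
fake_header = b"MZ" + b"\x00" * 62    # right first two bytes, nothing else valid

print(has_mz_signature(scribble))
print(has_mz_signature(fake_header))
```

Passing this check does not make a file runnable; it only shows how quickly random text fails the first test.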

In university labs you may be able to get at some custom hardware board where there is basically no OS in place (of course, there is no meaningful permanent data on these systems either ;) they are meant for experimentation). You can mess things up pretty badly (such that you start getting weird garbage out of the I/O and have to reset the thing), but you won't be able to do any physical damage (or change anything in the ROM), so it's still safe to play around with. The worst that happens is that one of your programs destroys itself (and/or others) and you have to reset the board and re-load everything.

Quote:

(4)and can anybody explain to me the exact processes going on under the hood of .NET during compilation (breakdown of build, link and compile stages)?


(Edit: what follows is what a compilation process generally looks like when you are producing an actual machine-code executable. With .NET, the output is an "intermediate code" (the MSIL), which is itself kind of like machine code for a processor that doesn't actually exist. That can either be translated into "real" machine code later, or interpreted by a "virtual machine".)
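
To give a feel for "machine code for a processor that doesn't exist", here is a toy stack-based interpreter in Python. The opcodes (push, add, mul) are invented for this sketch and are not real MSIL; the only thing it borrows is the general idea of a stack machine executing intermediate code.

```python
def run_bytecode(code):
    """Interpret a made-up stack-based instruction set (not real MSIL)."""
    stack = []
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# Intermediate code a compiler might emit for the expression (2 + 3) * 4
code = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
value = run_bytecode(code)
print(value)
```

A "virtual machine" in this sense is just a program like run_bytecode; a JIT compiler would instead translate the same list into native instructions before running it.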

As I understand the term, a "build" is the whole process, consisting of preprocessing, compiling, linking and anything else that needs to be done. A build can be scripted with a "makefile", which basically describes what files are needed to create some other given file, and how to do it (e.g. an executable might be created by "linking" several object files, which are each created by "compiling" a particular source file). A normal build will check the timestamps on all the files that are sitting around, and re-create anything that's either (a) missing or (b) older than the files it depends on (i.e., it detects that one of the "dependencies" has changed, so that the file needs to be re-created). A full build will typically delete everything except your source, and then re-create everything.
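
The timestamp rule can be sketched in a few lines of Python. The function name needs_rebuild and the file names are made up for the example, and the timestamps are set by hand so the result does not depend on how fast the script runs.

```python
import os
import tempfile

def needs_rebuild(target, dependencies):
    """True if target is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)

# Demonstration with throwaway files and hand-set timestamps.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "main.c")
    obj = os.path.join(tmp, "main.o")
    open(source, "w").close()
    missing = needs_rebuild(obj, [source])         # object file does not exist yet

    open(obj, "w").close()
    os.utime(source, (1000, 1000))                 # source last modified at t=1000
    os.utime(obj, (2000, 2000))                    # object built later, at t=2000
    up_to_date = not needs_rebuild(obj, [source])

    os.utime(source, (3000, 3000))                 # source edited after the build
    stale = needs_rebuild(obj, [source])

print(missing, up_to_date, stale)
```

A real build tool applies the same test recursively over the whole dependency graph, but the per-file decision is essentially this comparison.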

Each time you compile a source file, there is a preprocessing phase and a compile phase. The preprocessor scans through the file and interprets all those #define and #include, etc. directives, and actually goes through and does textual substitution on the source accordingly. (e.g. the source file that the compiler sees actually has all of the contents of <iostream> at the top, if you #include that.)
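
Here is a toy version of that textual substitution, written in Python. It is only a sketch: preprocess is a made-up function, "headers" are just strings in a dictionary, and it handles nothing beyond #include "name" and simple #define NAME value lines.

```python
import re

def preprocess(source, headers, macros=None):
    """Toy preprocessor: expands #include "name" and object-like #define only."""
    if macros is None:
        macros = {}
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            name = stripped.split('"')[1]
            # Paste the preprocessed header text in place of the directive.
            included = preprocess(headers[name], headers, macros)
            output.extend(included.splitlines())
        elif stripped.startswith("#define"):
            _, name, value = stripped.split(None, 2)
            macros[name] = value
        else:
            # Replace whole-word occurrences of every macro defined so far.
            for name, value in macros.items():
                line = re.sub(rf"\b{re.escape(name)}\b", lambda m: value, line)
            output.append(line)
    return "\n".join(output)

headers = {"limits.h": "#define MAX_ITEMS 8\nint count_items(void);"}
source = '#include "limits.h"\nint table[MAX_ITEMS];'
result = preprocess(source, headers)
print(result)
```

The text handed on to the "compiler" contains no # lines at all: the header's contents sit where the #include was, and MAX_ITEMS has been replaced by its value.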

Once the code is in preprocessed form (so that all of those # lines are gone and everything is replaced appropriately), it is fed to the compiler. This preprocessed file makes up one "translation unit", which is compiled into an object file. The object file looks a lot like executable data, except that in some places there will be a function (or variable, or generally any other "symbol") name instead of a raw address. This is because the translation unit included a declaration of the symbol, but no definition. The symbol name is a proxy, indicating that the actual function/variable/etc. will be found in some other object file. (The object file also includes the names of its own symbols, of course.)

So then we get to linking, which is the fun part. The linker takes in several object files, and goes through each of them looking for all of the "unresolved symbols". For each reference in one object file, it goes through all the other object files and looks for the symbol definition there. If it finds that definition in exactly one other object file, then all is well - the linker makes the association between the files permanent. If it didn't find a definition, or if it found two or more, it reports a linking error. If there are no errors, then it will end up with this interconnected web of object data, with no remaining "symbols", which gets output. That's the executable.
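
The symbol-matching step can be modelled in Python with plain dictionaries. Everything here is a simplification for illustration: link is a made-up function, and each "object file" is just a list of the symbols it defines and the symbols it still needs.

```python
def link(object_files):
    """Toy linker: object_files maps file name -> {"defines": [...], "needs": [...]}."""
    definitions = {}   # symbol -> file that defines it
    errors = []
    for fname, obj in object_files.items():
        for symbol in obj["defines"]:
            if symbol in definitions:
                errors.append(
                    f"duplicate symbol {symbol}: {definitions[symbol]}, {fname}")
            else:
                definitions[symbol] = fname
    resolved = {}      # (referencing file, symbol) -> file providing the definition
    for fname, obj in object_files.items():
        for symbol in obj["needs"]:
            if symbol in definitions:
                resolved[(fname, symbol)] = definitions[symbol]
            else:
                errors.append(f"unresolved symbol {symbol} referenced in {fname}")
    if errors:
        raise RuntimeError("; ".join(errors))
    return resolved

objects = {
    "main.o": {"defines": ["main"], "needs": ["draw"]},
    "gfx.o": {"defines": ["draw"], "needs": []},
}
table = link(objects)
print(table)
```

A real linker goes on to patch actual addresses into the machine code; this sketch stops at deciding which file satisfies each reference, and raises an error for missing or duplicate definitions.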

I lie a bit. Generally some symbols will remain. These are the ones that refer to DLLs - dynamically linked libraries. And now you know where the name comes from - they provide library functions which are "linked" (in the same sense as the linking described above) "dynamically" (when the program is loaded or while it runs, as opposed to "statically" at build time by the linker).

(To make sure you get the linking stuff to work out right, you want to read this!)
