C++ : question about classes and .obj files

5 comments, last by Agony 17 years, 8 months ago
I'm reading a C++ book chapter about classes. The author explains class reusability with the following example: he splits the whole .cpp into two files, a header (containing the class definition and the function prototypes) and a .cpp file containing the definitions of all the class members. He says, but without giving an example, that it is possible to let other programmers use this class by providing just the header and the .obj file produced by compiling the .cpp. This avoids "giving away" your own source code while still letting others use the class. I don't understand how that .obj file can be used in a Visual Studio project to access the class members. Any ideas? Thanks for any help.
Turning C++ code into a program requires two steps. Compiling turns the C++ code into independent blocks (the *.obj files). Linking then resolves the references between object files (and also pulls in library contents).

Thus, you would provide the compiler with a header that declares the contents of your class (so that you can write code using that class), and provide the linker with the object file (so that the contents of the class are available in the final program).
In fact, you do this all the time when your code is split into many classes, each in its own .cpp file. When you rebuild your project and all you changed is a single .cpp file, you'll see that only that file gets recompiled: only that .obj file needs to be recreated. The linker then takes care of the rest.

To access class members, all you need to know is the interface, i.e. what it looks like, and that's completely described in the .h file. Simply include that file where you need it and you're ready to go. In order to call a function, you don't need to know what it does. You just need to know how to call it.
In a manner of speaking, header files are the user manuals of .obj files... :)
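Here is a rough sketch of that split, assuming a hypothetical class called Counter (the name and its members are made up for illustration). The three parts are shown in one listing so it compiles as-is; in a real project each part would live in its own file.

```cpp
// ---- counter.h: the interface, handed to users of the class ----
class Counter {
public:
    Counter();
    void increment();   // declared only: no body in the header
    int value() const;
private:
    int count_;         // note: private data is still visible in the header
};

// ---- counter.cpp: the implementation, handed out only as counter.obj ----
// (as a separate file this would begin with #include "counter.h")
Counter::Counter() : count_(0) {}
void Counter::increment() { ++count_; }
int Counter::value() const { return count_; }

// ---- user code: needs counter.h to compile and counter.obj to link ----
int count_to_three() {
    Counter c;
    c.increment();
    c.increment();
    c.increment();
    return c.value();
}
```

The user code never sees the bodies of increment() or value(); it only relies on the declarations in the header.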
Create-ivity - a game development blog
Thanks for your help, but I'm still confused.
Say I have a Visual C++ project containing the following files:

main.cpp
class1.cpp
class1.h

If I send you my class1.h and class1.obj files, are you able to create your own Visual C++ project, with your own main(), and access the class members?
And of course, are you able to link the program again?
Generally this is done with either a static or a dynamic library, rather than a raw .obj file. A static library (.LIB on Windows) is linked into your application at link time; a dynamic library (.DLL) is loaded at runtime.

So your "distribution" (the thing you give to developers who want to use your code) consists of a header file (or, generally, several header files) and a library. They can use your code, but if they step into one of your functions in Visual Studio .NET, they will only see assembly. You can even go further and hide the contents of your class or structure entirely by doing something like this:

In your header:
typedef class Foo* FooID; // The definition of class Foo is in the .cpp, where people can't see it
FooID CreateFoo();
void  DoStuff(FooID pFoo);
..


But this will break the OO structure of your code (as you can't call methods on an undefined class) and personally it annoys me when developers do this.
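A minimal sketch of how that hidden-class approach could fit together, assuming FooID is a pointer to the hidden class. GetValue and DestroyFoo are hypothetical additions for this example, and what DoStuff actually does is invented. Both "files" are shown in one listing so it compiles as-is.

```cpp
// ---- foo.h: all that users of the library get to see ----
typedef class Foo* FooID;       // Foo is named here but never defined
FooID CreateFoo();
void  DoStuff(FooID pFoo);
int   GetValue(FooID pFoo);     // hypothetical accessor, added for this sketch
void  DestroyFoo(FooID pFoo);   // hypothetical cleanup, added for this sketch

// ---- foo.cpp: hidden inside the .obj or .lib ----
class Foo {
public:
    Foo() : value(0) {}
    int value;
};

FooID CreateFoo()            { return new Foo(); }
void  DoStuff(FooID pFoo)    { pFoo->value += 1; }   // invented behaviour
int   GetValue(FooID pFoo)   { return pFoo->value; }
void  DestroyFoo(FooID pFoo) { delete pFoo; }

// ---- user code: can only go through the free functions ----
int use_foo() {
    FooID foo = CreateFoo();
    DoStuff(foo);
    DoStuff(foo);
    int result = GetValue(foo);
    DestroyFoo(foo);
    return result;
}
```

Because the user only ever holds a pointer to an incomplete type, the layout of Foo can change without the header changing at all.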


This does not 'break the OO structure'. Member function != object orientation.
I'll take a swipe at this.

First of all, an explanation of how .cpp and .h files get used to produce .obj files, and how those are then used to create .exe files.

As has been mentioned, building your .exe consists of two steps: compiling and linking. During the first step, compiling happens once for each of your .cpp files. The process of compiling takes one .cpp file and turns it into one .obj file. So after all the compiling is done, you will have one .obj file for each .cpp file you have.

The process of compiling goes like this: The .cpp file is loaded into memory. In addition, every .h file that it #includes is loaded into memory as well, and inserted straight into the in-memory representation of the .cpp file, right at the point where the .h file is #included. Once all of the .h files are blindly inserted, you basically have a really long file in memory. This file is then compiled, and turned into a .obj file, which is mostly machine code. Think of it as a fragment of a .exe file.

Given your three files, main.cpp, class1.cpp, and class1.h, let's consider what happens when main.cpp is compiled. Let's assume that near the top, it #includes "class1.h". When loaded into memory, you basically end up with all the code for main.cpp, with everything from class1.h inserted at the top. So while compiling main.cpp into main.obj, the compiler doesn't know how class1 works, but it knows what it looks like. main.obj will end up having machine code that is the executable form of the source code in main.cpp, but it doesn't have any executable machine code for class1 (under normal circumstances).

However, code in main.cpp does have to use class1. So when compiling main.obj, how does the compiler deal with this? It deals with it by not dealing with it; it says, "I don't know how to execute this code that main.cpp is telling me to execute, because the source code is somewhere else (I don't know where). However, I do know that the variables are of this type, and that the functions take this many and these types of parameters, so I can say that I am using this stuff, without actually saying where this stuff is."

At this point, this executable machine code is quite incomplete. That's why I described .obj files as fragments. How do we resolve this to get the final product? We use the linker.

The linker goes through, takes all .obj files that were produced during compiling, and links them together. For any given .obj file, it will go through and look for those variables and functions that the compiler couldn't do anything with, and fix them. The compiler knew the names and types of the stuff it was using, just not the code. When the linker finds an unlinked variable or function call, it can take the name, search through all the available .obj files, and if it finds an .obj file that actually defines the details for that name, it can link the first .obj file's reference to this name to the actual details in the second .obj file.

So let's assume that we have main.obj and class1.obj. main.obj knows that there is a function somewhere that is a part of some class with the name "class1", and this function has the name "do_stuff". main.obj also knows that this function takes no parameters, and returns an int. The linker starts looking through the .obj files for a function that matches these details. It doesn't find one in main.obj, but it does find one in class1.obj. So the linker fills in the missing pieces in main.obj, and that part of main.obj's code becomes complete. Eventually, all the missing pieces are (hopefully) filled, and the final result is a complete .exe file.
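To make that concrete, here is a small sketch using the class1 and do_stuff names from above. The body of do_stuff and the helper call_do_stuff are made up for illustration, and the three "files" are shown in one listing so it compiles as-is.

```cpp
// ---- class1.h: what the compiler sees while compiling main.cpp ----
class class1 {
public:
    int do_stuff();   // no parameters, returns int; no body here
};

// ---- main.cpp, after #include "class1.h" has been expanded ----
int call_do_stuff() {
    class1 object;
    // The compiler checks the call against the declaration and leaves
    // an unresolved reference to class1::do_stuff in main.obj.
    return object.do_stuff();
}

// ---- class1.cpp: compiled separately into class1.obj ----
// The linker matches the reference in main.obj to this definition.
int class1::do_stuff() { return 42; }   // arbitrary value for this sketch
```

If the last part were missing from every .obj file given to the linker, the code above would still compile, but linking would fail with an unresolved external symbol error.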

Let's consider, however, the situation when we have main.cpp, class1.obj, and class1.h (no class1.cpp). class1.obj is simply incomplete machine code, so we can't read it, unless we want to very painfully wade through a bunch of computer-generated machine code (we don't, ugh!). But it contains all of the executable stuff that was defined in class1.cpp, so it theoretically has everything we need.

So when doing the compiling stage, we do each .cpp file. But we only have one, main.cpp. It includes class1.h, and so it knows the names and types of various things that are in class1.cpp, or in this case, hidden away within class1.obj. As explained above, main.cpp doesn't need to know how this stuff works, just what it looks like. So then we get main.obj, which has missing pieces. Those missing pieces should hopefully exist in class1.obj. (If they don't, you'll get linker errors complaining about undefined stuff. The compiler was hoping that later on, the linker would be able to find stuff, but when it came time for the linker to actually do the finding, it failed.) At this point, we're at the exact same stage as we were above. We have a bunch of .obj files, and we need to link them together. So the linker doesn't even know that there was any difference. In fact, the compiler wasn't aware of any difference either, other than the fact that there was only one .cpp file in the project.

Of course, .obj files are usually a hidden step that the programmer doesn't have to worry about. He hits the "build project" button, and source code is turned into an executable. Anything in between is irrelevant. Because of that, .obj files aren't usually very easy to work with manually. Instead, we have statically and dynamically linked libraries. On Windows, at least, these are .lib and .dll files, respectively. A static .lib file can be considered nearly identical to a .obj file (it is essentially an archive of one or more .obj files). They contain fragments of code that correspond to the source files they were compiled from, and they are added to the mix during the linking stage of building. A .dll file is a little trickier, because it needs to be capable of being loaded when the program runs. This means that executables that use .dll files often have missing pieces themselves (just like .obj files). (This depends on how one uses DLLs, though.) If the .dll file can't be found when the program using it is executed, then the program simply can't continue; it quits with an error message.

Anyway, I'm getting off track a bit. Hopefully this explanation will help some. As I just told someone else in another thread, I have found that understanding the process of compiling and linking helps to make the solutions to many problems much more obvious. Things fall into place and make sense once you understand this process. This includes how one should divide up their code into .cpp and .h files, why various linker errors pop up, why inline functions and template code need to be in the .h file, not the .cpp file, and so forth.
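As a small illustration of that last point, here is a sketch of a hypothetical header, math_utils.h, with a template and an inline function (all names are made up). Both are defined in full in the header, because each .cpp that uses them is compiled on its own and needs the body in front of it.

```cpp
// ---- math_utils.h (hypothetical header) ----

// The whole body lives in the header: the compiler generates code for
// a specific T only where largest<T> is used, so every .cpp that calls
// it needs to see the definition, not just a declaration.
template <typename T>
T largest(T a, T b) { return (a < b) ? b : a; }

// An inline function is also defined in the header. The inline keyword
// allows the same definition to appear in several .obj files without
// the linker complaining about a duplicate symbol.
inline int square(int x) { return x * x; }

// ---- user code, in any .cpp that includes math_utils.h ----
int largest_square(int a, int b) {
    return largest(square(a), square(b));
}
```

If the body of largest were moved into a separate .cpp file, other .cpp files would compile against the declaration but typically fail at link time, since no .obj file would contain code for the T they asked for.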
"We should have a great fewer disputes in the world if words were taken for what they are, the signs of our ideas only, and not for things themselves." - John Locke
