• # Organizing Code Files in C and C++

General and Gameplay Programming

# Introduction

While many simple programs fit into a single C or CPP source file, any serious project is going to need splitting up into several source files in order to be manageable. However, many beginning programmers may not realize what the point of this is - especially since some may have tried it themselves and run into so many problems that they decided it wasn't worth the effort. This article should explain why to do it, and how to do it properly. Where necessary, I give a brief explanation of how compilers and linkers work to help understand why you have to do things in a certain way.

## Terminology

In this article, I will call standard C and C++ files (usually with the .C or .CPP extension) "source files". This will be to distinguish them from "header files" (usually with the .H or .HPP extension). This terminology is also used by Visual C++ and most books. Note that the difference is purely conceptual - they are both just text files with code inside them. However, as the rest of the article will show, this difference in the way you treat source and header files is very important.

# Why split code into several files?

The first question some novice programmers ask when they see a directory full of separate code files is, "why isn't it all just in one file?" To them, they just don't see the point of scattering the code about. Splitting any reasonably-sized project up buys you some advantages, the most significant of which are the following:
• Speed up compilation - most compilers work on a file at a time. So if all your 10000 lines of code is in one file, and you change one line, then you have to recompile 10000 lines of code. On the other hand, if your 10000 lines of code are spread evenly across 10 files, then changing one line will only require 1000 lines of code to be recompiled. The 9000 lines in the other 9 files will not need recompiling. (Linking time is unaffected.) [nbsp]
• Increase organization - Splitting your code along logical lines will make it easier for you (and any other programmers on the project) to find functions, variables, struct/class declarations, and so on. Even with the ability to jump directly to a given identifier that is provided in many editors and development environments (such as Microsoft Visual C++), there will always be times when you need to scan the code manually to look for something. Just as splitting the code up reduces the amount of code you need to recompile, it also reduces the amount of code you need to read in order to find something. Imagine that you need to find a fix you made to the sound code a few weeks ago. If you have one large file called GAME.C, that's potentially a lot of searching. If you have several small files called GRAPHICS.C, MAINLOOP.C, SOUND.C, and INPUT.C, you know where to look, cutting your browsing time by 3/4. [nbsp]
• Facilitate code reuse - If your code is carefully split up into sections that operate largely independently of each other, this lets you use that code in another project, saving you a lot of rewriting later. There is a lot more to writing reusable code than just using a logical file organization, but without such an organization it is very difficult to know which parts of the code work together and which do not. Therefore putting subsystems and classes in a single file or carefully delineated set of files will help you later if you try to use that code in another project. [nbsp]
• Share code between projects - The principle here is the same as with the reuse issue. By carefully separating code into certain files, you make it possible for multiple projects to use some of the same code files without duplicating them. The benefit of sharing a code file between projects rather than just using copy-and-paste is that any bug fixes you make to that file or files from one project will affect the other project, so both projects can be sure of using the most up-to-date version. [nbsp]
• Split coding responsibilities among programmers - For really large projects, this is perhaps the main reason for separating code into multiple files. It isn't practical for more than one person to be making changes to a single file at any given time. Therefore you would need to use multiple files so that each programmer can be working on a separate part of the code without affecting the file that the other programmers are editing. Of course, there still have to be checks that 2 programmers don't try altering the same file; configuration management systems and version control systems such as CVS or MS SourceSafe help you here. All of the above can be considered to be aspects of modularity, a key element of both structured and object-oriented design.

# How to do it: The Basics

By now you're probably convinced that there are benefits to splitting up your project into several smaller files. So, how would you go about it? Although some of the decisions you make will be reasonably arbitrary, there are some basic rules that you should follow to ensure that it all works. Firstly, look at how you would split your code into sections. Often this is by splitting it into separate subsystems, or 'modules', such as sound, music, graphics, file handling, etc. Create new files with meaningful filenames so that you know at a glance what kind of code is in them. Then move all the code that belongs to that module into that file. Sometimes you don't have clear module - some would say this should send out warnings about the quality of your design! There may still be other criteria you use for splitting up code, such as the structures it operates upon. (Often "general purpose" functions can be split into string-handling and number-handling, for example.) And occasionally a module could be split into two or more files, because it might make sense to do so on logical grounds. Once you have split it up in this way into separate source files, the next stage is to consider what will go into the header files. On a very simple level, code that you usually put at the top of the source file is a prime candidate for moving into a separate header file. This is presumably why they got termed 'header' files, after all. This code to go in a header usually includes some or all of the following:
• class and struct definitions
• typedefs
• function prototypes
• global variables (but see below)
• constants
• #defined macros

# Potential Pitfalls

The rules given above are fairly vague and merely serve as a starting point for organizing your code. In simple cases, you can produce completely working programs by following those guidelines. However there are some more details that have to be accounted for, and it is often these details that cause novice programmers so much grief when they first start splitting their code up into header files and normal files. In my experience, there are four basic errors that people encounter when they first enter the murky world of user-defined header files.
1. The source files no longer compile as they can't find the functions or variables that they need. (This often manifests itself in the form of something similar to "error C2065: 'MyStruct' : undeclared identifier" in Visual C++, although this can produce any number of different error messages depending on exactly what you are trying to reference.) [nbsp]
2. Cyclic dependencies, where headers appear to need to #include each other to work as intended. A Sprite may contain a pointer to the Creature it represents, and a Creature may contain a pointer to the Sprite it uses. No matter how you do this, either Creature or Sprite must be declared first in the code, and that implies that it won't work since the other type isn't declared yet. [nbsp]
3. Duplicate definitions where a class or identifier is included twice in a source file. This is a compile time error and usually arises when multiple header files include one other header file, leading to that header being included twice when you compile a source file that uses them. (In MSVC, this might look something like "error C2011: 'MyStruct' : 'struct' type redefinition.) [nbsp]
4. Duplicate instances of objects within the code that compiled fine. This is a linking error, often difficult to understand. (In MSVC, you might see something like "error LNK2005: "int myGlobal" (?myGlobal@@3HA) already defined in myotherfile.obj".)
So how do we fix these issues?

# Fixing Problem 1

Luckily, these issues are easy to fix, and even easier to avoid once you understand them. The first error, where a source file refuses to compile because one of the identifiers was undeclared, is easy to resolve. Simply #include the file that contains the definition of the identifier you need. If your header files are organized logically and named well, this should be easy. If you need to use the Sprite struct, then you probably need to #include "sprite.h" in every file that does so. One mistake that programmers often make is to assume that a file is #included simply because another header #includes it for itself.
Example: /* Header1.h */ #include "header2.h" class ClassOne { ... }; /* Header2.h */ class ClassTwo { ... }; /* File1.cpp */ #include "Header1.h" ClassOne myClassOne_instance; ClassTwo myClassTwo_instance;
In this case, File1.cpp will compile fine, as including Header1.h has indirectly #included Header2.h, meaning that File1.cpp has access to the Class2 class. But what happens if, at a later date, someone realises that Header1.h doesn't actually need to #include Header2.h? They can remove that #include line, and suddenly File1.cpp will break the next time you try to compile it. The key here, is to explicitly #include any header files that you need for a given source file to compile. You should never rely on header files indirectly including extra headers for you, as that may change. The 'extra' #includes also serve as documentation, by demonstrating what other code this file is dependent on. So don't try and leave them out if you know you need that header included somehow.

# Fixing Problem 2

Cyclic (or two-way) dependencies are a common problem in software engineering. Many constructs involve a two-way link of some sort, and this implies that both classes or structures know about each other. Often this ends up looking like this:
/* Parent.h */ #include "child.h" class Parent { Child* theChild; }; /* Child.h */ #include "parent.h" class Child { Parent* theParent; };
Given that one of these has to be compiled first, you need some way to break the cycle. In this case, it's actually quite trivial. The Parent struct doesn't actually need to know the details of the Child class, as it only stores a pointer to one. Pointers are pretty much the same no matter what they point to, therefore you don't need to the definition of the structure or class in order to store a pointer to an instance of that structure or class. So the #include line is not needed. However, simply taking it out will give you an "undeclared identifier" error when it encounters the word 'Child', so you need to let the compiler know that Child is a class or class that you wish to point to. This is done with a forward declaration, taking the form of a class or class definition without a body. Example:
/* Parent.h */ class Child; /* Forward declaration of Child; */ class Parent { Child* theChild; };
Notice how the #include line is replaced by the forward declaration. This has allowed you to break the dependency between Parent.h and Child.h. Additionally, it will speed up compilation as you are reading in one less header file. In this case, the same procedure can (and should) be followed in Child.h by forward declaring "class Parent;" As long as you are only referring to a pointer and not the actual type itself, you don't need to #include the full definition. In 99% of cases, this can be applied to one or both sides of a cycle to remove the need to #include one header from another, eliminating the cyclic dependency. Of course, in the source files, it's quite likely that there will be functions that apply to Parent that will manipulate the Child also, or vice versa. Therefore, it is probably necessary to #include both parent.h and child.h in parent.c and child.c.

# Fixing Problem 3

Duplicate definitions at compile time imply that a header ended up being included more than once for a given source file. This leads to a class or struct being defined twice, causing an error. The first thing you should do is ensure that, for each source file, you only include the headers that you need. This will aid compilation speed since the compiler is not reading and compiling headers that serve no purpose for this file. Sadly, this is rarely enough, since some headers will include other headers. Let's revisit an example from earlier, slightly modified:
/* Header1.h */ #include "header3.h" class ClassOne { ... }; /* Header2.h */ #include "header3.h" class ClassTwo { ... }; /* File1.cpp */ #include "Header1.h" #include "Header2.h" ClassOne myClassOne_instance; ClassTwo myClassTwo_instance;

For the purposes of compilation, File1.cpp ends up containing copies of Header1.h and Header2.h, both of which include their own copies of Header3.h. The resulting file, with all headers expanded inline into your original file, is known as a translation unit. Due to this inline expansion, anything declared in Header3.h is going to appear twice in this translation unit, causing an error. So, what do you do? You can't do without Header1.h or Header2.h, since you need to access the structures declared within them So you need some way of ensuring that, no matter what, Header3.h is not going to appear twice in your File1.cpp translation unit when it gets compiled. This is where inclusion guards come in. If you looked at stdlib.h earlier, you may have noticed lines near the top similar to the following:
#ifndef _INC_STDLIB #define _INC_STDLIB
And at the bottom of the file, something like:
#endif /* _INC_STDLIB */
This is what is known as an inclusion guard. In plain English, it says "if we haven't defined the symbol "_INC_STDLIB" yet, then define it and continue. Otherwise, skip to the #endif." This is a similar concept to writing the following code to get something to run once:
static bool done = false; if (!done) { /* Do something */ done = true; }
This ensures that the 'do something' only ever happens once. The same principle goes for the inclusion guards. During compilation of File1.cpp, the first time any file asks to #include stdlib.h, it reaches the #ifndef line and continues because "_INC_STDLIB" is not yet defined. The very next line defines that symbol and carries on reading in stdlib.h. If there is another "#include " during the compilation of File1.cpp, it will read to the #ifndef check and then skip to the #endif at the end of the file. This is because everything between the #ifndef and the #endif is only executed if "_INC_STDLIB" is not defined; and it got defined the first time we included it. This way, it is ensured that the definitions within stdlib.h are only ever included once by putting them within this #ifndef / #endif pair. This is trivial to apply to your own projects. At the start of every header file you write, put the following:
#ifndef INC_FILENAME_H #define INC_FILENAME_H
Note that the symbol (in this case, "INC_FILENAME_H") needs to be unique across your project. This is why it is a good idea to incorporate the filename into the symbol. Don't add an underscore at the start like stdlib.h does, as identifiers prefixed with an underscore are supposed to be reserved for "the implementation" (ie. the compiler, the standard libraries, and so on). Then add the #endif /* INC_FILENAME_H */ at the end of the file. The comment is not necessary, but will help you remember what that #endif is there for. If you use Microsoft Visual C++, you may have also noticed 3 lines in stdlib.h similar to:
#if _MSC_VER > 1000 #pragma once #endif
Other compilers may have code similar to this. This effectively achieves the same thing as the inclusion guards, without having to worry about filenames, or adding the #endif at the bottom of the file, by saying that "for this translation unit, only include this file once". However, remember that this is not portable. Most compilers don't support the #pragma once directive, so it's a good habit to use the standard inclusion guard mechanism anyway, even if you do use #pragma once. With inclusion guards in all your headers, there should be no way for any given header's contents to end up included more than once in any translation unit, meaning you should get no more compile-time redefinition errors. So that's compile-time redefinition fixed. But what about the link-time duplicate definition problems?

# Fixing Problem 4

When the linker comes to create an executable (or library) from your code, it takes all the object (.obj or .o) files, one per translation unit, and puts them together. The linker's main job is to resolve identifiers (basically, variables or functions names) to machine addresses in the file. This is what links the various object files together. The problem arises when the linker finds two or more instances of that identifier in the object files, as then it cannot determine which is the 'correct' one to use. The identifier should be unique to avoid any such ambiguity. So how come the compiler doesn't see an identifier as being duplicated, yet the linker does? Imagine the following code:
/* Header.h */ #ifndef INC_HEADER_H #define INC_HEADER_H int my_global; #endif /* INC_HEADER_H */ /* code1.cpp */ #include "header1.h" void DoSomething() { ++my_global; } /* code2.cpp */ #include "header1.h" void DoSomethingElse() { --my_global; }
This first gets compiled into two object files, probably called code1.obj and code2.obj. Remember that a translation unit contains full copies of all the headers included by the file you are compiling. Finally, the object files are combined to produce the final file. Here's a visual depiction of the way these files (and their contents) are combined:

extern int my_global;
The 'extern' specifier is like telling the compiler to wait until link time to resolve the 'connection'. And for a function, you just put the function prototype:
int SomeFunction(int parameter);

# Conclusion

If done properly, organizing your files this way can speed your development without hindering you at all or introducing any extra problems. Although it seems like there is a lot to consider when doing so, it becomes second nature in no time. Just remember these key points:
• Split your code up along logical module divisions
• The the interface into the header files and the implementation in the source files
• Use forward declarations wherever possible to reduce dependencies
• Don't put code in the headers that might adversely affect source files that include it Good luck!

Report Article

## User Feedback

What a great article!! Thanks so much to Ben and all those who revised later on..

##### Share on other sites
The best tutorial about this subject(and the only I thinkq many books don't cover this). Thanks helped me a lot.

##### Share on other sites

Thanks, this helped me a lot

##### Share on other sites

Great and very helpful article. Only one thing you left unexplained, that's part of the One Definition Rule(ODR). Some things, like types, templates, and extern inline functions, can be defined in more than one translation unit. For a given entity, each definition must be the same. Non-extern objects and functions in different translation units are different entities, even if their names and types are the same. Again, I think this is the best article on this topic one can found on the net. Congratulations.

##### Share on other sites

Agreed, a very nice article, sums it up nicely!