JWalsh

C++ Workshop - C++ Keywords, Variables, & Constants (Ch. 3)


Welcome to the GDNet C++ Workshop – Ch. 3

For a complete introduction to this workshop, please look here.

Workshop Overview

This workshop is designed to aid people in their journey to learn beginning C++. This workshop is targeted at highly motivated individuals who are interested in learning C++ or who have attempted to learn C++ in the past, but found that without sufficient support and mentoring they were unable to connect all the pieces of this highly complex but powerful programming language.

This is a 'guided' self-teaching C++ workshop. Each student is responsible for taking the time to read the material and learn the information. The community and tutors that arise out of this workshop are here to make the learning process run more smoothly, but are not obligated to baby-sit a person's progress. Because everyone will be working from the same textbook (Teach Yourself C++ in 21 days 5th Ed.), students may find it easier to get answers to the specific questions they might have.

There is no minimum age requirement, and there is no previous programming experience required. Additionally, this workshop does not attempt to defend C++ as a language, nor does it attempt to demonstrate that C++ is either more or less useful than other programming languages for any particular purpose. People who intend to start a discussion about the differences between C++ and ANY other languages (except as are relevant to a particular discussion) are encouraged to do so elsewhere. This workshop is for educational, not philosophical, discussions.

Quizzes & Exercises

Each week will have quizzes and exercises posted in the weekly threads. Please try to answer them by yourself. As well, please DO NOT post the answers to Quizzes and Exercises within this thread. Once it becomes acceptable to post the answers to quizzes and exercises, an additional thread will be created each week specifically for the purpose of posting quiz answers.
If you try with reasonable effort but are unable to answer the questions or complete the exercises, feel free to post a clarification question here on the thread. Tutors, myself, or others will do the best we can to point you in the right direction for finding the answer.

Chapter 3 – Working with Variables and Constants

Introduction

Greetings! This week we will be covering chapter 3 on variables and constants. The chapter is approximately 20 pages, not including the summary, Q&A, and quiz questions at the end of the chapter. Roughly halfway through the week, the tutors, I, or anyone else wishing to challenge their teammates learning C++ can post review and quiz questions in this thread. Please do not post the answers in this thread, however, as a new thread will be created for that purpose.

This is a shorter week, only 7 days instead of 10, and at the beginning of next week (Monday morning) we will again move on, so try to keep up.

Participants are welcome to post their questions for this chapter within this thread, and the tutors, other participants, and I will do the best we can to answer them. As the thread is likely to grow to a hundred or more posts, the C++ workshop threads will be closely moderated. Discussions which become narratives, flame-wars, or philosophical will either be removed or moved to another forum, unless entirely relevant to the current chapter.

Finally, feel free to post quiz-like questions here in this thread after about 3 days. This will give people an opportunity to test their knowledge and understanding after they've had a chance to absorb the information. For questions which require further research than what is in the book, mark the question with an [Extra Credit] tag.

Topical Outline of the Reading (Not literal due to copyrights)
  1. Exploring the parts of a variable
  2. How Data is stored in memory
  3. Looking at size and range modifiers
  4. Exploring the standard C++ data types
  5. The nuances of creating variables
  6. C++ keywords which cannot be used as identifiers
  7. Working with variables
  8. Aliases
  9. Overflow/Underflow & Range limitations
  10. Interpreting integers as characters
  11. Special Characters
  12. Using constants
  13. Using enumerations
[Edited by - jwalsh on May 30, 2007 5:28:49 PM]

Hello all,

Quick question about literal constants. I've read the section in the book and I've become a bit confused. The following code:

int myAge = 39;

according to the book, is a literal constant. It also states that I can't change this value. Yet just eight pages earlier it uses this exact same method for assigning values to variables (or, at least, I think it's the same...???)

I'm sure I'm just missing something here. How exactly is this different from creating a variable and assigning a value to it?

The literal constant in the above line of code is the "39". It is a constant in that its value cannot be changed, and it literally stands for the value 39. myAge can be assigned another value. If you wanted to permanently assign 39 to myAge, you would write

const int myAge = 39;

The const keyword is short for "constant". If you tried to assign a new value to myAge after it had been declared constant, you would receive a compile-time error.

I saw that too, about the literal constant.

I would like that clarified as well, but from what I could understand they were referring to the number 39 being a literal constant, not myAge.

Correct. 39 is the literal constant. You can't change the value of 39. You can however change the value of myAge, unless it has been declared const, as programwizard pointed out.

[warning]
More importantly, string literals like "Hello World" also are literal constants. Trying to modify them is a very common error. You will hear more about it when we discuss pointers, arrays and strings.
[/warning]

Hi,
I have a question regarding variable types. I understand how they work for numbers ( either int float or the others ) but what about letters ? char can store one character, a number or a letter but only one... How can I ask the user to input his name via cin>> ? With char Myname, Myname will only store the first letter of the name.

Thanks.

Quote:
literal:
  • actual: being or reflecting the essential or genuine character of something; "her actual motive"; "a literal solitude like a desert"- G.K.Chesterton; "a genuine dilemma"

  • without interpretation or embellishment; "a literal depiction of the scene before him"

  • limited to the explicit meaning of a word or text; "a literal translation"

  • avoiding embellishment or exaggeration (used for emphasis); "it's the literal truth"

int myAge = 39;

myAge can not be a literal, even when it is made const, for the simple fact that it is a variable. Looking at it in source code does not tell you its value; you must evaluate it first.

C has a pair of interesting terms used to refer to the two sides of an expression. An lvalue (literally, "left value") is any object that can appear on the left hand side of an assignment while an rvalue is one that can appear on the right hand side. Virtually all lvalues are also rvalues; literals are objects that can only ever, throughout the entire program code, be rvalues. Even const objects appear as lvalues at point of initialization.

Quote:
Original post by Myotis:
Hi,
I have a question regarding variable types. I understand how they work for numbers ( either int float or the others ) but what about letters ? char can store one character, a number or a letter but only one... How can I ask the user to input his name via cin>> ? With char Myname, Myname will only store the first letter of the name.

Thanks.


The C++ standard library provides a string class for handling strings of characters. You only have to include the string header file:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string myName;
    cin >> myName;
    cout << myName;
    return 0;
}

This will allow the user to input a series of characters, and then print the result. Once you include the string header file, you can use string like a normal data type (note that if you don't use the namespace std, you will need to declare strings as std::string).

Quote:
Original post by Myotis
Hi,
I have a question regarding variable types. I understand how they work for numbers ( either int float or the others ) but what about letters ? char can store one character, a number or a letter but only one... How can I ask the user to input his name via cin>> ? With char Myname, Myname will only store the first letter of the name.

Thanks.

Fruny has intimated that this will be covered in significant detail soon, but here's the quick answer:

C supports the notion of arrays, though not as a true first-class type. In C, strings are represented using null-terminated character arrays, meaning regular arrays of char with a null character (0, '\0') signifying the end of the string. Unfortunately, this requires that you constantly monitor the size of your string, maintain the length of your string separately from the string itself and otherwise babysit everything pertaining to strings, thus making string handling a rather tedious affair in C.

Standard C++ supplants this by providing the std::string type. Internally, std::string uses arrays, but there is no need for them to be null-terminated as a std::string stores its own length. Being a class, it also provides a number of member functions and operator overloads that make using strings in C++ intuitive and secure.


// C language example of string handling
#include <stdio.h>
#include <string.h>

char name[1024];
int length = 0;

int main()
{
    printf("Please enter your name: ");
    /* name already acts as a pointer here, so no & is needed; the width
       limit keeps scanf from writing past the end of the array */
    scanf("%1023s", name);
    length = (int)strlen(name);

    // Comment about the following line after the examples
    printf("The name you entered was: %s\n"
           "The length of the name was: %d characters\n", name, length);

    return 0;
}


// C++ language example of string handling
#include <iostream>
#include <string>

int main()
{
    using namespace std;
    string name;

    cout << "Please enter your name: ";
    cin >> name;

    cout << "The name you entered was: " << name
         << "\nThe length of the name was: " << name.length()
         << endl;

    return 0;
}



Two interesting things about the C example. One is that C's output functions require you to know the appropriate format specifier (%s, %d, etc) for each type of data you want to output as part of the same printf statement. Blech.

The second is that I have two string literals that are basically adjacent to each other (whitespace is meaningless to the C and C++ compilers), and C concatenates them into one string literal. It's an easy way to make your programs prettier. [smile]

(Okay, that may have been too much for the beginning stages of this workshop...)

I need to get my head around some terminology. This doesn't relate specifically to this week and I wasn't sure if I should put this into the "Introduction" thread. I know that it will come up eventually but wonder if a simple structure can be offered at this stage.

We've worked extensively with the iostream header file and that's needed to allow us to access cin, cout etc. but we also need the standard namespace to allow us to access them. It strikes me that there is a hierarchy, starting with C++ Language at the top but what's the order beneath that? There are namespace, objects, classes, header files etc. so, for instance, is the standard namespace part of the iostream header file or the other way around? How do they relate to each other?

A further question relates to other header files (as mentioned by Oluseyi): if iostream is needed to access cin, cout etc., where can a list of all available commands or keywords be obtained, along with their respective header file? In order to use "string name;" in the source code, "#include <string>" must precede it. Sure, this will come with with experience and familiarity with the language but I suspect that there's a list somewhere.

Thank you.

Quote:
Original post by CondorMan
It strikes me that there is a hierarchy, starting with C++ Language at the top but what's the order beneath that? There are namespace, objects, classes, header files etc.

Namespaces, classes, structs, etc. are all elements of the C++ language, not layers upon which it is built. Header files, namespaces, and the like are all methods of grouping similar pieces of code/data together. For example, you might have a header file containing all the function declarations needed to render an object to the screen, which would all be wrapped in a class along with the necessary data members. This article has some good information on why/how header files are used to organize code.

Quote:
Original post by CondorMan
so, for instance, is the standard namespace part of the iostream header file or the other way around? How do they relate to each other?

<iostream>, as well as all other members of the C++ standard library (see link below), are in the std namespace.

Quote:
Original post by CondorMan
A further question relates to other header files (as mentioned by Oluseyi): if iostream is needed to access cin, cout etc., where can a list of all available commands or keywords be obtained, along with their respective header file? In order to use "string name;" in the source code, "#include <string>" must precede it. Sure, this will come with with experience and familiarity with the language but I suspect that there's a list somewhere.


MSDN holds all the answers

[Edited by - Driv3MeFar on June 13, 2006 8:07:23 AM]

Thank you. I'll check out the sites that you quoted. I'll go along with everything now and learn it verbatim, knowing that everything will crystallise in due course.

It's interesting that you said:

"<iostream>, as well as all other members of the C++ standard library (see link below), are in the std namespace."

so that reinforces the query that I had - if <iostream> is in the std namespace, why do both have to be mentioned in a listing which uses cout, cin etc.? Logic says to me that I should only have to mention std namespace as it seems "higher" than <iostream>. Having said that, I've never seen any source code which doesn't have one or more header files.

I don't want to get into detailed discussion now because that may well serve to confuse me (and, perhaps, others).

Quote:
Original post by CondorMan
so that reinforces the query that I had - if <iostream> is in the std namespace, why do both have to be mentioned in a listing which uses cout, cin etc.? Logic says to me that I should only have to mention std namespace as it seems "higher" than <iostream>.


The header file iostream contains a namespace within it.

// -- within iostream header file somewhere -- //

namespace std {
// variables like cout and cin are in here
}

// ------------------------------------------- //

This means if you include iostream you still need to scope in std. It's a little confusing because all members of the iostream library are within the standard namespace, but the header file "iostream" contains the standard namespace which also contains the iostream library.

iostream header file -> namespace std -> iostream library

Hopefully that makes a little sense.

- Dave

CondorMan,

All good questions. Let me see if I can break it down a bit.

The C++ Language is a programming language, much like French or Spanish are natural languages. Both languages have semantics and to a degree syntax. The semantics of a language is what "features" it provides. By comparison, Spanish and French both have the same semantics (conjugations, sentence structure, gender, etc...), but as the vocabulary is different between the two languages they have a slightly different Syntax.

C++ shares common semantics with most modern object oriented languages including Java, C#, J#, javascript, PHP, etc...And to a degree, it shares the same syntax with many of them as well - because they're all ideas spawned from the same root - The C Language. Namespaces are just one of the "features" or "semantics" of most common OO languages.

Part of the argument for an object oriented language is being able to group data with functionality and code-reuse. The ability to build a library of reusable software rather than having to write everything from scratch is the dream of every software engineer. We imagine a world where everything we could want to create is simply a matter of plugging components together without the need to write any more of the "low level" stuff. To that end, the C++ ISO Board proposed a set of standardized libraries that every ISO compliant compiler must provide. How they implement the libraries is flexible, but the components that make up the library must be consistent. This is of course the C++ Standard Library.

When working within the C++ language, the compiler must know at compile-time what the declarations look like for all the functions, classes, methods, etc. you want to use. In last week's thread I posted a really good explanation of namespaces, so I won't duplicate it here. It basically comes down to this: namespaces help the compiler decide which identifier to use, in case there is more than one with the same name. In other words, think of a namespace as a surname. There are TONS of Michaels in the world, but I want to address a specific Michael, let's say Michael WALSH. Namespaces accomplish the same goal. I could create my own cin, cout, etc., but by telling the compiler I want to use the Standard Namespace with 'std', I'm telling it WHICH cin and cout to use.

std::cin as opposed to jwalsh::cin.

As was a part of last week's quiz, there are three ways to identify WHICH identifier I want to use.


std::cin; // fully qualified, use std::cin in this instance only
using std::cin; // Use the STD version of cin for the rest of this scope
using namespace std; // Look within the STD namespace for all names for the rest of this scope


As for why we include different header files, it's pretty simple. The concept of a namespace cuts across the concept of a file. I can have a file with 12 different namespaces in it, but I can also have 12 different files all with code which is included in the same namespace. Files are a physical division of a library, while namespaces are a logical division of a library. These are two different ways to divide a library, and both are necessary for identifying what to include in your program and which version to use.

In the case of the Standard Library, there are about 50 different files, each of which contains useful classes, etc. But I don't want the compiler to include them ALL; that would be a performance nightmare at compile time, and many of the library components are unnecessary for most tasks. So to keep my code small and my builds quick, the compiler allows me to specify WHICH of the components of the library I want to use at any specific moment. To do that I simply include the header file.

Then, once I've included the header file I need to tell the compiler that I want to use the namespace contained WITHIN that file. Remember - there could be multiple namespaces within a single header...and there could be multiple versions of cin, cout, or whatever is IN the file I just included lying around my project.

I hope this helps, let us know if you have any follow-up questions.

Cheers!

Quote:
Original post by CondorMan
So that reinforces the query that I had - if <iostream> is in the std namespace, why do both have to be mentioned in a listing which uses cout, cin etc.? Logic says to me that I should only have to mention std namespace as it seems "higher" than <iostream>. Having said that, I've never seen any source code which doesn't have one or more header files.

I don't want to get into detailed discussion now because that may well serve to confuse me (and, perhaps, others).

The above answers to your question are very good. I feel I should clarify what the #include directive does. The #include directive is handled by the preprocessor before the C++ compiler proper ever sees your code. The preprocessor sees the #include, opens up the text file it points to (in this case, a file called "iostream") and pastes the entire contents of that file in the place of the #include directive.

Namespaces are part of C++ itself, and are handled by the compiler after all of the #include's and other preprocessor directives are done and over with.

Hope that helps.

Thank you for the detailed responses to my questions. I've read them a few times and it's starting to make sense.

Here's to the remainder of this week's work and to the forthcoming quiz!

CondorMan,

Are you confused by C++ namespaces or by the C++ way of using the using keyword?

I'm asking this because of this sentence in your post:
Quote:
Logic says to me that I should only have to mention std namespace as it seems "higher" than <iostream>


One way to understand the using keyword lies in one of the premises of the C++ language:

Thou shall not use any undeclared symbol


As BeanDog told you, the #include preprocessor directive is used to paste the content of a file into another file. In a sense, it imports the declarations that are in the included file into the including file. You can see the output of the preprocessor by doing this (.NET 2003, but it should be very similar for .NET 2005):
  1. go to the project properties
  2. in the C++ properties, choose 'Command Line'
  3. in the Additional Options text box, add
    /P /EP
    (/P = write preprocessor output, /EP = don't write #line directives in the preprocessor output)
  4. click OK
  5. compile your project (the project will not link, because the object files are NOT created)
  6. go to your source (probably Debug or Release)
  7. the preprocessor output has the .i extension


Don't forget to revert these project option changes if you want to be able to successfully compile and link your project. Be aware that the generated files are rather big (often several thousand lines).

The #include directive has no real impact on the compilation - in fact, strictly speaking, it takes place before compilation, in a step called preprocessing. The goal of preprocessing is to prepare a compilation unit (the resulting .i file) to be compiled. In fact, #include is a helper because it allows you to put the declarations that might need to be used in more than one C++ file into a single file (a header file - most of the time they have a .h extension, but some of them (those of the Standard C++ Library) may have no extension). Then, instead of writing the declaration in each C++ file, you simply include the corresponding header file.

This way, you can import the declarations you need in your compilation unit.

Now that all needed symbols are known, we are able to use them. Before I continue, let me say a bit about namespaces and their usage.

Namespaces are relatively new beasts in software engineering. The current trend is to use namespaces to create packages of classes that share the same goal (for example, providing a standard library to C++ programmers). You can even add subnamespaces (to create subpackages).

The strict usage of fully qualified names may lead to source code that is difficult to read, and difficult to maintain or modify. This is especially true when namespace names are long or when you are using symbols that are declared in subnamespaces. For example:
company::database::clientmanager::Client *client = 
new company::database::clientmanager::Client(
company::database::clientmanager::ClientType::VERY_RICH);

Now that I have written this horror, the goal of the using keyword should become clearer: while namespaces are really useful because they help define clearly bounded units, they can lead to barely readable or even unreadable source code, which is never a good thing. The using keyword allows you to simplify the code when you need to use symbols that have been declared in a namespace.
using namespace company::database::clientmanager;

Client *client = new Client(ClientType::VERY_RICH);

This is easier to read, isn't it?

You may ask: why did he speak about #include, then namespaces, then using? That's because I feel that you misunderstood the roles of #include and using: while you believe that they are related, in fact they are not related at all (they can't be related, because they are used in different steps of the whole compilation process). As BeanDog clearly stated, #include is a copy-paster that allows you to declare your symbols in one file and to use this declaration in many files (hence to satisfy the first part of the C++ premise that I stated). The using keyword allows you to define a new, simpler way to gain access to these declared symbols (hence to satisfy the other part of the same C++ premise).

In your example, #include <iostream> is used to import the declarations of the stream symbols, and using namespace std; is used to simplify the access to the symbols that have been imported and that are in the "std" namespace.

I hope I have been clear enough ;)

Regards,

Hi Emmanuel

Thank you for your contribution. I hope that this part of the thread is also helpful to anyone else who's having trouble understanding the terminology.

I understand "#include" and "using".

My original concern was when constructing typical "Hello World!" source code. I realised that *both* #include <iostream> and using namespace std; (or one of the equivalent variations) was necessary but I didn't know why. I assumed that there was a "hierarchy" - for instance, I assumed that having #include <iostream> pasted the contents of iostream into a space above my source code and then, as ph33r said, iostream contains std, so why declare (possibly not the correct word, but not used in the strict programming sense of "declare") the use of std? Jeromy said that he could create several namespaces, each having cin, cout etc. and I didn't know that was possible. I understand now why it's necessary to be explicit in wanting to use std::cin, std::cout etc.

As I've said before, this will crystallise in due course. I have NO intention of letting a computer beat me!

C++ beginners often have trouble knowing when to use the word "declare" and when to use the word "define", so you are not alone. As I already stated in the very first thread of this workshop, I don't own the book, so I don't know if it contains a simple definition of these words.

Let's remember what a C++ compiler does:
  1. first, it preprocesses the C++ file
  2. then it compiles the preprocessed file
  3. then it links all the compiled C++ files into one big executable file


The goal of the declaration of a C++ symbol is to tell the compiler that the symbol exists somewhere. Essentially, it says "this symbol exists somewhere, you don't have to know exactly where it is, and it has that name and that signature". Once a compiler knows every symbol that is used in a particular C++ file, it can compile the file and produce the object (.o or .obj) file.

The goal of the definition is to put something behind the symbol itself.

Let's see an example:

// ----------------------- file1.h
#ifndef FILE_1_H
#define FILE_1_H

// this is the function ** declaration **
int function_plus(int a, int b);

#endif // FILE_1_H



// ----------------------- main.cpp
#include <iostream>
// including file1 will declare the existing symbol function_plus()
// so we can use it in this compilation unit
#include "file1.h"

int main()
{
    std::cout << "10 + 20 = " << function_plus(10, 20) << std::endl;
}



If I stop my work here, what will happen? main.cpp will compile correctly (function_plus() is known), but what should function_plus() do? The compiler has no way to know about this problem (since you specifically told it that 'function_plus() exists somewhere'), but the linker, whose goal is to produce the final exe, will complain about an undefined symbol, i.e. a declared symbol that has no real existence.

To correct the problem, let's add the function_plus() definition:

// ----------------------- file1.cpp
#include "file1.h"

int function_plus(int a, int b)
{
    return a + b;
}



The declaration/definition pair is difficult for a beginner to grasp (I had the exact same problems when I began C++ some years ago; the problem was even worse for me because I came from a C background, and the C language doesn't require function declarations).

At this point, I believe that the most important things to remember about declaration and definition are that
  • the declaration is vital for the compiler
  • the definition is vital for the linker
  • an "undeclared identifier" compilation error means that the symbol is unknown to the compiler (check why)
  • an "undefined reference" or "unresolved external symbol" linker error means that the symbol is known by the compiler and is correctly used, but doesn't really exist.
  • a "multiple definition" linker error means that a symbol has been defined more than once in the whole project.


I guess I'm going to try to find a copy of this book on ebay. I feel that I'm not helping very much if I don't know what is the exact subject.

Regards,

Quote:
Original post by Emmanuel Deloget
Let's remember what a C++ compiler does:
  1. first, it preprocesses the C++ file

  2. then it compiles the preprocessed file

  3. then it links all the compiled C++ files into one big executable file

The preprocessor "preprocesses" the file. The compiler compiles the preprocessed file. The linker links the generated object files.

What you have referred to is the toolchain, not just the compiler. The MSVC++ command line compiler, cl.exe, will invoke the linker, creatively named link.exe, by default unless you suppress that behavior with a flag. The preprocessor for MSVC++ has been integrated into the compiler binary.

GCC, on the other hand, still maintains a distinct C preprocessor binary, cpp, as well as a C++ compiler driver, g++, and a linker, ld. g++ will run the preprocessing and linking steps automatically. (We ignore other compilers included in GCC as this is a C++ workshop.)

Pedantic, I know, but since we're discussing terminology, we might as well get it right. Also, Emmanuel, you should close your <li> tags in order to be valid XHTML. For future reference. [smile]

Emmanuel Deloget

First I have to say I appreciate what everyone is doing here. I have one question about your previous post, Emmanuel. You have three files: main.cpp, file.h, and file.cpp.

I was wondering what the recommended way to include files is especially if you have a lot of files.
// Let's say I have a main.h

#ifndef MAIN_H
#define MAIN_H
#include <iostream>

#include "file.h"
class main
{
    void run();
};
#endif

Then my main.cpp would look like this:

#include "main.h"
void main::run()
{
    // Code here
}

Now let's take, say, file.cpp.
I usually do it this way so I have access to all other included files.

#include "main.h"

int somefunction()
{
    // Code here
    return 0;
}

I've found that this prevents a lot of my preprocessor errors. The main thing I have found is that the order they are included matters, but I have never had any other problems doing it this way. I typed this in here, so there may be some errors, but I hope you understand my question. In file.h I wouldn't have any includes.

What I think is happening is that the declaration for file.h is being posted in main, then all of that is being posted in file because of the include in file.

The topic of headers, and when to include them, is often misunderstood. I find that the best way to understand them is to have a deeper knowledge of declarations vs. definitions, and of what happens during the compile and link stages of building an application. So let me go over those things...

What is the compiled unit of a C++ program?

First, people are often confused about what the compiled unit is in a C++ program. Is it the header file, the source file, or both? The answer is: The Source File.

When your compiler is instructed to build your application it looks through its project settings for any source (*.cpp) files and attempts to compile each of them as a separate entity. The symbols identified within that source file exist only within the scope of that file.

Additionally, your compiler has no problem building applications which may not have a single header file declared in the project. Finally, many IDEs, such as Visual Studio, allow you to include header files in the project. Doing this causes the header file to show up in the Solution Explorer for easy access, but it's important to note that unless the header file is actually #include'd in a *.cpp file, its contents never end up in any object file.

What is necessary for a successful compilation, and what is created?

As we learned in week 1, the first stage of the two-stage process of building an executable is the "Compile" stage. In this stage, your compiler turns each source file into an object file. In order to do this it parses your file, checks for correct syntax, looks for well-formed, matching identifiers, and so on, and then converts it into a sort of semi-final binary object.

As part of the process the compiler tries to match each identifier it encounters with its declaration (not its definition). A declaration CAN BE a definition if one exists. For example, when you include a function in its entirety above function main(), that definition acts as both a declaration AND a definition. But for the sake of a successful compile stage, all that is required is that the compiler is able to match each identifier used with its corresponding declaration. As I mentioned before, declarations (and definitions) are only valid for the scope of a single source file - since that is the compiled unit. Which leads us to the next point....

What is the point of header files?

Because symbols are only valid within the scope of a source file, any symbols which we need to use in multiple source files must be found in EACH source file. Rather than duplicating the function, variable, or data type declaration at the top of each source file, we put them in include files. This does two things for us....first, it means less typing, which is good. Second, it means that our declaration will be IDENTICAL in every source file. This is essential during the link stage as there can be problems if the declaration doesn’t match the definition or if we've got multiply defined identifiers with different declarations.

The two key things to note about the above paragraph are that headers are for shared declarations, not definitions, and that the contents of these header files should be the symbols we need, not any symbols which happen to be available. It's important to keep header files as small as possible. The primary reason is that everything in a header is pasted into, and compiled as part of, every source file that includes it, so small headers keep each compiled unit small. As well, the fewer things included in header files, the less likely we are to encounter the common problems which occur with header files. Let's explore those now...

Common problems resulting from header file misuse

There are two ways to incorrectly use header files. The first is what you put into the header files, and the second is how you include your header files. Let's look at both:

What you put in header files: People's first response when they learn about header files is to put everything in a header file so they have access to its contents everywhere. As we discussed above, header files are for declarations of identifiers you want available in multiple source files. Often times people forget this and will attempt to include "Definitions" in their header files. This can often lead to "Multiply defined" symbols or "Function re-definition." During the compile stage your compiler simply checks for the existence of a declaration, but in the link stage your linker attempts to match each instance of an identifier with its matching definition. Whereas the compile stage deals with each source file as a separate entity, the link stage combines the contents of all object files into a single symbol pool. If you've provided a definition within your header file, then multiple source files will have a matching definition - and the symbol pool will have the same symbol defined more than once. This will often confuse the linker as it doesn't know which definition to use.

How you include your header files: There are two common questions people have when including header files, or rather, two observations they seem to make.

Observation 1

First, people often have the misunderstanding that the order in which they include header files matters...it doesn’t, if they're used correctly. The reason order *appears* to matter is that people often forget that the compiler never actually sees the header files as separate files. Header files are "copied" into a source file by the preprocessor just before the source file is compiled. "So?" you ask...Well, this means that header files must obey all the same rules as source files. In particular, all symbols used within the header file must be declared before they're used. Often what occurs is people have declared a necessary symbol within a header file to be used within the source file. They forget, however, that the symbol was declared in a header file, and then they attempt to use the symbol within another header file. The problem is, now the second header file MUST be included after the first header file, or the compiler will complain that the symbol is undeclared...and it is, at least when THAT header is included. There are three solutions to this problem which many people use.

Solution A: The first solution is to just declare the symbol again at the top of the second header file. Although multiple definitions are not allowed, multiple declarations are just fine, so long as all the declarations match. So including a second declaration within the second header file successfully detaches it from the first header, meaning order no longer matters.

Solution B: The second solution people often try is to pull the symbol declaration out of BOTH header files, and instead put it in the source file ABOVE the #include of either header file. This does remove the dependency between header files, but you must be careful with this approach as now anywhere either of those header files are included, a declaration must be made in the source file, just above the #include.

Solution C: The third solution people often try is to pull the declaration out of either header file, and instead put it in a 3rd header file which is then included in both header files. This is what is sometimes referred to as a header chain, which is the subject of the next observation people make.

Observation 2

People often feel like they need to include multiple header files within each header file, just so they can get their code to compile. This is a bad sign and is an indication they've got header chains, or that they're not doing a very good job making sure their symbols are declared locally within the header. We'll get into this more in later chapters when we begin covering classes.

Ultimately, this observation is a sign that you've got too much 'going on' in your header files. Remember that your source files are where the action is supposed to go. Remember to put your definitions in your source files, and try to include as few header files as necessary within your header files. Your source files should be including header files, not your header files. It's better for your source files to include a large number of header files than for your header files to include multiple header files. There are two primary reasons for this:

1. Performance: By including things into header files, you're causing all source files which include those header files to ALSO include anything that the header file included...phew, did you get that? This causes your source files to become unnecessarily large, which can have a negative impact on compile times.

2. Dependencies: A feature of most build tools is dependency checking. Whenever you build your application, the build tool uses timestamps to determine which source files need to be rebuilt. However, if your source file includes a header file, it checks THOSE timestamps as well. If a header file is more recent than the source file which includes it, the source file must be recompiled. And of course, this timestamp philosophy travels all the way up the dependency chain. So if you've got a header file 4 levels up in a header chain, then all headers beneath it become invalidated, which means any header or source file which includes one of THOSE header files also becomes invalidated...blah blah blah. As you can see, even with a small dependency chain it becomes possible to force a rebuild of your entire code base, just by making a relatively small change.

By keeping your header files small, and including them directly in source files, you guarantee the minimal rebuild for changes you make in your headers.

Well, I hope this has been informative and helpful. Please ask any specific questions you might have. If you need examples, don’t hesitate to ask.

Cheers!

[Edited by - jwalsh on June 14, 2006 5:45:54 PM]

@Oluseyi: you are right. As a professor told me some years ago, the devil lies (always) in the details, and the one who forgets to consider the details is a fool. In this case, I used an imprecise terminology in order to give a somewhat precise definition of two important words, and I forgot to explain the details. My bad. Apologies to everyone.

Quote:
Original post by adam23
I have one question about your post before, Emmanuel. You have three files: main.cpp, file.h, and file.cpp.

I was wondering what the recommended way to include files is especially if you have a lot of files.
//Lets say I have a main.h
*** Source Snippet Removed ***

Then my main.cpp would look like this

*** Source Snippet Removed ***

Now lets take and say file.cpp
I usually do it this way so I have access to all other included files.
*** Source Snippet Removed ***

I've found that this prevents a lot of my preprocessor errors. The main thing I have found is that the order they are included matters, but I have never had any other problems doing it this way. I typed this in here, so there may be some errors, but I hope you understand my question. In file.h I wouldn't have any includes.

What I think is happening is that the declaration for file.h is being posted in main, then all of that is being posted in file because of the include in file.


You can verify your assumption using the "/P /EP" technique that I described earlier. But if by main you intended to write main.h then you are technically right: everything in main.h will be copied into file.cpp, and since everything in file.h is 'copied' into main.h, you'll have everything you need in the preprocessed file.cpp.

However, the question you ask is not an easy one. I have to leave the field of pure C++ programming and enter the large field of software engineering to give you the beginning of an answer.

In order to be efficient (as in "I type less thus I have fewer possible bugs"), you'd better try to limit your #include directives to what's really needed. Moreover, it is a good practice to always include file.h in file.cpp (if file.cpp implements what's declared in file.h) even if another file included in file.cpp already includes file.h. The main reason behind this is that the object oriented paradigm tells you that while the implementation of a class might change, the interface of this class is less subject to change. In C++, header files do more than describe an interface - they also describe the private part of a class, and that's part of the implementation. It means that whenever you change the implementation of a particular class, you might also change the header file for this class.

On day D, your class A is using class B internally (for example, a private member of A is a B instance). Thus, A.h includes B.h. B.cpp instantiates an A object, so B.cpp includes A.h. You feel it's enough - since A.h includes B.h, you don't have to include B.h in B.cpp. Later (day D+1), you change the implementation of A and you figure out that you don't need a member of type B. As a consequence, you remove the #include "B.h" directive from A.h. Suddenly, B.cpp refuses to compile, despite the fact that you didn't change anything in B.cpp.

Remember that you are using header guards in your headers. Those guards avoid multiple inclusions of a header file in a compilation unit (such multiple inclusion might result in multiple declaration of the same symbol, something that a compiler doesn't like very much). As a consequence, there is no problem with writing #include "B.h" twice in B.cpp. You can take advantage of this to avoid the error I just explained.

The correct usage of header files is hard, but fortunately, some rules might help:
  • The implementation file of a class should explicitly include the file that declares this class (A.cpp always includes A.h).

  • The fewer files you include, the better.

  • The order of the #include directives should not matter: a header must not depend on another header being included before it.

  • Whenever you use a symbol that is declared in header1.h in another file (either another header file or a cpp file), #include "header1.h" in this file.

The benefit of the first rule is easy to understand: you don't have to search for which file really includes the declaration of your class, and you avoid the error I described earlier.

The benefit of the second rule is less visible and has to do with the reduction of dependencies. A file should be included only if there is a direct dependency between the includer and the included file. Since more dependencies means more coupling, which means a decrease in code reusability, avoiding dependencies will probably (not always) increase reusability - when it comes to software design, this is considered a Good Thing.
Be aware that this rule doesn't mean "don't include any header file in your cpp". One big mistake in software design is to create what is called a god class - i.e. a class that does everything. Of course, it means that this class won't depend on any other class - but it also means that there is no abstraction in your project, and no abstraction leads to no possible reuse. God classes are not OO programming - they are procedural programming in disguise.

The third rule is a bit more complex to understand. The goal is also to ease both code reading and code writing. There is nothing more insane than having to include foo.h before bar.h whenever you want to declare the class "bar". If bar needs to know foo, then, please, include foo.h in bar.h.

The last rule is a temporary one - you'll see that you can bypass it when you learn about forward declarations. For the moment, consider it as an axiom because it is the only way to correctly follow the 3rd rule. The other benefit is that you can know what will be used in a file by reading the list of included files.

Let's apply these rules to a simple example: main.cpp defines main(); main() instantiates class A, and class A contains a member of type B. We have 5 files: main.cpp, A.h, A.cpp, B.h, B.cpp.

Rule 1 means that A.cpp always includes A.h and B.cpp includes B.h.
Rule 2 tells you that since B.cpp doesn't need A.h, you don't have to #include A.h in B.cpp (rather obvious).
Rule 3 means that since a B object is declared in the interface of A, A.h needs to include B.h. If we fail to do this, we have to #include "B.h" before "A.h" whenever we want to use an A object - thus, the #include order matters.
Rule 4 means that since A.h is referencing class B, we need to include B.h in A.h (cool: rule 3 already told us to do so, although for a different reason).
Conclusion:
  • main.cpp includes A.h

  • B.cpp includes B.h

  • A.cpp includes A.h

  • A.h includes B.h


In your example, main.cpp includes main.h, main.h includes file.h and file.cpp includes file.h. If file.cpp needs to instantiate (or just needs to know) a symbol that is declared in main.h then it should also include main.h.

There is something even less obvious in your example, because it introduces something which is called a circular dependency (A depends on B which depends on A). Generally speaking, this is something to avoid. There are some cases where it is difficult to get rid of circular dependencies, but in most situations they are a software design mistake.

I won't discuss more about circular dependencies now - experience told me that software design is something that you must learn later [smile]

I hope I'm clear (and precise enough [smile])

And I added those </li> tags! [smile]

Thank you guys for taking the time to respond in so much detail to my question. I basically have two more questions, (as if I didn't ask enough huh :) ).

1. jwalsh you mentioned forward declarations for classes, but if you do it this way are you restricted to only using references of objects in that class?

2. I am working on a game engine that encapsulates everything, DirectInput, DirectSound, Direct3D, and so on. Right now I have seven header files and five cpp files. What I was trying to accomplish was one point of contact with the engine. Here is a basic example of what I have set up.

//==========================================================
//Engine.h
//Created by Adam Larson
//==========================================================

//-----------------------------------------------------------------------------
// DirectInput Version Define
//-----------------------------------------------------------------------------
#define DIRECTINPUT_VERSION 0x0800

//------------------------------------------
//System Includes
//------------------------------------------
#include <stdio.h>
#include <tchar.h>
#include <windowsx.h>

//-----------------------------------------------------------------------------
// DirectX Includes
//-----------------------------------------------------------------------------
#include <d3dx9.h>
#include <dinput.h>

//------------------------------------------
//Engine Includes
//------------------------------------------
#include "LinkedList.h"
#include "ResourceManager.h"
#include "Input.h"
#include "Geometry.h"
#include "Font.h"
#include "State.h"



See, in this situation ResourceManager needs LinkedList.h and Font needs Geometry.h. Am I doing this correctly by having all my includes in the header for the engine and then linking everything here? How do I get around worrying about the order? I could include LinkedList in ResourceManager and Geometry in Font, but I also need these classes for variables in Engine.h. For example, I am creating a linked list of states as a private member of the Engine class.

Thanks again for all the help, we all really appreciate it. Includes have always been one thing that has confused me because I have been told so many different things. I do remember in college my professor telling me to always include the header in the cpp file that implements the header. Then this year in college my professor is telling me the opposite, by saying that everything is placed in the order you put them in, so you have it included in Engine.h, and have Engine.h included in the source.

Quote:
Original post by Emmanuel Deloget
Remember that you are using header guards in your headers. Those guards avoid multiple inclusions of a header file in a compilation unit (such multiple inclusion might result in multiple declaration of the same symbol, something that a compiler doesn't like very much).

Not quite accurate. Compilers don't necessarily care about multiple declarations:
extern int a;
extern int a;

The above code compiles perfectly.

What compilers choke on is competing declarations and repeated definitions:
extern int a;
extern double a; // ERROR: conflicting declaration (the type differs)

int b = 5;
int b = 5; // ERROR: redefinition of b

Each extern line tells the compiler that a variable named a exists and is defined somewhere, possibly in another compilation unit, with namespace-level visibility (we can assume this code to be in the global namespace, for simplicity). Repeating that statement is harmless, but contradicting it with a different type is a competing declaration. A definition, by contrast, actually creates the variable, so writing it twice in one compilation unit is a symbol redefinition. Note that a single matching definition, such as int a = 5; after the two extern int a; declarations, is perfectly legal.

Functions follow the same rules. Repeated matching declarations are fine, as long as there is exactly one definition:


#include <iostream>

int f();

int f();

int main()
{
    extern int a;
    extern int a;

    std::cout << f() << std::endl;
    return 0;
}

int f()
{
    return 1;
}



The above compiles just fine, no errors.

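The only real difference between variables and functions here lies in the extern keyword: a function declaration without a body is never a definition, so it is implicitly extern, whereas a variable declaration at namespace scope is also a definition unless you mark it extern.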

(These, incidentally, are the reasons C++ constitutes a terrible beginner language. Explaining seemingly simple things quickly leads into byzantine explorations of the unintuitive. [smile])
