Code Organization with Functions and Headers

1) Does using a .h file cause any type of overhead if it is only #include'd once in the entire application?
2) Does using a function that will only be used once, within another function, each cycle cause overhead?

----------------------------------------------------------------------------------

Issue 1) Right now I only use one .cpp, which holds the main render function, and several .h files that store all other functions. I do this solely for organizational purposes, because I do not enjoy trying to scroll through thousands of lines of code to find a function I need to edit or add. I tend to store only a handful of related functions within each .h file. Does this cause any type of runtime overhead?

Example of how my current application is set up:

0_Global_Data.h (contains #includes for 0a_Defines, 0b_Functions, etc.)
0a_Defines.h
0b_Functions.h
0c_Variables.h
0d_Structures.h
1_D3D.h
2_Calculations.h
3_Input.h (contains #includes for 3a_Keys & 3b_Buttons)
3a_Keys.h
3b_Buttons.h
4_Text_System.h
Main.cpp (contains #includes for 0_Global_Data, 1_D3D, 2_Calculations, etc.)

I am not asking if this is a good/accepted way to organize. I am simply wondering if there is any excess overhead from using numerous .h files instead of numerous .cpp files. Sometimes there are bits of information I miss while reading up on the subject. If I remember correctly, and I may be very wrong, the point of multiple .cpp files is that you don't have to recompile the entire application every time you update it -- just whichever .cpp you changed.

----------------------------------------------------------------------------------

Issue 2) If I have a rather long function, and I split several pieces of it into functions that are only used once within that function, does this create excessive overhead? To my understanding, compilers typically make such functions inline (that is, when compiled, each call of said function is fully written out by the compiler to prevent time lost from jumping around during final execution). Example:

    void Some_Sub_Function_1(int &local_to_fiddle) { /* ... */ }
    void Some_Sub_Function_2(int &local_to_fiddle) { /* ... */ }
    void Some_Sub_Function_3(int &local_to_fiddle) { /* ... */ }

    void Some_Function()
    {
        int local_a;
        int local_b;
        int local_c;

        Some_Sub_Function_1(local_a);
        Some_Sub_Function_2(local_b);
        Some_Sub_Function_3(local_c);
    }

Most functions within my application (as well as variables) are named as clearly as possible, so that little to no comments are needed. I prefer to have most functions and variables be their "own" comments. When I gaze through my application's source, I wish each element to be very clear in what it does. As much as I love organization, I still understand that you must not sacrifice program speed/efficiency for "pretty-looking organization".

----------------------------------------------------------------------------------

I have gazed at the bits of source code available for various games (Civ4, another old civ-type game, Warlords 3 Battlecry, Battle for Wesnoth, etc.) and I find most coders' methods of organization to be piss-poor: poor word choices for variables and functions, messy spacing, confusing file names, etc. I hate it when I waste time trying to find something simply because I have to scroll about excessively, or view other unrelated functions that are 99% finished and no longer need editing.
I am well aware that Visual C++ Express Edition has some UI settings to reduce some of this aggravation (the [+]/[-] collapse buttons, etc.), but many of these features are glitchy at best, or so they seem in my experience. The Class View tab has aided me a lot, though.

----------------------------------------------------------------------------------

Again, to be clear:
1) Does using a .h file cause any type of overhead if it is only #include'd once in the entire application?
2) Does using a function that will only be used once, within another function, each cycle cause overhead?

----------------------------------------------------------------------------------

I am willing to listen to whatever the lot of you have to say; it's just that I feel organization within an application promotes stability and efficiency.
  1. Learn the language. Your knowledge of the C++ compilation process is clearly poor, which is why you think that division into multiple .cpp files is merely organizational. Look up "translation unit," and its relationship to object files.

  2. Microoptimizations typically don't help. They're particularly unhelpful when the developer does not have a thorough understanding of the processes involved. Trying to organize your header files to minimize "overhead" (what kind of overhead? runtime? compile-time?) is an unprofitable dead end. If your compile times are that bad, use pre-compiled headers, inclusion guards and #pragma once as appropriate (see the sketch after this list) and be done with it.

  3. Learn to use your IDE. Scrolling through a text file to locate a function is inefficient, considering that any decent IDE lets you navigate your codebase as a collection of types and objects, which you can expand to locate nested properties -- i.e., in a class browser.
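To make the inclusion-guard mention in #2 concrete -- a minimal sketch, with a made-up header name; the guard macro just needs to be unique per header:

    // Text_System.h (hypothetical)
    #ifndef TEXT_SYSTEM_H
    #define TEXT_SYSTEM_H

    // ... declarations go here; the guard ensures the preprocessor
    // pastes this text into any one translation unit only once ...

    #endif // TEXT_SYSTEM_H

On compilers that support it, a single #pragma once at the top of the header achieves the same effect with less typing.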
Quote:
As much as I love organization, I still understand that you must not sacrifice program speed/efficiency for "pretty-looking-organization".

Wrong. In general, the axiom is the other way around: never sacrifice the readability or maintainability of your code until you have identified, via profiling, a clear bottleneck whose optimization will provide significant gains. In practice, even with games, only a very small percentage of the code needs to be hand-cranked for speed.

Quote:
I am willing to listen to whatever the lot of you have to say, it's just that I feel organization within an application promotes stability and efficiency.

Sure. But your organizational structure as you've presented it is poor -- probably, as Oluseyi observes, from a lack of understanding of the language and the process by which that language is turned into an executable.

In brief:
A compiler does not understand a header file. There is nothing special about it; it is just text that is inserted into another file by the preprocessor. The compiler only understands source (.cpp) files -- and only one at a time. Your build environment -- VS, another IDE, or a makefile -- is configured to feed files with known C++ source extensions (.cpp, for example) to the compiler, usually automatically.

The compiler feeds each file, one at a time, to the preprocessor. The preprocessor resolves all the #directives such as #include, which is a simple, dumb, textual substitution of the named file into the working file. The result is known as a translation unit -- essentially, a single source file after the execution of the preprocessor.

Each translation unit is compiled, in turn, in isolation, producing an object file. Then all the object files are fed to the linker, which connects them into an actual executable file.
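As a concrete sketch of that pipeline, with invented file and function names:

    // square.h -- declaration only
    int Square(int x);

    // square.cpp -- translation unit #1, compiled to square.obj
    #include "square.h"
    int Square(int x) { return x * x; }

    // main.cpp -- translation unit #2, compiled to main.obj
    #include "square.h"
    int main() { return Square(4); }

The compiler turns square.cpp and main.cpp into object files independently; the linker then resolves main.obj's call to Square against the definition in square.obj and emits the executable. Change only square.cpp, and only square.cpp needs recompiling before a re-link.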

You have, in effect, a single long source file. This is, by conventional wisdom, extremely poor organizationally. To cover your direct concerns first, a header file causes compile-time overhead when it is included. Every time it is included. The file may (or may not) need to be reopened and reloaded from disk, and certainly the translation unit will need to be completely recompiled. It is impossible to say in general anything about runtime overhead in this situation -- it depends on the code and very likely has nothing to do with the header file.

(And as for your second question, calling the function may or may not induce overhead for every call, depending on if the compiler inlined the function or not; it is impossible to say in general and has nothing to do with header files. It's also a worthless micro-optimization and you can't say anything about the results without looking at the assembly. In short, favor readability and maintainability over speed until a bottleneck appears).

Now then. Your method is inefficient because it is essentially one big translation unit. When that translation unit changes, the entire unit must be recompiled; as such, you will always be recompiling your entire project every time. A more conventional organization that splits code across many .cpp files will compile much, much faster. Your link operation will be faster, but probably not fast enough to offset the massive compile-time growth you will experience as your project grows in scope. Better to learn the correct way sooner.

It sounds like most of the stuff you have in header files can be trivially moved to their own source files, with a header describing the interface. You need to keep in mind that once you move to multiple source files, you need to move all definitions out of your headers; otherwise they will be defined in multiple translation units and you will get a linker error. Read this.
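The failure mode, for illustration (hypothetical header):

    // helpers.h -- BAD: a full definition in the header
    int Twice(int x) { return x * 2; }

If two .cpp files both include helpers.h, each resulting object file contains its own definition of Twice, and the linker rejects the duplicate symbol. The fix is to keep only the declaration -- int Twice(int x); -- in helpers.h and move the body into exactly one .cpp file (or, for small functions, mark the definition inline, which relaxes the one-definition rule).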
@ Oluseyi:
In response to #1
Yes, my understanding of the compilation process is indeed poor. This is roughly all I know: you type code in a file, hit compile, and, provided there are no errors, a file is created that the computer can "understand" and "execute."

You asked me to look up translation unit, so I did, I had never heard that word before. Here is what I found:

translation unit -- a source file presented to a compiler, with an object file produced as a result.

object file -- in C or C++, typically the output of a compiler. An object file consists of machine language plus an external name list that is resolved by a linker.

I already knew that relationship, just not the proper names. I am unsure of why this was brought up. Perhaps I made little sense in my original statement.

In response to "...which is why you think that division into multiple .cpp files is merely organizational": I re-googled "purpose of multiple cpp files". The links I checked confirmed my original assumption: the compiler only compiles one file at a time; if everything is in one file, this process is long and inefficient. If code is split into multiple files, you only need to re-compile the files that were changed.
----------------------------------------------------------------------------------

In response to #2

I am unsure where you got the idea that I am trying to minimize overhead by using header files. I am using them for organizational purposes only. Let me try to rephrase my previous question: does using multiple header files cause overhead/slowdowns/complications/bad-stuff of any sort, as opposed to including everything in a single file?

Currently my application takes less than three or four seconds to compile, though it is only a few thousand lines long, so I have not run into any problems yet. I am merely trying to prevent future problems and expand my understanding of C/C++ and programming in general.
----------------------------------------------------------------------------------

In response to #3

I already noted that I use the class/object browser. To your credit for suggesting I play with it more: I noticed you can create new browser folders and store whatever you want in there -- objects, classes, variables, etc. -- so I can zip right to them in an organized fashion. I had tried it several times before but didn't realize you had to right-click above everything to create said folder.
----------------------------------------------------------------------------------

You won #3, but your comments on #1 and #2 make little sense to me, and my questions are still unanswered. Perhaps I worded them poorly; I am unsure. Or perhaps I just have a generally poor understanding of the subject. Enlighten me further if you will. I do apologize if I sound rude, arrogant, or stupid; I typically sound that way when I am trying to learn something new. However, I am grateful that someone is taking the time to explain the deep secrets of coding to me.
Quote:Original post by Dhaos
1)Does using a .h file cause any type of overhead if it is only #include'd once in the entire application?


It might cause slightly slower compile times.

Quote:2)Does using a function that will only be used once, within another function, each cycle cause overhead?


It might cause slight slowdowns at runtime.

However, if you wish to design a fast and performant application that compiles quickly, you should not care about these factors.
@toohrvyk: I'm not exactly worried about compile times; however, you guys saved me some time by explaining that headers get recompiled every time. My main concern was that inserting some of these "purely organizational" functions would result in nasty bottlenecks down the road. What I'm gathering from the other posters is that this will rarely be the case, unless I were to call hundreds or thousands of these 'lil suckers every cycle. I suppose I still do not grasp how much processing time is used to call a 10-20 line function. I know it's not much, but I figured the calls could add up down the road.

@jpetrie: Ok, so it seems I really did misunderstand the translation-unit concept. My apologies to Oluseyi.

I wish to understand more about this linking process.
Does the .exe compiled by, say, Visual C++ Express contain all the "compiled code," if you will, required to run it? If I sent it to person X, would that individual be required to have all the object files as well? Or is the .exe self-sufficient, external media aside (graphics, sound, model files, etc.)?

----------------------------------------------------------------------------------
You mentioned:
"(And as for your second question, calling the function may or may not induce overhead for every call, depending on if the compiler inlined the function or not; it is impossible to say in general and has nothing to do with header files. It's also a worthless micro-optimization and you can't say anything about the results without looking at the assembly. In short, favor readability and maintainability over speed until a bottleneck appears)."

The bolded segment confuses me a bit. Are you saying my idea of replacing sections of a long function with one-time-called sub-functions is a worthless organizational concept, or that the "inline" keyword does little to reduce overhead? Or did I yet again miss the point completely?

*As for that link you sent: I was reading it before I made this reply; it answers most of the questions I had.
Quote:Original post by Dhaos
@jpetrie: Ok, so it seems I really did misunderstand the translation-unit concept. My apologies to Oluseyi.

I wish to understand more about this linking process.
Does the .exe compiled by, say, Visual C++ Express contain all the "compiled code," if you will, required to run it? If I sent it to person X, would that individual be required to have all the object files as well? Or is the .exe self-sufficient, external media aside (graphics, sound, model files, etc.)?



Usually, though sometimes they might need a DLL or something. I think anything that runs in the command prompt should be OK (make sure you compile in release mode, though, so the .exe doesn't depend on debug runtime DLLs!).

Quote:

The bolded segment confuses me a bit. Are you saying my idea of replacing sections of a long function with one-time-called sub-functions is a worthless organizational concept, or that the "inline" keyword does little to reduce overhead? Or did I yet again miss the point completely?

*As for that link you sent: I was reading it before I made this reply; it answers most of the questions I had.


I believe he was just saying that since the call happens only once, you shouldn't sweat over it.
Quote:Original post by Dhaos
In response to "...which is why you think that division into multiple .cpp files is merely organizational": I re-googled "purpose of multiple cpp files". The links I checked confirmed my original assumption: the compiler only compiles one file at a time; if everything is in one file, this process is long and inefficient. If code is split into multiple files, you only need to re-compile the files that were changed.

A translation unit is composed of an implementation file (.cpp) and all the header files it includes, directly or indirectly. The preprocessor performs a full-text substitution of the contents of the header file every time it encounters an #include directive. If a header file includes another header file, the same thing is done there. This is the origin of inclusion guards -- to prevent multiple substitutions of the same text due to complex inclusion dependency graphs.
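To see the substitution concretely (with invented file contents): suppose 3a_Keys.h contains the single line

    extern int g_keys[256];

Then a Main.cpp that begins with #include "3a_Keys.h" is handed to the compiler as if that directive had been replaced by the line above. The translation unit is simply the pasted-together result:

    // what the compiler actually sees for Main.cpp
    extern int g_keys[256];
    // ...the rest of Main.cpp follows...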

A solid understanding of the relationships between header, implementation and object files makes it clear that header and implementation files should be used in semantically meaningful ways -- i.e., they should reflect the organizational structure of your code. Attempting to bias them to improve compile-time performance is a microoptimization, and likely to fail without extensive knowledge of the precise behavior of the given compiler.

Quote:I am unsure where you got the idea that I am trying to minimize overhead by using header files. I am using them for organizational purposes only.

Then why do you keep asking about "overhead"? To wit:
Quote:Let me try to rephrase my previous question: does using multiple header files cause overhead/slowdowns/complications/bad-stuff of any sort as opposed to including everything in a single file?

Does it matter, if you're using them for organizational purposes only? Plus, you still haven't defined "overhead."

A long, long time ago, when computers were slower and compilers not nearly as sophisticated - they didn't optimize aggressively, for instance - a simple change like manually unrolling your loop could speed up execution time, and using fewer header files could speed compilation time immensely.

Those dark days are behind us.

If you encounter a resource speaking nebulously of "overhead," chuck it. It's likely rooted in early 90s thinking, and while it may have some nuggets of useful information here and there, it's largely junk.

The best way to minimize compilation times is not to compile anything. If you can insulate changes from each other, then your compiler only needs to compile the changed translation units and re-link. Pre-compiled headers take this a step further by caching the compiled form of large, rarely-changing headers so they are not reprocessed for every translation unit that includes them. Don't employ this and other tactics without determining that there actually is a problem first, though.
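One common insulation tactic, sketched here with invented class names: a header can often get away with a forward declaration instead of an #include, so that edits to one header don't ripple out to everything:

    // game.h
    class Renderer;                     // forward declaration; no #include "renderer.h"

    class Game
    {
    public:
        void Set_Renderer(Renderer *r); // pointers and references compile fine
    private:                            // against a mere declaration
        Renderer *m_renderer;
    };

Only game.cpp, which actually calls Renderer's members, needs to #include "renderer.h"; files that merely include game.h no longer recompile when renderer.h changes.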
Quote:Original post by Dhaos
You mentioned:
"(And as for your second question, calling the function may or may not induce overhead for every call, depending on if the compiler inlined the function or not; it is impossible to say in general and has nothing to do with header files. It's also a worthless micro-optimization and you can't say anything about the results without looking at the assembly. In short, favor readability and maintainability over speed until a bottleneck appears)."

The bolded segment confuses me a bit. Are you saying my idea of replacing sections of a long function with one-time called sub-functions is a worthless organization concept or that the "inline" keyword does little to reduce overhead? Or did I yet again miss the point completely?

Replacing sections of a large function with smaller functions is worthwhile refactoring, but not because of performance. Do it for readability and maintainability reasons, as well as the fact that the smaller functions may prove reusable in the future.

When it comes to precisely what the compiler does during the build process, many things are non-deterministic. It is generally not worth the effort to find out whether a function will always be inlined; compilers will tune their output to the specified optimization level and any architecture hints that are supplied, and will employ heuristics and other information to build a best-case binary. It is possible to optimize beyond the compiler, but precious few people possess the necessary skill and knowledge, and the gains made from that significant effort investment are minimal - not worth it, on a cost:benefit basis.
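For what it's worth, a sketch of the kind of refactoring being endorsed (all names invented) -- the helpers read as their own comments, and an optimizing compiler remains free to inline each call:

    struct Game_State { /* input, physics, render data... */ };

    static void Update_Input(Game_State &state)   { /* ... */ }
    static void Update_Physics(Game_State &state) { /* ... */ }
    static void Render_Frame(Game_State &state)   { /* ... */ }

    void Run_One_Cycle(Game_State &state)
    {
        Update_Input(state);    // each step names itself
        Update_Physics(state);
        Render_Frame(state);
    }

Inlined or not, a handful of such calls per cycle is negligible next to the work the functions themselves do.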
My use of the word "overhead" simply means excess baggage: reduced application run speed, excess stuff that exists only to bog a program down while running, not during compilation.

The general consensus I'm getting here is that if a coder finds a way to make his/her code more readable, debuggable, and organized, do it; the processing power consumed by the added functions is almost negligible.

Thank you all for your help. Things are bit clearer now.

