DeathCarrot

C++ include necessity checker?

I'm working on a 100+ file source tree in a personal project and I'm convinced I could seriously improve the compile time by clearing out unnecessary #includes. Is there a tool that could check whether anything is referenced from within a header, if the header has been included previously in the chain, and possibly whether it's just a pointer reference so you could just declare the class instead of including the header? I'm running Linux if that makes any difference.

While makedepend does kind of do what I want, it's not quite what I'm looking for. Firstly, unless it has functionality not documented in the man page, it's per-object, not per-file (which would be fine if I just wanted to clean .cpp files). Its output is also a bit tedious to work with (simply a list of tens or even hundreds of header files for each object file).

Just to clarify, I don't want a compile-time solution, I want to modify the source and header files themselves. For example, if a .cpp file had a function which used my console class for debugging purposes some time ago, but doesn't use it anymore, I'd like to get rid of the #include "console.h" from the top which I'd forgotten to take out. Or an example of one which is potentially more difficult to catch:
#include "b.h"
class A {
B *ref;
};

I'd like this to be optimised to
class B;
class A {
B *ref;
};
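
A minimal sketch of why the forward declaration is enough in that case: the compiler only needs the full definition of B where it has to know B's size or members. The member names and find_b() below are made up for illustration.

```cpp
class B;  // forward declaration: B is an incomplete type from here on

class A {
public:
    explicit A(B& b) : by_pointer(&b), by_reference(b) {}
    B* by_pointer;     // fine: pointer to an incomplete type
    B& by_reference;   // fine: reference to an incomplete type
    // B by_value;     // error: needs sizeof(B), so b.h would have to be included
};

// Functions can be declared (not defined) in terms of an incomplete type too.
B* find_b(const A& a);

// Only code that touches B's members needs the full definition; in a real
// project this part would live in b.h and be included from the .cpp file.
class B {
public:
    int value = 42;
};

B* find_b(const A& a) { return a.by_pointer; }
```

Uncommenting the by_value member is the quickest way to see the case where the #include really is required.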

This hypothetical tool would be very useful to every C++ developer, so if someone has made one I'm sure they're charging money for it ;)

(I'd definitely convince my boss to buy such a tool for our system if it exists; our compile times are currently horrid too! I recently spent an entire fortnight doing nothing but decoupling 4 of my projects from the other 108...)

Interesting, I might make this over summer if I have any time to spare. Certainly doesn't sound like an insurmountable task.

Thanks for the replies guys.

Quote:
Original post by DeathCarrot
Interesting, I might make this over summer if I have any time to spare. Certainly doesn't sound like an insurmountable task.


Nothing in C++ looks like an insurmountable task.

But then, you start coding...

Quote:
Original post by DeathCarrot
Interesting, I might make this over summer if I have any time to spare. Certainly doesn't sound like an insurmountable task.


I've been thinking about a decent include checker for a while, and it seems like a pretty complex task to me.

Starting from the basic capabilities, the first thing I'd want it to be able to do is compile my headers standalone. Of course no one needs a tool for this, you just need a compiler. Run all the headers through a compile or syntax-check pass to make sure they all work independently of each other. You don't want headers that magically depend on other things to be included, so if you find a dependency you have to either resolve it via code or add an #include inside the problematic header.

The feature you want is to be able to detect when you've included headers that aren't doing anything. The way I'd go about it is to make the tool dump symbols defined by each header and symbols required by each source file. This could get pretty complex, and you'd pretty much have to co-opt an existing C++ parser to do it in any sane amount of time with few bugs. Maybe OpenC++ would do the trick, I'm not sure. Anyway, once you have the lists of symbols provided by and symbols needed, it's trivial to determine if a header is pointless.
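
Assuming the two symbol lists already exist, the comparison step could be sketched roughly like this. The function name and the shape of the input are hypothetical; producing that input with a real C++ parser is the hard part and isn't shown.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using SymbolSet = std::set<std::string>;

// Returns the headers none of whose symbols appear in `required`.
// `provided` maps a header name to the symbols it declares (hypothetical
// input that a real tool would have to extract with a C++ parser).
std::vector<std::string> unused_headers(
    const std::map<std::string, SymbolSet>& provided,
    const SymbolSet& required)
{
    std::vector<std::string> unused;
    for (const auto& entry : provided) {
        bool used = false;
        for (const auto& symbol : entry.second) {
            if (required.count(symbol) != 0) {
                used = true;
                break;
            }
        }
        if (!used) unused.push_back(entry.first);
    }
    return unused;
}
```

This ignores macros, headers that are only needed by other headers, and anything conditional, so it's the easy half of the problem at best.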

Suggesting refactors such as forward declaring various classes would be an extra step on top of that, but if you've got a full C++ parser and know how to work with it at that point, this step shouldn't be too hard. A worse way to do the same thing would be to write a tool that just warns any time it finds a class that depends on another class (and thus can't be forward declared) and you could go through and fix them all manually.

My ultimate holy grail of include tools is one step farther than this... it takes the list of symbols provided by and symbols required and uses a heuristic to generate a completely new set of headers. This would be for projects (like Quake) that use the ill-advised technique of using a single header file for everything. I started splitting them up when I was playing with it long ago, but it became a chore and I realized that I'd rather just have an automated tool to do it. This isn't really as much of a problem in C++, but a tool that makes suggestions like "moving symbol x into header y could eliminate Z inclusions" would be awesome.

I eval'd IncludeManager a while back and it's ok. It was pretty slow but it did provide a lot of helpful visual information about headers, including 2 way dependencies and things like "oh we're pulling in 2M of extra headers because we mistakenly included blah.h in blah.cpp." But it didn't live up to my dreams, so I didn't end up buying it. Still it's pretty cheap and might be worth your time, especially with a small project. (100 files is a small project... :) )

Quote:
it's trivial to determine if a header is pointless.


Is it?

The problem comes from the fact that there's no constraint anywhere that a header file must be self-sufficient. You can compose a class across several files, and included files are nothing more than plain text; they have no identity of their own in the generated binary.

They may contain some instructions that the compiler will convert into code, but even that isn't a requirement, except for main().

And even a trivial example becomes non-trivial

// class_prolog.h
class CLASS_NAME {

// class_body.h

#ifdef LINUX
CLASS_NAME() {}
#elif defined(UNIX)
CLASS_NAME(int x = 0) : xx(x) {}
#include "something.inl"
#endif

// class_epilog.h
};


// Foo.h
#define CLASS_NAME Foo
#include "class_prolog.h"
#include "class_body.h"
#include "class_epilog.h"
#undef CLASS_NAME

// Bar.h
#define CLASS_NAME Bar
#include "class_prolog.h"
#include "class_body.h"
#include "class_epilog.h"
#undef CLASS_NAME




Now throw in some conditional compilation....

The only foolproof way to do it is to actually write a C++ compiler. Just think how many ways a symbol can be declared such that it is hard to parse.

Quote:
Original post by Antheus
Quote:
it's trivial to determine if a header is pointless.


Is it?


You're absolutely right, conditional compiles throw a very large wrench (I really wanted to say spanner) into the works.

Re-including headers is another problem but could be partially avoided by only doing analysis on headers with proper internal guards and warning on everything else. This is a really good reason to check or enforce that your headers are self-sufficient via a compile pass or other means.
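
A rough sketch of the "only analyze guarded headers" filter, assuming the header's text has already been read into a string. The function names are made up, and it's a heuristic: only blank lines and // comments are skipped, and it only looks at the first two significant lines and the last one.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Heuristic: true if the header starts with `#pragma once`, or is wrapped in
// an `#ifndef X` / `#define X` ... `#endif` guard. Block comments, `# ifndef`
// spacing and `#if !defined(X)` guards are deliberately not handled.
bool looks_guarded(const std::string& header_text) {
    std::vector<std::string> lines;
    std::istringstream in(header_text);
    for (std::string line; std::getline(in, line);) {
        line = trim(line);
        if (!line.empty() && line.compare(0, 2, "//") != 0) lines.push_back(line);
    }
    if (lines.empty()) return false;
    if (lines.front() == "#pragma once") return true;
    if (lines.size() < 3) return false;

    const std::string ifndef = "#ifndef ";
    if (lines[0].compare(0, ifndef.size(), ifndef) != 0) return false;
    const std::string name = trim(lines[0].substr(ifndef.size()));
    return !name.empty() && lines[1] == "#define " + name &&
           lines.back().compare(0, 6, "#endif") == 0;
}
```

Anything this rejects (like the class_prolog.h style fragments) would go on the "warn and skip" list rather than being analyzed.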

My current project at work uses the redefined-macro trick quite a bit, and I hate it. But it saves them work at the expense of clarity and sanity, and I can't suggest something pragmatic to replace it. But most codebases won't have this problem, so a tool with good white/black list functionality would still be very helpful.

Quote:
Original post by DeathCarrot
Is there a tool that could check whether anything is referenced from within a header, if the header has been included previously in the chain, and possibly whether it's just a pointer reference so you could just declare the class instead of including the header?


I have never seen such a tool, commercial or free. I have seen many people asking for such a tool and have often felt the need for one myself. If you could create such a tool, the C++ programming world would beat a path to your door.

It's a non-trivial task, since it would require the semantic knowledge of a full C++ compiler combined with knowledge about the C preprocessor that feeds the compiler. These are normally two unrelated pieces of software.

If anyone wants to embark on such a project, I would suggest starting with the Doxygen parser as a base, since it already does a lot of similar analysis.

Yes, I certainly seem to have spoken too soon. I guess for the majority of files you "just" need to run it through a preprocessor and parse for tokens of interest, noting down which files they came from in the process; but the aforementioned non-trivial cases would make it considerably more difficult.
I guess there are other things too, such as if you have
#ifdef WIN32
#include <windows.h>
#else
#include <X11/Xlib.h>
#endif
and you're on a *nix system, you probably won't have a windows.h to check through.
Also, related to that, which preprocessor definitions would you use when preprocessing the files?
#include <console.h>
...
#ifdef DEBUG
Console::cout << "Debug message: " << str;
#endif
would need to be preprocessed both with DEBUG defined and undefined to see whether the console is required. (Having the app add an #ifdef around the #include would be nice, of course =P)
But you certainly wouldn't want it to evaluate every possibility of:
#ifdef THIS_IS_NEVER_DEFINED
GIBBERISH!(*&%()@#
#endif

I guess those can be solved by having the user supply a list of potentially valid preprocessor definitions.
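
As a sketch of what that could look like, assuming the user hands over a short list of macro names: build one set of -D flags per combination and preprocess the file once for each. The function name is hypothetical; -D is the usual GCC option for defining a macro.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds one flag string per subset of the user-supplied macro names, e.g. for
// {"DEBUG", "WIN32"}: "", "-DDEBUG", "-DWIN32", "-DDEBUG -DWIN32".
// The count doubles with every macro, so the list has to stay short
// (and well under the bit width of std::size_t).
std::vector<std::string> definition_sets(const std::vector<std::string>& macros) {
    std::vector<std::string> result;
    const std::size_t count = std::size_t{1} << macros.size();
    for (std::size_t mask = 0; mask < count; ++mask) {
        std::string flags;
        for (std::size_t i = 0; i < macros.size(); ++i) {
            if (mask & (std::size_t{1} << i)) {
                if (!flags.empty()) flags += ' ';
                flags += "-D" + macros[i];
            }
        }
        result.push_back(flags);
    }
    return result;
}
```

Each string could then be passed to something like g++ -E, and a header would only count as removable if it's unneeded under every combination.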

The non-self-sufficient headers are certainly the bigger issue, though. Ignoring them would be one way to go I suppose.

@Bregma - Preprocessing the input file isn't much of a hurdle, as GCC already provides this functionality not only in executable form but also as a convenient library (libcpp). I think the major issues are more fundamental, such as what the application should do in non-trivial circumstances.

Quote:
I guess for the majority of files you "just" need to run it through a preprocessor and parse for tokens of interest


A few libraries you should take a look at first: Windows API, C++ standard library, Boost.

The "just" is like nuclear fusion. Ready in 20 years. Since the 1950s.

C++ files passed to the compiler are plain text and nothing more. They have no structure whatsoever, so you can't assume anything.

How is this non-trivial? Let the compiler do the hard part for you.

Foreach CPP file you own:
Foreach #include in that CPP:
{
Remove the #include and try to compile the project
If any build errors, put that header back in
}
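
In C++ the inner loop might look something like the sketch below. The build step is passed in as a callback, since how you write the file out and invoke your build is project-specific; prune_includes and builds are made-up names.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Returns `source` with every #include line dropped that the build does not
// need. `builds` is a caller-supplied (hypothetical) callback that writes the
// candidate text to disk, runs the project's build, and reports success.
std::string prune_includes(const std::string& source,
                           const std::function<bool(const std::string&)>& builds) {
    std::vector<std::string> lines;
    std::istringstream in(source);
    for (std::string line; std::getline(in, line);) lines.push_back(line);

    auto join = [](const std::vector<std::string>& ls) -> std::string {
        std::string out;
        for (const auto& l : ls) out += l + '\n';
        return out;
    };

    for (std::size_t i = 0; i < lines.size();) {
        if (lines[i].compare(0, 8, "#include") != 0) { ++i; continue; }
        std::vector<std::string> candidate = lines;
        candidate.erase(candidate.begin() + static_cast<std::ptrdiff_t>(i));
        if (builds(join(candidate))) {
            lines = candidate;   // still builds without it: keep it removed
        } else {
            ++i;                 // build broke: put the header back
        }
    }
    return join(lines);
}
```

It only looks at lines that start with #include, and a successful build by itself doesn't prove the behavior is unchanged.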

Quote:
Original post by Nypyren
How is this non-trivial? Let the compiler do the hard part for you.

Foreach CPP file you own:
Foreach #include in that CPP:
{
Remove the #include and try to compile the project
If any build errors, put that header back in
}

Careful there. It's possible that the header may change the meaning of already valid source code. I think you're on the right track, though. The one modification I'd suggest would be also checking that the preprocessor produces the same output (following the inclusion, of course). Oh, and in order for that to work, inclusions will need to be culled bottom-to-top.

If inclusion of a header alters the meaning of previous definitions (via macros) in such a way that the resulting behavior differs, then the project is poorly constructed anyway.

The graphviz project is a good example, but I really hope that you guys don't write code like that.

Quote:
Original post by Nypyren
If inclusion of a header alters the meaning of previous definitions (via macros) in such a way that the resulting behavior differs, then the project is poorly constructed anyway.


Ah, but the point of this is to fix poor construction, no? That it's poorly constructed is pretty much a prerequisite for doing this.

Quote:
Original post by Ezbez
Quote:
Original post by Nypyren
If inclusion of a header alters the meaning of previous definitions (via macros) in such a way that the resulting behavior differs, then the project is poorly constructed anyway.


Ah, but the point of this is to fix poor construction, no? That it's poorly constructed is pretty much a prerequisite for doing this.


Simply including a header that you don't actually need (maybe because you removed all the code that originally needed it) is very different from including a header that has a "#define class struct" or something else dangerous but perfectly possible like that.

Quote:
Original post by Nypyren
If inclusion of a header alters the meaning of previous definitions (via macros) in such a way that the resulting behavior differs, then the project is poorly constructed anyway.


Who needs macros?
//a.h
namespace a
{
void foo();
}

// a.cpp
#include <iostream>
namespace a
{
void foo()
{
std::cout << "hello\n";
}
}

// b.h
namespace a
{
namespace b
{
void foo();
}
}

// b.cpp
#include <iostream>
namespace a
{
namespace b
{
void foo()
{
std::cout << "goodbye\n";
}
}
}

// c.h
namespace a
{
namespace b
{
void bar();
}
}

// c.cpp
#include "a.h"
#include "b.h"
namespace a
{
namespace b
{
void bar()
{
foo();
}
}
}

// d.h
namespace a
{
namespace b
{
void baz();
}
}

// d.cpp
#include "b.h"
#include "a.h"
namespace a
{
namespace b
{
void baz()
{
foo();
}
}
}

// main.cpp
#include "c.h"
#include "d.h"
int main()
{
a::b::bar();
a::b::baz();
}

Σnigma

While it can happen with namespaces, it's more likely to occur with function overloads, especially overloaded operators and template functions.
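
A small sketch of the overload case, with two namespaces standing in for the same source file before and after a header is removed. The header names in the comments are hypothetical.

```cpp
#include <string>

// Stand-in for a translation unit that includes both headers.
namespace with_both_headers {
    std::string describe(double) { return "double"; }  // from "generic.h" (hypothetical)
    std::string describe(int)    { return "int"; }     // from "special.h" (hypothetical)
    std::string call() { return describe(1); }         // exact match: describe(int)
}

// The same code after "special.h" was removed as "unnecessary".
namespace with_one_header {
    std::string describe(double) { return "double"; }
    std::string call() { return describe(1); }         // still compiles: 1 converts to double
}
```

Both versions compile cleanly, so a check that only looks for build errors would happily drop the second header and change which function gets called.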

Overloaded new operators are probably the most dangerous issue with my idea of just removing headers until it stops compiling. Perhaps, in addition to the "make sure it compiles" check, you'd want a set of test cases to validate behavior as well.

Let's abandon the header-removal idea and go with precompiled headers or conglomerate compiling (including all CPP files in each directory in a single CPP file and then only compiling each conglomerate).

Quote:
Original post by Nypyren
Let's abandon the header-removal idea and go with precompiled headers or conglomerate compiling (including all CPP files in each directory in a single CPP file and then only compiling each conglomerate).

But if the original purpose of header-removal was to reduce compile times, then both of those solutions can do more harm than good in a lot of situations...

