Software-Documentation creates Clutter?

This topic is 400 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Hello forum!

 

I have a quite weird issue. Whenever I start commenting my own code, it feels like the comments just add clutter and bloat the overall project.

The general usage of documentation seems somewhat clear to me. In order to generate a doc, I need to document my code.

But that bloats my code instead of making it more readable.

Whenever I submitted work to other projects, I simply did not care about the clutter at all. It was not my project, and I would not be forced to work on it all the time anyway. If I needed to look up a thing or two, I would open the doc instead of reading the comments within the source files.

 

When I want to understand what my functions do, I simply read their names and I derive their purpose.

I feel like my naming of classes and variables is just very clear about what concepts I try to get through.

 

Additionally, while this might work on a small scale project, it will probably fail on large ones - with many other developers and some that might chime in at any given time. They would be clueless with a project of tons of classes.

 

But whenever I see my documented code, I start to shiver because of how my documentation just created a storm of symbols.

Very simple example:

	//! Getter for (insert semantical here).
	/*!
	\return integer that represents (...).
	*/
	int get_value() 
	{
		return (this->value);
	}

The way C++ handles comments, in combination with one of the many possible style conventions, makes this very ugly to my eyes.

Especially //! and /*! and \return just turn it into a massacre of symbols for me.

 

Is this it? Is that all I can do about it? Isn't there a way for a tool (e.g. Visual Studio) to give me a more readable approach?

 

Before I started my project, everything felt slick and clean, yet it lacked documentation (which of course plays an important role).

But now, all these functions need their overhead of a symbol-massacre.

 

This would look so much easier on my eyes and way better structured. I know, parsing this would become a bit difficult, but keywords like "returns" could help with that issue:

	// Getter for (...)
	// Returns int value representing (...)
	int get_value() 
	{
		return (this->value);
	}

Sorry if this all looks like a rant, it is not meant to be one!

I'm just wondering if I'm the only one who ever encountered this and if there are any solutions to it?

 

I hope I did not cause any confusion, otherwise just let me know!

Thanks for taking your time!


Like others, I use Doxygen for documenting code. In my view, the documentation acts as an interface description. What does the function do, what goes in, and what goes out.
For the latter, meaning of values and boundaries of input and output are particularly important.
The documentation and the function header should be sufficient to call the function, and understand what you will get back.
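To make the "what goes in, what goes out" idea concrete, here is a minimal sketch of a Doxygen-style interface description. The function `volume_to_gain` and its clamping rule are invented for illustration and do not come from any real project; the point is only that the comment states the meaning and the boundaries of input and output.

```cpp
#include <algorithm>

/// Converts a volume setting to a linear gain factor.
///
/// (Hypothetical example function, invented for illustration.)
///
/// @param percent  Volume in percent. Values outside [0, 100] are clamped.
/// @return Gain in the range [0.0, 1.0]; 0 means silent, 1 means full volume.
double volume_to_gain(int percent)
{
    // Clamp first so the documented output range always holds.
    const int clamped = std::min(std::max(percent, 0), 100);
    return clamped / 100.0;
}
```

With the comment and the signature alone, a caller can tell that passing 150 is harmless and that the result never exceeds 1.0.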

I add the documentation to the .cpp file, as much as possible. It keeps the include files compact and readable, and the documentation is close to the code, which makes keeping it consistent simpler.
As an added bonus, the documentation header above the function acts as a visual cue that a new function starts.

For big enough functions, writing documentation is always useful. For small functions it is more questionable. However, that raises the complicated question of "when is it small?". To avoid that question, I simply write documentation for everything. The getter-like functions are very trivial to document.
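As a rough sketch of how little a getter needs: Doxygen also accepts plain `///` line comments as documentation blocks, so the `//!`, `/*!` and `\return` combination from the first post is not the only option. The `Counter` class below is invented purely to show the comment style.

```cpp
// Hypothetical class, used only to show the comment style.
class Counter
{
public:
    explicit Counter(int start) : value(start) {}

    /// Returns the current count.
    int get_value() const { return value; }

private:
    int value;
};
```

One line of comment, no extra symbols beyond the third slash, and the generator still picks it up.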

 

To make sure errors don't creep into the documentation, I regularly build the documentation (as part of the script that updates my repository from upstream). I don't actually use the generated documentation much. I find it more convenient to open the file, and read it there.

 

Last but not least, there is a lot you can do with colours in your editor.

[attachment=33842:comment.png]

and a header fragment

[attachment=33843:comment1.png]

The dark grey against the almost black background makes the comment unobtrusive.

 

 

 


Don't comment every piece of code. Code should be readable.

Usually you don't comment getter/setter methods (unless they have side effects, which is bad).

Naming conventions are more important than comments. If you give your method a good name, you probably will not need to comment it.

From here it goes further, you have to build your methods and functions right in order to give them meaningful functionality and names.

For example, take a function called "Update": what does it update? How does it do it? That's not a good name.

UpdateGunAmmunition: this is a better name, since it states what it updates.
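A small sketch of the difference, using the better name from above (spelled `UpdateGunAmmunition` here). The `Gun` struct, the `rounds_fired` parameter and the floor at zero are assumptions made up for the example.

```cpp
// Hypothetical game type, invented for this example.
struct Gun
{
    int rounds_in_magazine;
};

// Vague: the signature says nothing about what is updated or how.
//     void Update(Gun& gun, int n);

// Clearer: the name and the parameter say what changes, so the call
// site reads well without any comment.
void UpdateGunAmmunition(Gun& gun, int rounds_fired)
{
    gun.rounds_in_magazine -= rounds_fired;
    if (gun.rounds_in_magazine < 0)
        gun.rounds_in_magazine = 0;
}
```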

 

Good comments explain something beyond the code. Algorithms, bugs, todos...

The only places where I advise you to add comments even if they are clutter are interfaces and public APIs (which most of the time are also interfaces).


I do most of my work in C#, so I do use the triple-slash xmldoc comments from time to time.  Mostly I do this for my own purposes, because that stuff gets picked up by Visual Studio Intellisense, and can be helpful.  Note that this is essentially function-level comments.  I also use some ReSharper plugins that can parse these xmldoc comments to determine if a function could throw a particular type of exception, so I try to add those to help me out as well.

 

I have used Doxygen on our codebases before, but aside from the graphing capabilities, which help to get a sense of the structure of a project, I haven't found it terribly useful.  I was originally hoping people might actually read the docs, but I was sadly mistaken.


[quote]Mostly I do this for my own purposes, because that stuff gets picked up by Visual Studio Intellisense, and can be helpful. Note that this is essentially function-level comments.[/quote]
I also found this to be the most useful form. You get a compact description with all the little details that you must know at the time you need to know.

 

[quote]I was originally hoping people might actually read the docs, but I was sadly mistaken.[/quote]
People generally don't read if they can avoid it. They would rather spend hours using a search engine and get nowhere.

 

Especially for technical documentation, you must understand its structure, and you need experience to recognize irrelevant details for the problem you are solving today. Once you have those skills though, documentation does give you a boost in productivity, especially in bigger projects, or longer running projects.


Personally, I prefer good documentation with cluttered code over no documentation with neat-looking code.

Because in my experience, programmers will oftentimes name their functions, variables, and arguments some of the stupidest crap I have ever seen. If you can't identify what the hell a function does by its name, then you need to look at its documentation. If its documentation is nonexistent because the programmer thinks that it's "straightforward", your only other option is to actually DIG into the function's source files if you can't get ahold of the original programmer.

 

And if you come across crap like Dt, which does not mean delta time like any sensible person would think it means, and instead figure out that Dt is actually a reference to the rendering engine.... well it's time to print the code off on paper and go strangle the programmer with it.

 

There are also moments where a function seems self-documenting but does something hidden on the sidelines that is not very intuitive at face value.

 

Typically I get by by doing this.

/* For functions or Variables that are self explanatory but have nothing hidden I tend to do this. */

bool loadDataTable(const w_char* FilePath, DataBase** OUT_DataBase); ///Function to load data. Returns false under failure.
bool initLevel(const LevelForm** Level_Data);                        ///Loads level from memory. Returns false under failure.

void setActorValue(const w_char* Name, int value);

/* for functions that requires a bit of explanation... */


/// Name: Set Skeletal Parameters
/// Note: This function is designed to set a series of bones on an
///       armature to some unique data. Arguments are best not filled
///       manually, but instead by the assistance of helper functions and
///       data structures.
/// @input param Data: A pointer to either an array of structs, or a linked list.
/// @input param count: A const integer that details how many bones are inside the linked list.
/// @input Skel_Descrip: A pointer to a description detailing all important information about a skeleton as a whole.

And yes, we do need a description of a skeleton in our current engine. That's because the skeleton itself holds more data than normal, such as how an animated body's bounding box needs to be calculated: per limb, or as a point cloud of points that are defined in each bone. It also records whether the skeleton is static (animations do not move the player) or dynamic (animations move the player), etc.

Edited by Tangletail


It's not the documentation, but the practice of how you write your documentation that creates clutter.  Like almost everything else, there's a good and bad practice of writing documentation and comments.

 

Could you blame the code for being spaghetti code?  No.  Spaghetti code is the result of writing poor code.

 

Your confusion and clutter are the result of writing poor documentation.


[quote]Could you blame the code for being spaghetti code? No. Spaghetti code is the result of writing poor code.[/quote]


I don't think anyone here has blamed documentation comments for spaghetti code. The arguments I have heard against excessive commenting are more along the lines of the comments making the code harder to follow as you read it - not quite the same thing. You could have the most straightforward code in the world and it could still be made near-unreadable by an inundation of comments, especially if the code has changed but the comments haven't. Documentation going out of sync with its accompanying code is a thing that happens distressingly often. I've learned not to be particularly trusting of documentation when the underlying code is changing rapidly.

My word for code documentation you can't trust is "clutter."

[quote]Your confusion and clutter are the result of writing poor documentation.[/quote]


What does good documentation look like to you?

Edited by Oberon_Command


[quote]I don't think anyone here has blamed documentation comments for spaghetti code. The arguments I have heard against excessive commenting are more along the lines of the comments making the code harder to follow as you read it - not quite the same thing. You could have the most straightforward code in the world and it could still be made near-unreadable by an inundation of comments, especially if the code has changed but the comments haven't. Documentation going out of sync with its accompanying code is a thing that happens distressingly often. I've learned not to be particularly trusting of documentation when the underlying code is changing rapidly. My word for code documentation you can't trust is "clutter."[/quote]

 

I was specifically talking about code, and code alone, excluding documentation.  Spaghetti code is the result of a poor understanding of the language or the codebase, of deadlines, of management, or a combination thereof.  It's not usually because of the language itself.

 

 

[quote]What does good documentation look like to you?[/quote]

 

Documentation that I can read to understand what the code does without reading the code at all.

Easy example: Do you need to read the entire OpenGL code to understand what it does?  Or, do you go to OpenGL documentation to know what it does?

 

The documentation should represent the code.  Yes, the argument against that is that most people don't even bother updating the documentation, resulting in outdated and misleading documentation, but that's a problem with the practice, not with documentation itself.

[quote]Especially if the code has changed, but the comments haven't. Documentation going out of sync with its accompanying code is a thing that happens distressingly often. I've learned not to be particularly trusting of documentation when the underlying code is changing rapidly.[/quote]

Like alnite, I use the idea that documentation should represent the code.

 

In case of discrepancies, the code wins of course.

However, I don't see it as merely a failure to update documentation: you have found a potential bug (either in the documentation or in the code)!

 

 

The documentation follows the decomposition of the code, and explains the same solution at a higher level of abstraction. As such, it creates a second, independent (non-executable) solution to your problem. Of course, with more than one thing that must stay in sync with each other, you'll get the consistency problem. However it serves as a detection mechanism here.

 

If your documentation says A, and code says B, then at some point in the past either code or documentation has changed but not the other (as you say, experience teaches that most often A is "old" and B is "new").

Either way, this state says that something has not been finished. At least one of A and B must be changed, and since that was not done, perhaps code using this function was not modified either? It is safe to assume that at least at some point in time other code assumed A as functionality.

 

You can never detect this without a second, independent description.
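A contrived sketch of such an A-versus-B discrepancy. The function and the game rule are invented; the only point is that a reader of the current version alone can see that something is unfinished, without knowing yet which side is stale.

```cpp
#include <algorithm>

/// Returns the damage left after subtracting armor.
/// The result is never less than 1, so every hit hurts.   <-- documentation says A
int damage_after_armor(int damage, int armor)
{
    // The code says B: the lower bound is 0, not 1.
    // One of the two is out of date; which one is not visible from here.
    return std::max(0, damage - armor);
}
```

Any caller written while A was true may rely on a hit always doing at least 1 damage, which is exactly the kind of follow-up question the mismatch raises.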

Edited by Alberth


[quote]Of course, with more than one thing that must stay in sync with each other, you'll get the consistency problem. However it serves as a detection mechanism here.[/quote]


It also serves as a great mechanism to propagate bugs when hurried programmers on a deadline look at the out-of-date documentation to learn what your code does rather than the actual code itself. If you can't guarantee that the documentation will be in sync with the code, then the documentation is not trustworthy, meaning that users of your code will have to look at both your code and your documentation to be sure. You might as well cut down on the amount of work they have to do and just make the code easy to understand.

Of course, one could argue that misuse of your code can also be propagated by bad naming and such in the absence of documentation, but I'd counter-argue that bad naming and bad documentation are kind of the same thing. A name is documentation. Redundant documentation adds work for both reader and writer in an environment where the documentation is not trustworthy.
 

[quote]Either way, this state says that something has not been finished. At least one of A and B must be changed, and since that was not done, perhaps code using this function was not modified either? It is safe to assume that at least at some point in time other code assumed A as functionality.

You can never detect this without a second, independent description.[/quote]


Sure you can - provided your version control history is complete enough, you can look back to see when the code was modified to its current state and see what other code was changed alongside it (or immediately after it). Your version control is a "second, independent description" that directly tells you the history of the code, rather than having to guess whether it's the code or the documentation that's out of sync.

Edited by Oberon_Command

[quote]It also serves as a great mechanism to propagate bugs when hurried programmers on a deadline look at the out-of-date documentation to learn what your code does rather than the actual code itself.[/quote]

In a hurry, you tend to not carefully read everything, so the room for error is much much bigger when you have to decipher edge cases from code.

 

Also, imho it is a myth that working less precisely is faster. Unfortunately, the truth is counter-intuitive, which is why the myth stays alive :)

 

When I switched from working with undocumented code to documented code, I found I actually worked faster. The reason is that the description is at a higher level, i.e. there is less reading to do, and it mentions all the important items that you need to know (and leaves out all the irrelevant details).

 

 

 

[quote]You might as well cut down on the amount of work they have to do and just make the code easy to understand.[/quote]

You should always do that either way. It greatly improves the chance that the reader understands correctly what you wrote.

 

What I think the problem with code is, is that you cannot easily separate the important bits from the non-important bits. Code is a whole, with loads of non-interesting parts and a few interesting points. By following common patterns we try (and often succeed) in making it clear what the code does, but there are no blue attention circles around the things you really should not miss, and these things can be as innocent as a simple literal number, or as small as a semi-colon or a comma. Documentation can just leave out everything without a blue circle, cutting down heavily on the amount of things you have to parse and classify as relevant/not-relevant in a number of dimensions.

 

A second issue is that the "clear code" idea assumes that code tells the whole story. While technically this is true (only the code defines what is being computed), I would say that code is not always clear. For example, if it is correct that some case is NOT dealt with at some point, you can't see that very easily, since there is no code to see. At best, you can see the gap, but it is hard to distinguish from a bug. Code also only tells you what the solution is, but not WHY the solution is a solution, or even a correct solution. I have written code that needs some pages of dense math reasoning, switching between math and programming-oriented viewpoints, to understand why the code solves the problem. If I simply gave you the code without those pages of text, you would have a lot of work to reverse engineer why it is correctly handling the problem.

You can write much easier / more clear code there by solving the problem in a more naive way, but that kills the scalability (which was the main performance issue).
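A small sketch of the "case that is deliberately not handled" point. The function name and the precondition are invented for illustration; the comment inside is the only place where the reader can learn that the missing check is a decision and not an oversight.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

/// Returns the index of the first element that is >= key,
/// or sorted.size() if there is no such element.
///
/// @pre `sorted` is in ascending order.
std::size_t first_index_not_less(const std::vector<int>& sorted, int key)
{
    // Deliberately NOT checked here: whether `sorted` really is sorted.
    // Why: the check costs O(n) and would cancel the benefit of the
    // O(log n) binary search below. Callers must guarantee the order.
    const auto it = std::lower_bound(sorted.begin(), sorted.end(), key);
    return static_cast<std::size_t>(it - sorted.begin());
}
```

Without the comment, the code is just as short and just as correct, but a reviewer has no way to tell the intentional gap from a bug.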

 

The latter type of code doesn't exist very often, I agree, but it strongly depends on the field you work in. (Mine is usually not games.)

 

 

 

 

[quote]Sure you can - provided your version control history is complete enough, you can look back to see when the code was modified to its current state and see what other code was changed alongside it (or immediately after it).[/quote]

What I meant was that you can read the current version only, and see something fishy is going on.

 

Let's assume we have all history.

While I agree you can look at the history in code, and in theory find wrong changes that way, I don't think anyone reads the entire history of a piece of code to check whether a bad change was made that you're affected by today (at least your very-much-in-a-hurry devs with a deadline won't). Even if you did read the history, there are very soon so many changes that it becomes impossible to track them all. If you have a proper review system in place, all changes have been studied extensively at least once already without finding a problem. That makes it extremely unlikely you'll find problems by quickly reading all changes. So while in theory you're correct, in practice I don't see how that would work.

 

After you have found a problem or a weird spot, sure, you have a focus point. That immensely simplifies the problem, and sure enough, you can pretty often find the spot where the now known problem was introduced.

 

However, an inconsistency between documentation and code is much more an early warning sign: you haven't got a reproducible bug, nor do you have a precisely defined spot in the code that is wrong.

 

 

 

[quote]tells you the history of the code, rather than having to guess whether it's the code or the documentation that's out of sync.[/quote]

Since the documentation is also in the same version control system in that case, you don't need to guess either. Having version control improves both code and documentation investigation.

Edited by Alberth


I think the time I've lost to a lack of in-code documentation, versus the time I've lost due to bad or outdated in-code documentation, is about 10x higher. Coders almost always overestimate how self-explanatory their code is, and they forget that their motivation for having the code a certain way may not be obvious a week, a month, a year down the line.

 

If you find other people's comments cluttering, then I suggest changing the font colour to something less visible perhaps?


[quote]Code also only tells you what the solution is, but not WHY the solution is a solution, or even a correct solution. I have written code that needs some pages of dense math reasoning, switching between math and programming-oriented viewpoints, to understand why the code solves the problem. If I simply gave you the code without those pages of text, you would have a lot of work to reverse engineer why it is correctly handling the problem.[/quote]


Which is why in my first post in this thread I advocated minimal documentation that explains why you're doing something, not what you're doing. :)

To be clear, I'm fine with documentation that explains why something is done a particular way, or that calls out side effects that wouldn't be obvious from looking at the code's interface. I just don't like reading code that has more comments than actual code. In my experience, code written like this is harder to follow and the comments add very little value.
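A minimal sketch of what that kind of comment can look like: one note for a side effect the signature does not reveal, one note for a "why". The function and the reserved-zero convention are assumptions invented for the example.

```cpp
/// Returns a fresh entity id.
/// @note Side effect: advances an internal counter, so no two calls
///       return the same id.
int next_entity_id()
{
    // Why start at 1: in this made-up engine, 0 is reserved to mean "no entity".
    static int next_id = 1;
    return next_id++;
}
```

Neither comment restates what the code does; both say something the two lines of code cannot say by themselves.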


I have to chime in: I really enjoy this discussion!

 

By the way, what I meant by clutter is that the commented code (which generates the doc) is useless for me during development.

It just adds things I won't read next to the code that I will edit.

 

If I want to read the code, I open the code. If I want to read the documentation, I open the documentation designed especially for human eyes.

 

Even if I picked up a project months later, why would the document-generating code be next to the programming code?

Why merge two things in one file that strictly have different intentions? As I said, someone who wants to understand the code on a higher level should just open the generated documentation.

Are there many of you that actually read the documenting phrases within the actual code instead of a generated documentation?

 

Is there a way that I can create a single file just for the doc-generating code? Like, keeping the program separate from the doc-generating code?

The documenting code would then refer to the relevant source files when the documentation is generated.

 

Maybe this idea is really flawed and should be avoided at any price, but I was wondering if there are developers doing that?

I can see how adding another file will make the step of keeping the doc up-to-date even more annoying for some. But that seems to be more of an issue with the developers' discipline.


[quote]I just don't like reading code that has more comments than actual code.[/quote]

Hmm, better not start writing documented Python code then, most functions have fewer code lines than docstring lines there :)

 

 

I agree with you, however: a line of comment here and there to mark big steps in the code is fine. If you need to explain every line, something is terribly wrong.

 

 

[quote]By the way, what I meant by clutter is that the commented code (which generates the doc) is useless for me during development.[/quote]

Yep, you'd really hope that would be the case. An author of documentation has all knowledge in his head at the time he writes the documentation. If that is not the case, he would be unable to write the code. While I agree it looks silly to write documentation at that time, the fact is, it's the one moment where you can write it with the least amount of effort. Earlier is not possible, as you don't have enough understanding or not enough detail. Later is less efficient, since details get lost very quickly.

 

 

Note that the usefulness of documentation increases non-linearly with the size of the development team, the code size, and the length of the development period. This is why schools teaching you to write documentation does not work. A program that you can write in an hour or two, and then never use again, is useless to document. Sure, they can grade you for it, and sure, you can write the text because they ask, but you never experience its benefits. Thus students conclude that documentation is wasted time, which is correct for the assignment but an incorrect generalization (understandable, though, as they haven't seen anything but small programs).

 

 

[quote]If I want to read the code, I open the code. If I want to read the documentation, I open the documentation designed especially for human eyes.[/quote]

That's a perfectly valid way of using documentation, but not everybody works in the same way. This is not different from everybody having his/her favorite OS, editor, window manager (if you have a choice), and MP3 player. Other ways are not better or worse than yours, they are just different. Everybody finds his/her sweet spot what works best for him/her.

 

 

 

[quote]Even if I picked up a project months later, why would the document-generating code be next to the programming code?[/quote]

I have that so I can update the code and the documentation at the same time. Experience shows that the further you place these things apart, the more likely they will get out of sync.

 

Even something as silly as a checker that warns me when I forget to document a parameter makes a lot of difference. Humans are not that good at being very precise and consistent for extended periods of time. Maybe we should program a computer to do it :P
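As a rough illustration of what such a checker looks at: Doxygen, for one, has warning options along these lines (`WARN_NO_PARAMDOC` and `WARN_IF_DOC_ERROR`), though the exact behaviour depends on version and configuration. The function below is hypothetical.

```cpp
/// Computes the volume of an axis-aligned box.
///
/// @param width   Extent along x, in metres.
/// @param height  Extent along y, in metres.
/// @param depth   Extent along z, in metres. If this line were missing,
///                or named differently from the parameter, a doc checker
///                could report it.
/// @return Volume in cubic metres.
double box_volume(double width, double height, double depth)
{
    return width * height * depth;
}
```

The check is purely mechanical (parameter names in the comment versus parameter names in the signature), which is why it is easy to leave to a tool.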

 

[quote]Why merge two things in one file that strictly have different intentions? As I said, someone who wants to understand the code on a higher level should just open the generated documentation.[/quote]

This assumes that you understand the higher-level documentation. But how do you know? Quite often documentation uses special phrases with a specific meaning. If you want to verify your understanding against the author's intended meaning, you need the code as the common reference. You have to step down a level, to compare your ideas with what it really does. Once you're happy that you linked the right concepts to the right names, you can step up again, and use the documentation at the higher level.

 

 

You can argue that the documentation is not good enough, and to some extent that is true. However, I don't think it's entirely avoidable. An author generally doesn't know the background of the readers. It even happens to me in normal English language. My native tongue is not English, so every now and then I run into a word or phrase that I don't know. I have to step down to the common reference of a dictionary, to find the actual meaning of the word or phrase.

 

 

[quote]Are there many of you that actually read the documenting phrases within the actual code instead of a generated documentation?[/quote]

I don't think anyone has a global overview of how many readers here do that, nor what you consider to be "many" :P

 

Why is that a concern to you? I am normally fascinated when I notice someone makes a different choice than I would. Why is that? What's the advantage? Why did I not do that?

 

I do it, but not always. It somewhat depends on what I am looking for exactly. If I know what source file to open, that is quicker than opening the documentation. If I am not sure where to search, the documentation is quicker as I can easily jump around until I find what I need.

 

 

[quote]Is there a way that I can create a single file just for the doc-generating code? Like, keeping the program separate from the doc-generating code?[/quote]

Never seen this, and I think it would be quite annoying. I do know of an approach where they do it the other way around, namely literate programming.

 

https://en.wikipedia.org/wiki/Literate_programming

The idea is that you write an explanation of your program, with code fragments in it. One generator extracts the text and turns it into a readable document; another generator shuffles the code fragments around into a working program. While I like the idea, using it in practice proved to be a challenge.
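On the original question of a separate documentation file: Doxygen, at least, does let a documentation block live away from the declaration, provided the block names its target with a structural command such as `\class` or `\fn`. A minimal sketch, with both "files" shown in one block so it compiles as-is; the `Inventory` class and the file names are invented for illustration.

```cpp
// --- inventory.h: code only, no documentation comments -------------------
class Inventory
{
public:
    void add(int amount) { items += amount; }
    int count() const { return items; }

private:
    int items = 0;
};

// --- inventory_doc.h (hypothetical): documentation only -------------------
/*! \class Inventory
    \brief Keeps track of how many items a player carries.
*/

/*! \fn void Inventory::add(int amount)
    \brief Adds items to the inventory.
    \param amount Number of items to add.
*/

/*! \fn int Inventory::count() const
    \brief Returns the number of items currently held.
*/
```

The price is that every signature is written twice, once in the code and once in the documentation file, so a rename has to be done in both places. That is the same out-of-sync risk discussed above, only with more distance between the two copies.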

 

 

[quote]But that seems to be more of an issue with the developers' discipline.[/quote]

There is an issue with developers' discipline imho, namely that the amount needed is too high. It is terribly easy to make an error in the many decisions you make, handling conflicting requirements, and the zillion options in a lot of dimensions that you explore to find that one solution with all the right properties. Then there are all the tools, processing steps, and procedures around development. This is why we have this terribly high rate of failures (one error every 100 lines or so). You didn't sleep long enough, and you make a typo or a copy/paste error without noticing it.

 

 

So if anything, you want to lower the need for discipline rather than increase it.

[quote]Even if I picked up a project months later, why would the document-generating code be next to the programming code? Why merge two things in one file that strictly have different intentions? As I said, someone who wants to understand the code on a higher level should just open the generated documentation. Are there many of you that actually read the documenting phrases within the actual code instead of a generated documentation?[/quote]

 

I prefer to read the documentation in the code.  There is higher-level documentation like you mentioned, but there is also the kind that describes what a function or a class does (which I think is the topic of this thread).  The higher-level documentation with graphics and tables and possibly a TOC belongs in a separate place, and may even be best written by someone else.  However, the documentation (and comments) in the code is still useful, and I do prefer to have it in the code.

 

Here's an example from an actual project of how I would structure my code and why comments are useful.

[source]

def work(msg):
    # Validate that message is coming from the right source.

    ...  # bunch of code

    # Read body and validate parameters

    ...  # bunch of code

    # Make the tmp folder if it doesn't exist

    ...  # bunch of code

    # Download input files to temp location

    ...  # bunch of code

    # At this point, we have all the necessary parameters and they are valid, do the work

    ...  # bunch of code

[/source]

 

Here is a single generic 'work' function (why it's called 'work' is another story, but it's for compatibility with the existing codebase).

 

Do you see how I organize my code into paragraphs, where each paragraph has a brief description of what it does?  The 'bunch of code' parts consist of around 5-10 lines of code each.  With these paragraphs and the comments that explain them, I know exactly where to jump in to fix a bug or improve the function.  Got a problem with tmp folder creation on some OSes?  There it is, in the 3rd paragraph.  What if the actual work itself is buggy? Oh, the last paragraph.

 

If I were to remove the comments because "comments and documentation sucks", this function would be a single blob of 50+ lines of code that does many things who-knows-what-where.  If there was a bug here, I would have to trace through and follow the logic from the top of the function and all the way down to find where that bug is.

 

No, don't comment every single line of code because that's stupid.  Documenting a function with an obvious name and a single line of code like GetName() { return name; } is also stupid.  But that's not the documentation's fault; it's your own fault for cluttering your own code with useless info.

Edited by alnite
