Need advice on commenting (doxygen)

Started by
12 comments, last by RobTheBloke 10 years, 10 months ago

Hello,

since I've been given documentation a free pass until now, I just started to using doxygen to properly document my engine. However, since I've been really really lazy about documentation before, I'm kind of unsure, especially about the amount of documentation/comments necessary. Since I've read a few articles/post specially on stackoverflow, I got a little worried that my documentation might be a bit over the top. Consider this example:


        /// Handles registration to and dispatching of messages and queries.
		/** The MessageManager is responsible for allowing systems to
        *   subscribe to messages and queries, as well as recieve any 
        *   such unit and dispatch it to the previously subscribed systems. */
		class MessageManager
		{
            typedef std::vector<const BaseSystem*> QueryVector; ///< Typedef for vector of query registrations
			typedef std::vector<std::vector<BaseSystem*>> SubscriptVector; ///< Typedef for vector of message subscripts

		public:
			MessageManager(void);
            
            /** Subscribe system to message.
             *  This method allows systems to subscribe themselfs to the templated message 
             *  type. This is necessary for them to recieve any message via the manager.
             *  
             *  @param[in] system Reference to the system that wants to subscribe to the message.
             *
             *  @tparam M The type of the message that should be subscribed to.
             */
			template<typename M>
			void Subscribe(BaseSystem& system);

            /** Register system to query.
             *  This method allows systems to register themselfs to the templated query 
             *  type. This is necessary for them  to recieve any query via the manager. 
             *  Only one system can be registered to a query at a time. Latter registrations 
             *  will override earlier ones.
             *  
             *  @param[in] system Reference to the system that wants to register to the query.
             *
             *  @tparam M The type of the query that should be registered to.
             */
            template<typename Q>
            void RegisterQuery(const BaseSystem& system);

            /** Delivers a message to all subscribed systems.
             *  This method constructs a message and delivers it to all previously subscribed 
             *  systems.
             *  
             *  @param[in] args Constructor arguments for the message to being created.
             *
             *  @tparam M The type of message that should be delivered.
             */
			template<typename M, typename ... Args>
			void DeliverMessage(Args&& ... args) const;

            /** Performs a query.
             *  This method looks whether a system is registered to the query
             *  passed, and if so, it passes the query on for the system to handle.
             *  Returns false if no system is registered to the query. Otherwise,
             *  it returns the systems response to the query, which can indeed also
             *  be false, if the queries condition wasn't met by the system.
             *  
             *  @param[in,out] query Reference to the query that should be delivered
             *
             *  @tparam M The type of message that should be delivered.
             *
             *  @return Indicates whether the queries condition was fullfilled or not.
             */
            template<typename Q>
            bool Query(Q& query) const;

            /** Clears all registrations.
             *  This method removes all subscriptions and registrations of any
             *  system to any message or query. */
            void Clear(void);

		private:

            QueryVector m_vQueries; ///< Vector for query registrations
			SubscriptVector m_vSubscript; /**< Vector for message subscriptions. */
		};

This is in fact the first time I've ever placed descriptive comments in a header file (yeah, I'm lazy as ass, especially concerning documentation, I know). Without seeing the function definition itself, and without talking about whether my terminology used is correct, what do you say about the amount of documentation for each function? Is it too much, too less, or just about enough? Do you have any other hints what I should do/not do when documenting my code (specially when using doxygen)?

Advertisement

One facet of writing is knowing your audience.

Why are you writing the documentation? Who is the consumer?

If you are writing it for yourself, where you are the only consumer, then you will need to include whatever it takes to refresh your memory.

If you are writing it for the masses, where many people in the world are the consumers, you need enough detail to satisfy their needs.

Who are you writing the documentation for? Why? Let that guide you.

I think your comments are at about the right level of detail.

I like to describe the general functionality of the method in much the same way as you have. It is important to explain the basic behavior, and also to document any expectations regarding the input which are not self-evident (i.e. things which the function signature/parameter naming does not make obvious). I will also describe what kind of output can be expected.

Usually I don't bother formally documenting each parameter and return value, although this is more due to laziness than intent.

You are misspelling "receive" and "themselves". Bad spelling is extremely frustrating if you are trying to search for something in a large code base.

The documentation describes some parameter `M', where in the code it's called `Q'. This shows sloppy copy-paste from the comment for `Subscribe' into the other comments. This is a perfect illustration of why I don't like systematic comments that adhere to a rigid format: They are dry and boring, and they are hard to keep correct.

I would much rather see comments explaining the design decisions that lead to the code being the way it is. (Why does the MessageManager deal with both messages and queries? What's the difference between them?)

Better names will make most of those comments unnecessary. (If we have a member "RegisterQuery", why is the analog for messages not "SubscribeMessage"? Why is it that you "subscribe" to messages but "register" to queries? Can we just use "subscribe" for both? How about calling the template parameter that represents the type of query `QueryType' instead of `Q', and then we can remove the comment that [incorrectly] describes what `M' means? How about renaming `Clear' to `RemoveAllSubscriptions' and getting rid of the comment?)

A comment is an admission of defeat when trying to write clear code. Sometimes it's OK to do, but if there is a simple name change or code refactoring (like moving some piece of code into a function with a good name) that makes the code convey the same information without comments, I'd much rather do that.

I don't use Doxygen (but I should start), however I heavily comment my functions. My header files are the documentation, and they are fairly well documented.

Usually one or two lines of comment is sufficient for stand-alone functions, but longer comments (and some times example output) is needed for more complex functions, or member-functions that affect the class.

Here's some of my own comments:

//True if 'point' is within the rectangle.
bool Contains(const cPoint &point) const;
//True if 'rect' is entirely contained within (or equal to) this rectangle.
bool Contains(const cRect &other) const;
 
//Returns true if the two rects overlap each other (either by intersecting, or one containing the other).
bool Overlaps(const cRect &other);
 
//Returns the position relative to the rectangle's upper-left corner.
//For example, if 'point' is within the rectangle, the return value would be
//the position within the rect.
cPoint Relative(const cPoint &point) const;
 
//Returns the absolute position of a point that is relative to the rectangle's upper-left corner.
//This takes a relative point, and returns the absolute point. This is the inverse of the Relative().
cPoint Absolute(const cPoint &point) const;
 
//Getting 'point' relative to the rect's position, and taking rect's size into account,
//returns an integer as an index for accessing into 2D arrays. 
//Can be negative (underflowing), if 'point' is out of range of the rect to the top or left, and can go beyond
//(width * height) (overflowing) if 'point' is out of range to the bottom or to the right.
int Index(const cPoint &point) const;
//The same thing as 'index', but in reverse. Takes an index (from zero to (width * height)) and returns the point
//at that location. For example, if 'position' is (-2,-2), then an index of 1 would return (-1,-2).
//If a index that is out of bounds is given, then the point returned will likewise be out of bounds.
//For example, an index of "-1" would return "-3, -2" (assuming 'position' is (-2,-2).
cPoint FromIndex(int index) const;
//Replaces the _last_ occurance of 'toReplace' with 'replacement' in 'str'.
std::string ReplaceLast(const std::string &str, const std::string &toReplace, const std::string &replacement);

//Replaces _every_ occurance of 'toReplace' with 'replacement' in 'str'.
std::string ReplaceAll(const std::string &str, const std::string &toReplace, const std::string &replaceWith);
std::string ReplaceAll(std::string str, CharValidatorFunc toReplaceFunc, char replaceWith);

//Replaces _every_ occurance of 'toReplace' with 'replacement' in 'str'. Operates directly on the string, instead of returning a copy.
void ReplaceAll_Mutate(std::string &str, const std::string &toReplace, const std::string &replaceWith);

//////////////////////////////////////////////////////////////////////////////

//Returns a specified sub-string, and then erases it from 'string'.
std::string TakeAndErase(std::string &str, const CharRange &range);

/*
	Inserts 'toInsert' into 'str', every 'places' number of characters.
	If 'first' is not std::string::npos, then the first character is inserted at 'first'.
	
	Example:
	InsertEvery("test string", '|', 3)    -> "tes|t s|tri|ng";
	InsertEvery("test string", '|', 3, 0) -> "|tes|t s|tri|ng";
	InsertEvery("test string", '|', 3, 1) -> "t|est| st|rin|g";
*/
std::string InsertEvery(const std::string &str, const char toInsert, size_t places, size_t first = std::string::npos);
It's important to document what the function does, but not how it's implemented. This is because the implementation might change later, so code shouldn't depend upon the implementation or it'll break that code when it changes.

For our game code, we write function block comments approximately never.

Very rarely you will see a function block comment, but generally it is only as a cautionary tale and not to document regular usage.

For library code, some libraries are better documented than others, but much of the system is self documenting.

For example...

bool Inventory::CanAdd(ObjectGuid)

bool Inventory::TryToAdd(ObjectGuid)

void Inventory::ForceAdd(ObjectGuid)

ObjectGuid Inventory::FindFirst(TestFunction)

vector<ObjectGuid> Inventory::FindAll(TestFunction)

etc.

These functions are self documenting by their names and their single responsibility.

Who are you writing the documentation for? Why? Let that guide you.

Well I can say with a little certainty that I'm not writing the documentation for me alone, though at the moment only I'm using my engine. I'll probably be using it in group-projects in the higher semesters of my university, so the documentation should at least fullfill the needs of some other people using it. Maybe once its far enough I might actually want to make the engine public, then I'll want to also have at least an acceptable documentation, without spending another eternety re-documenting everything. So well, I quess my main goal is a documentation aimed at "the masses" (I wouldn't even need a documentation, really... I see why it makes things easier but I've been working with barely any comments till now - I know what you are saying, but I didn't have a problem catching back up even after months of pause - got quite of a memory, I quess). From that standpoint, I have little to no idea what a public documentation needs, and I'd be glad for any additional suggestion.

You are misspelling "receive" and "themselves". Bad spelling is extremely frustrating if you are trying to search for something in a large code base.

Damn, thats throught my whole code-base, does this even still count as misspelling? Oh well, thank god for auto-replacement. but I really don't want to know about all the other things I misspelled throughout my code :/

The documentation describes some parameter `M', where in the code it's called `Q'. This shows sloppy copy-paste from the comment for `Subscribe' into the other comments. This is a perfect illustration of why I don't like systematic comments that adhere to a rigid format: They are dry and boring, and they are hard to keep correct.

Uh, thats a good catch. Though I really have a huge urge to c&p at least the format of comments and even some semantics as here. I was not aming so much at maintaining a rigid format, than to kepp sort of a ordered system, since both methods are *nearly* identically, I just "had to" copy this, instead of writing it again with little different wording. But I think I'll get to that in a second...

I would much rather see comments explaining the design decisions that lead to the code being the way it is. (Why does the MessageManager deal with both messages and queries? What's the difference between them?)

I like that suggestion, definately going to add such details in the future. Right now I was more focused on getting the technical details correct - being my first experience with "professional" style of documentation, I have quite a hard time watching for everything :/

Better names will make most of those comments unnecessary. (If we have a member "RegisterQuery", why is the analog for messages not "SubscribeMessage"? Why is it that you "subscribe" to messages but "register" to queries? Can we just use "subscribe" for both? How about calling the template parameter that represents the type of query `QueryType' instead of `Q', and then we can remove the comment that [incorrectly] describes what `M' means? How about renaming `Clear' to `RemoveAllSubscriptions' and getting rid of the comment?)

Well there is a big difference between queries and messages, being that a message can have multiple systems subscribed to it, and a query only being registered with one system at a time. I tried to express this difference in naming the methods differently - seeing that this probably failed, do you have any better suggestion how to express this difference (maybe you know some word I can't recall right now that fits here perfectly?). But for now, thats the reason why both are being named differently, and because of this I can't scrap the comment.

That aside, for things that are self-explanatory, like if I renamed 'Clear' into 'RemoveAllSubscriptions', your suggestion is to get rid of the comment than and just have the function explain itself, right? Wouldn't at least a small comment, like the one I already have, still be appropriate? I'm just asking generally, because I've been reading through

some documentations before, which all consisted to huge parts of more or less self-explanatory names, but still at times I wished that there was some small hand-written comment, at least to testify that the function does nothing extraordinary. For example, the "RemoveAllSubscriptions" might actually notify the systems that their subscriptions get lost, which probably was noteworth, etc... which it all doesn't do... so there normally isn't the need for having at least a small "dummy" comment stating: That function does what it says, nothing extraordinary, is there?

It's important to document what the function does, but not how it's implemented. This is because the implementation might change later, so code shouldn't depend upon the implementation or it'll break that code when it changes.

Huh, thats another things that confuses me a litte. What counts as something the function "does", and what as "implementation"? I quess there is no general answer to that, but at least a little hint would help me a lot. For example, looking over my sample code, do you see any examples of things I have mentioned that might fall under the category "implementation" rather than what the function does?


It's important to document what the function does, but not how it's implemented. This is because the implementation might change later, so code shouldn't depend upon the implementation or it'll break that code when it changes.

Huh, thats another things that confuses me a litte. What counts as something the function "does", and what as "implementation"? I quess there is no general answer to that, but at least a little hint would help me a lot. For example, looking over my sample code, do you see any examples of things I have mentioned that might fall under the category "implementation" rather than what the function does?


I don't notice any in your comments above, and I rarely encounter situations where the distinction needs to be made, but they come up occasionally - and I'm only a single programmer working on my own small project.

To come up with a contrived example, suppose I have a function like this:
//Uses the current number of milliseconds since the application began to generate a unique ID.
//The ID is only valid for the execution of the program.
uint32_t GenerateID();
The part of the comment, "Uses the current number of milliseconds since the application began", is an implementation detail.
What if I later run into a bug where I realize the problem is GenerateID() being called more than once within a single millasecond, so it's returning the same ID twice in a row or more?
So I change it to not be based off of time but, say, an internal counter. For whatever reason, I decide to make it count backwards from <max-int> to 0.

Previous code may have read that it was based off of millaseconds, and so used the documented implementation details to decide, "Hey, time is consecutive", and so depend upon these ID to determine the creation order of objects. When the implementation changes to something else, the generated ID order is no longer consecutive, so the previous code still compiles but doesn't do what it is supposed to, and wierd logic bugs result that are hard to track down, especially in a large project, and especially-especially if Bob edited the GenerateID() function, and now Susan is encountering problems in her area of gameplay logic. She's going to look for the problem in what she edited most recently - not what Bob did to an innocently named GenerateID() in an entirely different library of the code base.

Anything documented must now be supported. If the implementation is documented, then the implementation, poor or not, buggy or not, becomes part of the features of that function, or you have to go refactor the other parts of the code that rely on the function.
A documented comment is a guarantee that that is how the function will continue to work henceforth.

Hi,

I think if you're going public with those interfaces then detailed documentation and clean design will be highly appreciated by your users. Just a couple of hints:

- you can use really good examples how to write well documented code. Personally I find Qt project documentation really well maintained, take a look here: http://qt-project.org/doc/qt-4.8/modules.html or take a look at boost documentation,

- try to avoid lies in your comments, it may sound obvious but bugs in documentation are really frustrating (I would even say it's top priority, it's better to leave some behavior undefined than write something in the code differently than in the comments or docs - this also applies to naming),

- when you write Doxygen docs, make sure it generates well - some people prefer to read the documentation without downloading whole package to see if it has what they need. I mean here just run doyxgen, and see how the generated document looks like, check the output for warnings, missing statements, etc. it will be easier to publish it later.

Thumbs up for taking the effort and documenting your API's,

hs.

Some good, lightweight principles to follow include:

Don't state the obvious -- This includes small things like not commenting every few lines with an English version of exactly what the code already plainly states, but also less common concepts, like how comments should focus less on describing the happy path (the function name should already be pretty clear about what its intended to accomplish), and more about how client code is expected to interact with the function.

Don't repeat yourself -- Again, simple things like not rephrasing code and function names as an English sentence, but also not describing a portion of functionally multiple times in multiple ways -- firstly because its a maintenance nightmare (invariably the comments will disagree, so which one's right), and secondly because duplication of documentation often indicates functionality that should be factored out and shared.

Keep documentation as simple as possible, but no simpler -- For interface descriptions, don't go into unnecessary detail. It paints a picture that things are more restrictive than they really might be. If doing X really will make the dragon appear, say so, but don't leave the user to believe there are dragons around every corner. Don't over-share internal workings of your classes, its an implementation detail, and users will read too much between the lines. This tends to lead to client code that is brittle. Document your promises, not your pathology.

throw table_exception("(? ???)? ? ???");

This topic is closed to new replies.

Advertisement