The wrong way to count lines of code

43 comments, last by Luckless 8 years, 5 months ago

I was just wondering why some projects report such inflated figures for lines of code (Google has claimed to have written a billion lines of code).
I downloaded a tool called cloc and discovered that it reported double the amount of code in my C++ project. Normally, I just search for ";", which is a much easier way to do it.
It looks to me like they are counting everything that isn't a comment and isn't white space as a line of code. A single open curly brace gets counted as an entire line of code. This is obviously wrong. If you format your code for readability you will double the lines of code compared to someone who writes code in a more compact style, even though it's essentially the same code.
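To make the two counting approaches in this post concrete, here is a minimal sketch in C++. It is not how cloc actually works; `count_physical_lines`, `count_semicolons`, and the two sample strings are hypothetical names invented for illustration, and the line counter only skips blank lines and whole-line "//" comments.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Counts lines that are neither blank nor a whole-line "//" comment.
// Simplification (assumption): block comments and trailing comments are ignored.
std::size_t count_physical_lines(const std::string& source) {
    std::istringstream in(source);
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;         // blank line
        if (line.compare(first, 2, "//") == 0) continue;  // comment-only line
        ++count;
    }
    return count;
}

// Counts every ';' character, including any inside strings or comments.
std::size_t count_semicolons(const std::string& source) {
    std::size_t count = 0;
    for (char c : source) {
        if (c == ';') ++count;
    }
    return count;
}

// The same logic written in a compact style and in an expanded style.
const std::string compact_style =
    "if (x > 0) { y = x; }\n";

const std::string expanded_style =
    "// clamp y\n"
    "if (x > 0)\n"
    "{\n"
    "    y = x;\n"
    "}\n";
```

On these samples the physical-line count differs between the two styles (1 versus 4), while the semicolon count is 1 for both, which is the discrepancy the post is describing.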

Also, every file gets counted, no matter what it is. Stuff auto-generated by the IDE and even HTML files get counted.


That's why nobody cares about lines of code in the first place. (And if you do, please stop.)

The only proper use of counting lines of code is to determine if your function or class is too big.

Not really. It's a useful metric, or at least it would be if they implemented it properly. It gives you a ballpark figure of how complex a project is.

It really isn't. Brackets, for instance, are arbitrary. They make the code cleaner, but do not translate to machine code. There is quite a bit in all modern languages that exists for aesthetics and code organization and has no effect on the final machine code.

This:


if(...){}
is equal to this:


if(...)
{
}
Your code really isn't any more compact in the first case.

That's why I said if they implemented it properly.

If you count each semicolon as a line of code, then you get the same number no matter what your coding style is.

[edit] Never mind... That was a dumb question.

Macros, code generation, abuse of the comma operator, use of temporaries, etc. all affect the number of semicolons in code. In some cases, more semicolons means _less_ complex code (as you're breaking up complex expressions into simpler ones).
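As a small illustration of the temporaries point, here is a sketch of one computation written two ways. The function names are hypothetical and chosen only for this example.

```cpp
#include <cmath>

// One statement, one semicolon: the whole computation is a single dense expression.
double distance_dense(double x1, double y1, double x2, double y2) {
    return std::sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1));
}

// Same result with named temporaries: three semicolons, but each step is simpler to read.
double distance_with_temporaries(double x1, double y1, double x2, double y2) {
    const double dx = x2 - x1;
    const double dy = y2 - y1;
    return std::sqrt(dx * dx + dy * dy);
}
```

A semicolon count would rate the second function as three times the size of the first, even though it is arguably the less complex of the two.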

Sean Middleditch – Game Systems Engineer – Join my team!

The metric is used to get a ballpark figure. It doesn't need to be 100% accurate to be useful, and those are just corner cases that don't happen very often.

Also, it's a relative comparison: take two big projects and the corner cases will work out to be roughly even. Even if it's 10% inaccurate, it's good enough.

To what end is it a good metric, though?

It can't show complexity. A well-written, very complex system can come in at far fewer lines than a badly written, more trivial system.

Things like lines produced per day are also meaningless. This far into my career (decades long) I find I write less code. I sit and think longer, and refactor and rewrite until I have the cleanest code I can get. I will also refactor out code duplication, which means that sometimes adding functionality can reduce the line count.

You can't actually infer anything meaningful from lines of code apart from the line count. The only use that would serve is if your IDE had game-like achievements :)

Much more meaningful metrics would be:

- Number of functions
- Average line count per function
- Min/Max line count for functions

These let you see how much refactoring is required.
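The per-function metrics listed above could be summarized along these lines. This is only a sketch: `FunctionLengthStats` and `summarize` are hypothetical names, and the input (one line count per function) is assumed to have been extracted already by some other tool.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct FunctionLengthStats {
    std::size_t function_count;  // number of functions
    double average_lines;        // average line count per function
    int min_lines;               // shortest function
    int max_lines;               // longest function
};

// lines_per_function: hypothetical input, one entry per function in the project.
FunctionLengthStats summarize(const std::vector<int>& lines_per_function) {
    FunctionLengthStats stats{lines_per_function.size(), 0.0, 0, 0};
    if (lines_per_function.empty()) {
        return stats;  // nothing to measure
    }
    const auto range = std::minmax_element(lines_per_function.begin(),
                                           lines_per_function.end());
    const int total = std::accumulate(lines_per_function.begin(),
                                      lines_per_function.end(), 0);
    stats.average_lines = static_cast<double>(total) / lines_per_function.size();
    stats.min_lines = *range.first;
    stats.max_lines = *range.second;
    return stats;
}
```

For example, three functions of 4, 10, and 40 lines give an average of 18 with a maximum of 40, which points at the one function most likely to need splitting.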

I swear when I saw "doubled" my first thought was "it counted CR LF as two newlines". Also, hunting for semicolons isn't accurate either, since it doesn't take "for" into account (a "for" header contains two semicolons of its own). Also, some languages (like JavaScript) are somewhat loose about where semicolons are required, and counting semicolons would also miss stuff like preinitialized structures.
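Both miscounts can be shown with a naive character count. The sketch below uses hypothetical names (`naive_semicolon_count` and the two sample strings) and counts every ';' character, whatever its role in the code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Naive count: every ';' character, regardless of what it terminates.
std::size_t naive_semicolon_count(const std::string& source) {
    return static_cast<std::size_t>(
        std::count(source.begin(), source.end(), ';'));
}

// One loop with a one-statement body; the "for" header alone adds two semicolons.
const std::string for_loop_sample =
    "for (int i = 0; i < n; ++i)\n"
    "    total += i;\n";

// A preinitialized table: four lines of text, but only one semicolon.
const std::string table_sample =
    "const int primes[] = {\n"
    "    2, 3, 5, 7,\n"
    "    11, 13, 17, 19\n"
    "};\n";
```

Here the loop is counted as 3 and the four-line table as 1, so the semicolon count overstates one and understates the other.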

Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

