
The wrong way to count lines of code


Started by Ed Welch October 10, 2015 11:45 AM

43 comments, last by Luckless 8 years, 5 months ago

Ed Welch

1,013

Author

October 10, 2015 11:45 AM

I was just wondering why some projects report such inflated figures for lines of code (Google has claimed to have written a billion lines of code).
I downloaded a tool called cloc and discovered that it effectively doubled the amount of code in my project (C++). Normally I just search for ";", which is a much easier way to do it.
It looks to me like they are counting everything that isn't a comment and isn't white space as a line of code. A single open curly brace gets counted as an entire line of code. This is obviously wrong. If you format your code for readability you will double the lines of code compared to someone who writes code in a more compact style, even though it's essentially the same code.

Also, every file gets counted, no matter what it is: stuff auto-generated by the IDE and even HTML files get counted.

WoopsASword

963

October 10, 2015 04:41 PM

That's why nobody cares about lines of code in the first place. (And if you do, please stop)

The only proper use of counting lines of code is to determine if your function or class is too big.

Ed Welch

1,013

Author

October 10, 2015 05:42 PM

Not really. It's a useful metric, or at least it would be if they implemented it properly. It gives you a ballpark figure of how complex a project is.

MarkS_

3,509

October 10, 2015 06:39 PM

Ed Welch said:
> Not really. It's a useful metric, or at least it would be if they implemented it properly. It gives you a ballpark figure of how complex a project is.

It really isn't. Brackets, for instance, are arbitrary. They make the code cleaner, but do not translate to machine code. There is quite a bit in all modern languages that exists for aesthetics and code organization and has no effect on the final machine code.

This:


if(...){}

is equal to this:


if(...)
{
}

Your code really isn't any more compact in the first case.

Ed Welch

1,013

Author

October 10, 2015 06:59 PM

MarkS_ said:
> It really isn't. Brackets, for instance, are arbitrary. They make the code cleaner, but do not translate to machine code. There is quite a bit in all modern languages that exists for aesthetics and code organization and has no effect on the final machine code.
>
> This:
> if(...){}
> is equal to this:
> if(...)
> {
> }
> Your code really isn't any more compact in the first case.

That's why I said if they implemented it properly.

If you count each semicolon as a line of code then you get the same number no matter what your code style is.
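
A rough C++ sketch of that comparison, for anyone who wants to try it. The function names and the two sample snippets are made up for illustration, and the semicolon counter is deliberately naive: it does not skip comments, string literals, or for-loop headers.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Naive "semicolon" metric: counts every ';' character in the source text.
std::size_t count_semicolons(const std::string& source) {
    return static_cast<std::size_t>(
        std::count(source.begin(), source.end(), ';'));
}

// Physical-line metric: counts lines that contain at least one
// non-whitespace character, so a lone brace counts as a full line.
std::size_t count_nonblank_lines(const std::string& source) {
    std::istringstream in(source);
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        const bool has_text = std::any_of(
            line.begin(), line.end(),
            [](unsigned char c) { return !std::isspace(c); });
        if (has_text) ++count;
    }
    return count;
}

// The same code in two brace styles (hypothetical sample input).
const std::string compact_style =
    "if (x > 0) { y = x; }\n";
const std::string expanded_style =
    "if (x > 0)\n"
    "{\n"
    "    y = x;\n"
    "}\n";
```

On these two samples the semicolon count is 1 for both styles, while the non-blank line count goes from 1 to 4.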

MarkS_

3,509

October 10, 2015 07:03 PM

[edit] Never mind... That was a dumb question.

SeanMiddleditch

17,596

October 10, 2015 07:08 PM

Ed Welch said:
> If you count each semicolon as a line of code then you get the same number no matter what your code style is.

Macros, code generation, abuse of the comma operator, use of temporaries, etc. all affect the number of semicolons in code. In some cases, more semicolons means _less_ complex code (as you're breaking up complex expressions into simpler ones).
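
A small illustration of the temporaries case, as a sketch only. Both functions are hypothetical examples and compute the same value, but the second has four semicolons where the first has one.

```cpp
#include <cmath>

// One expression, one semicolon.
double hypot_compact(double a, double b) {
    return std::sqrt(a * a + b * b);
}

// The same computation split into named temporaries: four semicolons,
// arguably easier to read and step through in a debugger.
double hypot_split(double a, double b) {
    const double a_squared = a * a;
    const double b_squared = b * b;
    const double sum = a_squared + b_squared;
    return std::sqrt(sum);
}
```

A semicolon count would rate the second version as four times the size of the first, even though the work done is identical.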


Ed Welch

1,013

Author

October 10, 2015 07:20 PM

SeanMiddleditch said:
> Macros, code generation, abuse of the comma operator, use of temporaries, etc. all affect the number of semicolons in code. In some cases, more semicolons means _less_ complex code (as you're breaking up complex expressions into simpler ones).

The metric is used to get a ballpark figure. It doesn't need to be 100% accurate to be useful, and those are just corner cases that don't happen very often.

Also, it's a relative comparison: take two big projects and the corner cases will work out to be roughly even. Even if it's 10% inaccurate it's good enough.

WozNZ

2,010

October 10, 2015 10:18 PM

To what end is it a good metric, though?

It can't show complexity. A well-written, very complex system can come in at far fewer lines than a badly written, more trivial system.

Things like lines produced per day are also meaningless. This far into my career (decades long) I find I write less code. I sit and think longer, and refactor and rewrite until I have the cleanest code I can get. I will also refactor out code duplication, which means that sometimes adding functionality can reduce the line count.

You can't actually infer anything meaningful from lines of code apart from the line count. The only use that would serve is if your IDE has game-like achievements :)

Much more meaningful metrics would be:

- Number of functions

- Average line count per function

- Min/max line count for functions

These let you see how much refactoring is required.
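
To make those numbers concrete, here is a minimal C++ sketch. It assumes some other tool has already measured the length of each function; FunctionStats and summarize are hypothetical names, not part of cloc or any other tool.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of function lengths (hypothetical struct for illustration).
struct FunctionStats {
    std::size_t function_count = 0;
    double average_lines = 0.0;
    std::size_t min_lines = 0;
    std::size_t max_lines = 0;
};

// Takes one entry per function, each holding that function's line count.
// Returns all zeros for an empty input.
FunctionStats summarize(const std::vector<std::size_t>& lines_per_function) {
    FunctionStats stats;
    if (lines_per_function.empty()) return stats;

    stats.function_count = lines_per_function.size();
    const std::size_t total = std::accumulate(
        lines_per_function.begin(), lines_per_function.end(), std::size_t{0});
    stats.average_lines =
        static_cast<double>(total) / static_cast<double>(stats.function_count);

    // minmax_element returns a pair of iterators: (smallest, largest).
    const auto extremes = std::minmax_element(
        lines_per_function.begin(), lines_per_function.end());
    stats.min_lines = *extremes.first;
    stats.max_lines = *extremes.second;
    return stats;
}
```

For example, three functions of 10, 40 and 250 lines give an average of 100, which looks unremarkable until the maximum of 250 points at the one function that probably needs splitting.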

Sik_the_hedgehog

3,003

October 11, 2015 12:07 AM

I swear when I saw "doubled" my first thought was "it counted CR LF as two newlines". Also, hunting for semicolons isn't accurate either, since it doesn't take "for" loops into account (each loop header carries two extra semicolons). Also, some languages (like JavaScript) are somewhat loose about where semicolons are required, and this would also exclude stuff like preinitialized structures.
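
One possible workaround for the "for" case, sketched in C++ under the assumption that semicolons nested inside parentheses can simply be ignored. The function name and sample are hypothetical, and the counter is still naive: comments, string literals and character literals are not handled, and it would also miss statements inside a lambda passed as a function argument.

```cpp
#include <cstddef>
#include <string>

// Counts ';' characters that are not nested inside parentheses, so the two
// semicolons in a for-loop header are skipped.
std::size_t count_statement_semicolons(const std::string& source) {
    std::size_t count = 0;
    int paren_depth = 0;
    for (char c : source) {
        if (c == '(') {
            ++paren_depth;
        } else if (c == ')') {
            if (paren_depth > 0) --paren_depth;  // tolerate unbalanced input
        } else if (c == ';' && paren_depth == 0) {
            ++count;
        }
    }
    return count;
}

// Hypothetical sample: one loop containing one statement.
const std::string loop_sample =
    "for (int i = 0; i < n; ++i) {\n"
    "    total += values[i];\n"
    "}\n";
```

A plain search for ";" finds three semicolons in the sample, while this counter reports one.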

Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

This topic is closed to new replies.
