The wrong way to count lines of code

Started by
43 comments, last by Luckless 8 years, 5 months ago
Off by a factor of 2 is within the ballpark for the usefulness of LOC.
1M LOC and 2M LOC are both "big".
1K and 2K are both "small".

Personally, I think that if there's a 10-line comment above a function, it should be included in the LOC stat, as it's part of the code that a maintainer is expected to read.
But then this gets into a grey area when EVERY function has 10 lines of stupid XML markup comments above it for autogenerated docs. That's not really part of the code anymore, especially if your IDE automatically hides it.

As for Google, they've got between 10k and 20k programmers (Samsung has over 40k!). They could write 1 billion LOC about once every 4 years.
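To sanity-check that figure, here is a rough back-of-the-envelope sketch. The headcount range and the 1 billion / 4 years target come from the post above; the 250 working days per year and the function name are assumptions made purely for illustration.

```python
# Back-of-the-envelope check of the "1 billion LOC every 4 years" claim.
# The headcount range comes from the post; 250 working days per year
# is an assumption added here for illustration.

TARGET_LOC = 1_000_000_000
YEARS = 4
WORKING_DAYS_PER_YEAR = 250  # assumption, not from the thread


def loc_per_programmer_per_day(programmers: int) -> float:
    """Lines each programmer must write per working day to hit the target."""
    return TARGET_LOC / (YEARS * WORKING_DAYS_PER_YEAR * programmers)


for headcount in (10_000, 20_000):
    rate = loc_per_programmer_per_day(headcount)
    print(f"{headcount} programmers: {rate:.0f} LOC per person per day")
```

Under those assumptions that works out to somewhere between 50 and 100 lines per programmer per working day.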

What makes it even worse is that sometimes more lines can mean clearer code that is easier to understand than the shorter version: sticking to one operation per line, using proper if/else branching instead of ternary operators, less macro magic. All of this makes the line count go up but makes the code more understandable.
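A small sketch of that point. The thread probably has C-family languages in mind; Python's conditional expression stands in for the ternary operator here, and both function names are made up for the example.

```python
def clamp_compact(x, lo, hi):
    # Nested conditional expressions: one line of logic, harder to scan.
    return lo if x < lo else hi if x > hi else x


def clamp_verbose(x, lo, hi):
    # Same behaviour, one decision per branch: more lines, easier to read.
    if x < lo:
        return lo
    elif x > hi:
        return hi
    else:
        return x
```

Both functions return the same results, yet the second one contributes roughly three times as many lines to a LOC count.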

Not really. It's a useful metric, or at least it would be if they implemented it properly. It gives you a ballpark figure of how complex a project is.

It doesn't. I fully agree with WozNZ

Reminds me... back in the days when I was in uni, after a piece of coursework was assessed I read a feedback note from the tutor which said something like "if I didn't know what the coursework was about, I would have thought your code was implementing a space rocket". When I checked the code of a guy who got a near-perfect score, I realised what the feedback meant. His code was about 1/20th the size of mine because it used more intelligent algorithms and was thus more compact. Since then I have always strived to make my code smarter (also creating better functionality) and have never taken a huge number of lines of code to indicate more complexity.

Of course, no matter how smart your algorithms are, the code will still grow in size for large projects.

can't help being grumpy...

Just need to let some steam out, so my head doesn't explode...

So, multiply line count with the IQ of the author?

Well said, but it was due to me being naive rather than having a low IQ, because I learnt my lesson and took it to heart after that assessment, and my scores improved significantly.

I meant it in the generic form. Smart people write short programs, while less-smart people solve the same problem in more lines of code. I wouldn't be surprised if you end up with largely equivalent estimates for the same problem with different people.

Off by a factor of 2 is within the ballpark for the usefulness of LOC.
1M LOC and 2M LOC are both "big".
1K and 2K are both "small".

Yes, but being off by a factor of 1.1 is better than being off by a factor of 2.


You missed my point. The metric itself is so fuzzy that accuracy in the measurement largely doesn't matter.

Say you've got a laser which can tell you if an object's distance from you is within 5 brackets: larger than 1m, 1m to 10cm, 10cm to 1cm, 1cm to 1mm, or less than 1mm. Most of the time, if you double the distance of the object, you'll still get the same result from the laser because it takes a factor of 10 (not 2) to jump between brackets.

LOC is quite similar: most of the time you're categorizing projects based on the log10 of the LOC value, not the log2 :)
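A minimal sketch of that bucketing idea, assuming the bracket is simply the integer part of log10 of the line count. The function name is hypothetical, and the digit-count trick is used only to avoid floating-point rounding.

```python
def loc_magnitude(loc: int) -> int:
    """Order-of-magnitude bracket of a line count, i.e. floor(log10(loc))."""
    if loc < 1:
        raise ValueError("line count must be a positive integer")
    # For a positive integer, the number of decimal digits minus one
    # equals floor(log10(loc)), without any floating-point error.
    return len(str(loc)) - 1


# Doubling usually stays inside the same bracket...
print(loc_magnitude(1_000), loc_magnitude(2_000))          # 3 3
print(loc_magnitude(1_000_000), loc_magnitude(2_000_000))  # 6 6
# ...but not always, which is why it is only "most of the time".
print(loc_magnitude(600_000), loc_magnitude(1_200_000))    # 5 6
```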

Counting semicolons instead of newlines is also prone to inaccuracies, as people have mentioned above. Comments can be a crucial part of the code, vital for maintainers to read and understand, just like any other part of the code, yet a semicolon count ignores them. Many code constructs are quite complex but use few or no semicolons -- macro-based code generation, lambdas, functions, etc... Other simple constructs are semicolon-heavy, such as for-loops (2) vs while loops (0). Style can also influence the count -- some people use commas to declare multiple variables at once, whereas other people declare one per line.

I've also seen some projects that use an 80-character section delimiter in their code made up of semicolons :lol:

/*;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;*/

It would actually be mildly interesting to perform different "LOC" metrics (such as semicolon count) for a large selection of different projects and see how the metrics vary. You could find out if there's a correlation between semicolons and lines in general, or if the relationship varies randomly from project to project. Maybe you could even use relationships between different metrics as a guess to the style of the code :)
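As a starting point for that kind of experiment, here is a deliberately naive sketch that computes three different "LOC" figures for one piece of source text. The function name and the sample snippet are invented for illustration; nothing here tries to parse comments or string literals, so semicolons inside them are counted too.

```python
def naive_metrics(source: str) -> dict:
    """Compute a few crude size metrics for a C-like source string."""
    lines = source.splitlines()
    return {
        "physical_lines": len(lines),
        "non_blank_lines": sum(1 for line in lines if line.strip()),
        # Counts every ';' character, including those in comments.
        "semicolons": source.count(";"),
    }


# Hypothetical sample: a semicolon "delimiter" comment, a comma-style
# declaration, a blank line, and a one-line while loop.
SAMPLE = (
    "/*" + ";" * 76 + "*/\n"
    "int a, b, c;\n"
    "\n"
    "while (a < b) { a++; }\n"
)

print(naive_metrics(SAMPLE))
```

On this sample the semicolon count is dominated by the delimiter comment, which is exactly the kind of style-dependent skew the metrics would have to be compared against across many projects.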

The only place where lines of code are a useful metric is when refactoring. Lines of code removed (while all tests continue passing, of course) is a great metric.

What about #code statements VS #assembly moves?
"Recursion is the first step towards madness." - "Skeggǫld, Skálmǫld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/
