The wrong way to count lines of code

Started by
43 comments, last by Luckless 8 years, 5 months ago

LOC is clearly an outdated metric. It's weird that people go to the effort of counting lines while each counting them differently.
I'd suspect making a .tar.gz of all source files and comparing sizes in bytes might be more useful for comparing project complexity, but it's still flawed. At least it would condense obvious bloat like copy-paste and extraneous whitespace, which you could then detect if the .tar grows significantly in size but the .tar.gz does not.

Hah, that's a good metric: compression ratio.

If one project has a "better" compression ratio than another, then it's probably full of C&P or other forms of repetition.

love the idea!
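A minimal sketch of that compression-ratio idea in Python (the function name and file handling are my own invention, not anyone's actual tooling):

```python
import gzip

def compression_ratio(paths):
    """Concatenate the given source files and compare raw vs. gzipped size.

    A lower ratio (compressed / raw) hints at more repetition in the code,
    e.g. copy-paste blocks or extraneous whitespace.
    """
    raw = b"".join(open(p, "rb").read() for p in paths)
    compressed = gzip.compress(raw)
    return len(raw), len(compressed), len(compressed) / len(raw)
```

Comparing the ratios of two projects, rather than their absolute sizes, would be the interesting part.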


Lines of code deleted is clearly a useful metric

I don't think even that is good beyond question -- you can add lines of code and still reduce complexity; likewise, you can take lines away and still increase complexity, at least if we're talking about complexity as the mental load required to understand the code.

If reducing the number of lines of code in a program is good, I suggest that the increased quality has less to do with the number of lines left, and everything to do with the fact that while we trust any junior developer to *add* significant amounts of code, usually only more-experienced devs are entrusted to *remove* significant amounts of it -- or, at the very least, that the act of removing lines of source code necessarily requires a more complete understanding of both the problem and the existing solution. In other words, the second whack at the problem is better because of better understanding, and that it might result in fewer lines is a symptom rather than the cause (same as that it might result in more lines).

We reason about code at several levels: at the systems scope, where we need to reason about how our processes interact with other processes; at the global scope, where we have to reason about how each module in our code interacts with the rest of the whole; at the module level, where we have to reason about the sub-components of our module working together, or how our modules interact with specific other modules in isolation from the rest of the system; and at the class, function, or even smaller levels. The goal of good code is that, at each level, exactly the information you need to reason about those relevant things is clear -- not more, not less -- that's the ideal. I don't want more code, I don't want less code -- I want exactly the right amount of code, functions, classes, modules, and binaries to facilitate that understanding.

That said, sudden swings towards what seems like too many or too few lines of code, especially at inappropriate times in a program's life-cycle, can certainly indicate code and design-quality issues. A sudden ballooning of source code might indicate, for instance, an over-reliance on inheritance vs. composition -- but that's really indicated by the first-order derivative of LOC, not LOC as a purely quantitative measure, and this is also usually better tracked and reasoned about at the module level, not at the whole-program level. I submit that a graph of LOC per module over time is far more informative than knowing the total line count at any given time. Furthermore, you need to know where to look for best results -- if you're scoped too broadly, it's really difficult to separate things that should concern you from totally normal background noise; likewise, too narrowly and you will miss issues altogether.
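A toy sketch of that per-module delta idea -- the first-order derivative of LOC rather than the raw total (the module names and counts below are made up for illustration):

```python
def loc_deltas(snapshots):
    """Given {module: line_count} snapshots taken over time, return the
    per-module change between consecutive snapshots.

    A sudden spike in one module's delta stands out here, where the same
    change would drown in a single whole-program total.
    """
    deltas = []
    for before, after in zip(snapshots, snapshots[1:]):
        modules = set(before) | set(after)
        deltas.append({m: after.get(m, 0) - before.get(m, 0)
                       for m in modules})
    return deltas
```

Feeding this weekly snapshots and eyeballing (or graphing) the deltas per module is the kind of view meant above.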

[EDIT] I probably should soften my stance a little -- what I think we all mean to say, in one way or another, is that all quantitative measures of our source code -- LOC, numbers of macros, loops, functions, classes, modules, dependencies, etc. -- can provide insight into what's going on with your code base if you look at the data in the right way, and with knowledge of the design transformations that are happening in time with those measurements. These and more can be useful *metrics* that inform hypotheses about potential code/design smells, which can then be validated or debunked through investigation or testing. What none of these things is, is any kind of quota that we should derive badges of honor directly from.

throw table_exception("(╯°□°)╯︵ ┻━┻");

I find an interesting metric alongside others is the number of git commits and pushes.

This is purely because I find that people who don't commit and don't push are the first to complain when they can't rollback their changes or suffer a hard disk failure...

Having a publicly known metric for the number of commits and pushes doesn't encourage cleaner code, but it does prevent this, as the metric is a constant reminder: "did you remember to commit and push today?"

Working on a private branch is no excuse. If you're working on a business project there's no need to EVER secretively hoard your work on your local disk...

Edit: just to clarify, at the last place I worked, I brought in version control. The guy I worked with, my boss, believed version control was renaming the file to zzzbackup or suchlike until the whole folder was filled with zzzzz files. He never pushed, but as he was my boss he wouldn't listen, and it really ground my gears. In my current job I'm in charge of the dev team, so it's easier to "encourage" people to commit and push often... Phew!
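If you did want to track that metric, `git log --format=%an` emits one author name per commit, and counting them is trivial (the helper and the names in the test are my own invention):

```python
from collections import Counter

def commits_per_author(log_lines):
    """Tally commits per author from the output of `git log --format=%an`.

    Not a measure of code quality -- just a reminder-style metric that
    flags people who never commit or push at all.
    """
    return Counter(line.strip() for line in log_lines if line.strip())
```

Piping `git log --format=%an` into this per-branch would show who is hoarding work locally.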

As has been said, clean code, smart algorithms, optimised use of functionality/libraries, and refactoring all reduce code size.

But another thing that, with experience, I have learnt to place in my code is pieces of analytical code, to make analysis and debugging easier in the future. Though these code pieces massively increase code size, they don't increase complexity and don't affect the output...

...but makes life much much easier down the line when issues arise.

Again, it makes any form of LOC metric meaningless.
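A sketch of the kind of diagnostic code meant above -- extra lines that help debugging later but leave the result untouched (the integrator and logger name are purely illustrative):

```python
import logging

log = logging.getLogger("sim")

def integrate(position, velocity, dt):
    """One explicit Euler step, padded with diagnostic code.

    The logging and assertion lines inflate the LOC count, but with
    debug output filtered they change nothing about the result.
    """
    new_position = position + velocity * dt
    # Analytical/diagnostic lines: more LOC, identical output.
    log.debug("integrate: pos=%r vel=%r dt=%r -> %r",
              position, velocity, dt, new_position)
    if __debug__:
        assert dt > 0, "time step must be positive"
    return new_position
```

Counting these lines the same as load-bearing logic is one more way raw LOC misleads.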

can't help being grumpy...

Just need to let some steam out, so my head doesn't explode...

LOC is a useful metric, but its usefulness is very limited and quickly becomes meaningless.

Far more interesting and useful metrics can be generated through peer review and points or grading systems.

LOC on its own really doesn't tell you much, especially when you account for things like switching to third-party code on a project. Some code I had years ago was a few thousand lines of C++ that made a bit of use of some external libraries. I decided to rewrite it while learning Python and made use of a number of highly supported and recommended libraries. The code ran just as fast, but what I actually wrote myself was now only a few dozen lines of Python, with more of the 'grunt work' passed off to efficient, reusable open-source libraries solving the same problems I had previously had to solve myself, having lacked knowledge and trust of suitable libraries when I originally wrote it.

Old Username: Talroth
If your signature on a web forum takes up more space than your average post, then you are doing things wrong.

