Lines of code language comparisons



But it can't possibly be a completely ignorable factor, seeing as more code clearly means more time spent coding

No, that isn't clearly true.

How is that not clearly true?


Just to simplify the discussion, the answer is NO.

Well, I'm sure you know what you're talking about but I need something more than your word to go by.

Wow, you sure are an all-or-nothing, black-or-white kinda guy, huh?

You asked if the level of importance of code size can be quantified and measured accurately. I said no, because it isn't possible to do so in the way you're asking for it to be done; certainly not "accurately."


This is a far cry from saying it's impossible to know whether code size is important. You should look up the nominal, cardinal, and ordinal properties of sets. It's entirely possible to know something has an attribute without being able to quantify that attribute.

For instance: I can assert that you have the attribute "human." But can I quantify it? Does it mean something to be 0.3854 human? Not really.



I thought this was something a 10-year-old would understand?


Watch your attitude. We don't need personal sniping.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Ok guys, never mind. I'm gonna make a video about this instead, so that I can explain what I'm talking about. Thanks anyway, and I'll update the OP as well.

@ApochPiQ: No personal sniping intended, I'm just very confused. Removed the line.

Anyway, to answer your "human" argument: when you said that I was bad at something and I asked "how bad?", you might not be able to give me a number from 1 to 10, but you would still be able to point out what those bad things are and roughly how much they affect my ability to do whatever I'm doing. Which in turn means that it's possible to map out a range of specific things that make someone good or bad, or, in the case of my question, a list of things that make a given factor of development important or unimportant.

Thus, you just made a measurement of it. Not an incredibly accurate one down to some decimal value, but an abstract measurement where each "integer" of the "scale" is represented by some action or behaviour that shows whether that factor had a big or small impact on your overall workload. In some cases I'm open to the possibility that a numerical value could represent it, but not necessarily, and probably not for the most part.

Anyways, doesn't matter. I'll make the video.

Of course less typing means time saved. Not that much time (since a lot of time is spent on other things), but still.

The problem is that less typing implies you are writing less information down, which means:

-The compiler has less to work with for optimization/debugging/error detection purposes

-Understanding the code might be more difficult because some information is not written down, and the information that IS written down is more compressed. Compare it to using 5-character variable names for everything to save on typing (see the quick sketch below).

Shorter code can be good or bad. It depends on what you remove or change to make it shorter. You can pass less information to the compiler, pass less information to whoever reads the code, or represent the information in a different, shorter form without removing anything... But whatever you do, it has advantages and disadvantages.
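To make that variable-name point concrete, here's a quick hypothetical illustration (not anyone's real code): both methods do the same thing, but the terse one withholds the information the reader needs.


using System;

static class Growth
{
    // Terse version: fewer characters typed, but the intent has to be
    // reverse-engineered by whoever reads it next.
    public static double Calc(double p, double r, int n)
    {
        return p * Math.Pow(1 + r, n);
    }

    // Longer version: more typing, but the names carry the information
    // that the terse version leaves in the author's head.
    public static double CompoundValue(double principal, double ratePerPeriod, int periods)
    {
        return principal * Math.Pow(1 + ratePerPeriod, periods);
    }
}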

It's impossible to measure the effect of code length on the objective value of a language because there are too many dependencies. Maybe you can compare programming in language A against programming with one hand only in language A.

o3o

But it can't possibly be a completely ignorable factor, seeing as more code clearly means more time spent coding

No, that isn't clearly true.
How is that not clearly true?
More code equals more time typing.

If I'm regurgitating code flat-out, I could probably write 10000 LOC a day... which means that Braid is just two weeks' work, right?
Obviously, that's not how things work. Jon Blow did not just sit down, churn out 100k LOC non-stop, and then have a game at the end of it.

Your average professional programmer is more likely to produce about 100 LOC per day, which indicates that raw typing is only about 1% of their actual job.
When trying to optimize a process, you don't go after the parts that only take up 1% of the time first.

For any particular task, a productive language doesn't necessarily result in fewer lines; it results in less careful, focused thinking being required to write and read the code.
The majority of a programmer's job is actually reading other people's code, and then modifying code. Writing new code from scratch comes way after those two activities.

For an extreme example, compare one line of C# LINQ to the equivalent x86 Assembly.
In the former, you can have one very readable line that sums all the numbers in a linked list. In the latter, you've got pages of what may as well be hieroglyphics. Even someone who's an expert at assembly will take minutes to pore over those hieroglyphics and piece together their purpose/meaning. If you then had to modify the ASM, it would be a complex task requiring expert skills and a lot of time.
It would be much more productive if you could instead just modify some high level C# code.
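For concreteness, the C# side of that comparison looks roughly like this (a minimal sketch; the numbers are made up):


using System;
using System.Collections.Generic;
using System.Linq;

class LinqSumExample
{
    static void Main()
    {
        // A linked list of numbers (contents invented for the example).
        var numbers = new LinkedList<int>(new[] { 3, 1, 4, 1, 5, 9 });

        // One readable line: the iteration, accumulation and termination
        // logic are all hidden inside Enumerable.Sum.
        int total = numbers.Sum();

        Console.WriteLine(total); // 23
    }
}

The equivalent hand-written x86 would spell out the pointer chasing, the accumulator and the loop-exit test explicitly, which is where the pages of hieroglyphics come from.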

To avoid jumping to conclusions here though, let's say you've got another task that requires carefully specifying the byte-by-byte memory layout of a complex, compressed data structure of some kind. You actually care about the exact placement of your data in RAM/address-space here.
For this task, Lua is just out of the question (it doesn't offer that capability, without modding the Lua VM at least). C# has the capability, but the code becomes extremely verbose and ugly. C has the capability by default, and the code is simpler.
Jon Blow's game-state replay system is a good example of one of these complex "low-level" tasks, where a lower-level language like C ends up being a better fit than a higher level one like C#.
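To give an idea of what "verbose and ugly" means here: explicit layout in C# has to go through interop attributes, something like the sketch below (a made-up record, not the replay system's actual data). In C, the same thing is just a plain struct, perhaps with a packing pragma.


using System.Runtime.InteropServices;

// Hypothetical packed record; the field names and sizes are invented.
// C# can pin down the byte-by-byte layout, but only via attributes.
[StructLayout(LayoutKind.Explicit, Size = 8, Pack = 1)]
struct PackedEntityState
{
    [FieldOffset(0)] public ushort EntityId;   // bytes 0-1
    [FieldOffset(2)] public byte   Flags;      // byte  2
    [FieldOffset(3)] public byte   Frame;      // byte  3
    [FieldOffset(4)] public float  PositionX;  // bytes 4-7
}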

In both those examples, it's not the LOC count that makes a difference, it's -
• How readable is it? How long does it take someone to understand the workings of the algorithm embodied in the code?
• How fragile is it? How likely is it that a bug will be introduced when someone modifies the code (probably due to them failing to read an important detail)?
• How flexible is it? Can you edit the algorithm later without having to rewrite everything from scratch?
• How correct is it? Can a peer review prove formal correctness? Are there assertions for every assumption made by the coder, to detect any bugs that may occur later? If it is identified as incorrect/buggy later, how hard will it be to diagnose potential issues?



As a case study in C# binary manipulation and pretty gross lines-of-code obfuscation, here's some of my ARMv7 assembler/disassembler code.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ARMv7.Instructions
{
    // Page 712, 718 (done)
    public class SUB_reg : Instr
    {
        public static readonly Encoding T1 = Encoding.T16(0x1A00);
        public static readonly Encoding T2 = Encoding.T32(0xEBA00000);
        public static readonly Encoding A1 = Encoding.A32(0x00400000);

        public ConditionCode Cond = ConditionCode.AL;
        public bool S;
        public Reg Rn;
        public Reg Rd;
        public ImmShift ImmShift;
        public Reg Rm;

        public SUB_reg(Encoding e) : base(e) {}

        public SUB_reg(ConditionCode cond, bool s, Reg rd, Reg rn, Reg rm, ImmShift immShift)
        {
            Cond = cond;
            S = s;
            Rd = rd;
            Rn = rn;
            Rm = rm;
            ImmShift = immShift;
        }

        internal override void DecodeT16(DecStream s)
        {
            s.X7.R3(out Rm).R3(out Rn).R3(out Rd);
        }

        internal override void DecodeT32(DecStream s)
        {
            s.X11.B(out S).R4(out Rn).X1.ISR4(out ImmShift, out Rd).R4(out Rm);
        }

        internal override void DecodeA32(DecStream s)
        {
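            // Fluent bit-field decode of the A1 (ARM) encoding; the expanded
            // shift/mask equivalent is quoted further down this post.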
            s.C(out Cond).X7.B(out S).R4(out Rn).R4(out Rd).IS(out ImmShift).X1.R4(out Rm);
        }
        
        internal override Encoding EncodeThumb()
        {
            AssertThumbAlways(Cond);
                    
            if (!ImmShift.IsUsed && Rm <= Reg.R7 && Rn <= Reg.R7 && Rd <= Reg.R7 && S == true)
                return T1.EncStream.X7.R3(Rm).R3(Rn).R3(Rd).Encoding;

            return T2.EncStream.X11.B(S).R4(Rn).X1.ISR4(ImmShift, Rd).R4(Rm).Encoding;
        }

        internal override Encoding EncodeARM()
        {
            return A1.EncStream.C(Cond).X7.B(S).R4(Rn).R4(Rd).IS(ImmShift).X1.R4(Rm).Encoding;
        }

        public override string ToString()
        {
            return Assembly.Str_S_Cond_R4_R4_R4_ImmShift("SUB", S, Cond, Rd, Rn, Rm, ImmShift);
        }
    }
}
Before I refactored the living f*** out of it using some evil struct tricks, instead of this:


s.C(out Cond).X7.B(out S).R4(out Rn).R4(out Rd).IS(out ImmShift).X1.R4(out Rm);
Without any helper functions, that code looked like this:


Cond = (ConditionCode)((encoding >> 28) & 0xF);
S = ((encoding >> 20) & 1) == 1;
Rn = (Reg)((encoding >> 16) & 0xF);
Rd = (Reg)((encoding >> 12) & 0xF);
ImmShift.Imm5 = (encoding >> 7) & 0x1F;
ImmShift.Type = (encoding >> 5) & 3;
Rm = (Reg)(encoding & 0xF);
I also had an intermediate form which looked like this:


FieldsA32.cond_xxxxxxx_S_R4_R4_imm5_type2_x_R4(encoding, out Cond, out S, out Rn, out Rd, out ImmShift.Imm5, out ImmShift.Type, out Rm);
The downside of that was that there are too many unique field-encoding patterns, so I was ending up with too many unique Fields* methods. That quickly becomes unmaintainable.
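For anyone curious about how the fluent decode style can be put together: the DecStream/EncStream types aren't shown here, so this is a purely hypothetical sketch of a bit-field reader in the same spirit (all names invented; it is not the real DecStream code).


class BitReader
{
    private readonly uint _encoding;
    private int _pos = 32; // consume fields from the most significant bit down

    public BitReader(uint encoding) { _encoding = encoding; }

    private uint Take(int bits)
    {
        _pos -= bits;
        return (_encoding >> _pos) & ((1u << bits) - 1);
    }

    // Roughly analogous to X7/X1, B, and C/R3/R4 in the listing above.
    public BitReader Skip(int bits) { _pos -= bits; return this; }
    public BitReader Bit(out bool value) { value = Take(1) == 1; return this; }
    public BitReader Field(int bits, out uint value) { value = Take(bits); return this; }
}

// Usage, mirroring the A32 decode shown earlier:
//   new BitReader(encoding)
//       .Field(4, out cond).Skip(7).Bit(out s)
//       .Field(4, out rn).Field(4, out rd)
//       .Field(5, out imm5).Field(2, out type).Skip(1)
//       .Field(4, out rm);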



The number of lines of code differs dramatically, but the language and the actual operations being performed are the same. I don't know whether this particular example reinforces anyone's particular viewpoint in this thread or not. I don't view lines of code as something important; I care about maintainability.


So using "sloppy code" as an example: "What could I learn that would make my code less sloppy and redundant, effectively causing me to write fewer lines of code because I'm now writing cleaner and less redundant code. Sometimes, clean code could mean more coding, but this specific example isn't talking about those cases. But even if we were, then it still doesn't answer the question "to what extent does cleaner code cut down cost". For that, one needs to look at each factor individually in deeper detail.

Your original argument was that fewer LOC means a project can be developed faster. However, sloppy code won't necessarily make a program slower to write, but it can make it harder to maintain. This isn't much of a problem in games coding, as the majority of code written in games development is thrown away and forgotten about once the project is finished. That's obviously not true for engine code, but the engine could be third party, and even in studios that develop their own engines you will have different programmers working on the engine and on gameplay. You will usually be able to see a difference in quality: gameplay code is rushed to meet a deadline, whilst the engine has been carefully crafted.

A banking application, on the other hand, may have to be maintained for the next 20 or so years, and it is there that neat, clean code helps. But it is also why we have to write unit tests, integration tests, acceptance tests and automation frameworks, meaning that we end up writing even more code to make the maintenance easier.

You can spend time making your code cost fewer lines/characters, and then spend even more time making it look good.

For example:

http://uguu.org/sources.html

Definitely shorter than normal C code, and also definitely more time coding, unless you're copying it by hand.

And an eternity to maintain it.

I prefer longer code to shorter code, as it is easier to read and understand. If you are doing any real programming job, you will always spend more time thinking and reading than actually typing.

Ancient Chinese literature uses a different language from the spoken one. As paper was expensive, they compressed the language so they could talk about more nonsense with less money. The result is a cryptic dialect that took us years to learn and is still very difficult.

If George R. R. Martin zipped his novel in his head before writing it down and had the publisher decompress it, could he finish The Winds of Winter quicker? He'd need less finger movement. Or maybe he should learn ancient Chinese?

I think LOC is only meaningful on a logarithmic scale, to help estimate the size of a project. There's definitely a difference between a 1K LOC project and a 100K LOC project, but that doesn't mean a 200K LOC project will always be more complex than a 100K one.

Other than that, the only use of LOC is to intimidate non-programmers. Telling your designer that a feature will cost 1M LOC (citation needed) will probably shut him up.

