Cost of Abstraction

Started by
39 comments, last by Finalspace 7 years ago

C/C++ for-loops are overpowered for most uses

What the hell?

It's a heavily overloaded statement used for a bunch of related but different purposes - creating number sequences, iterating over arrays, iterating using arbitrary iterators, infinite loops, while loops, do-while loops, etc. It contains 3 separate expressions, each of which can (and often is) used in weird ways to get subtly different iterating behaviour, leading to code that is harder to read.

If you want to iterate over a container, a "for element in container" style loop makes more sense. The language should be able to extract the start and end position accordingly. Vectors and arrays can be included in this.

If you have some sort of iterator, then arguably a while loop fits that semantic better. C++ wedges them into the for-statement via the help of hacks like the magical default-constructed input iterator that marks an end of a sequence. It's not a natural fit.

If you want to create a number sequence, that can be a separate operation. e.g. a Range() function. Then, iterating over it is optional - which is good, because that's often suboptimal anyway, in these days of vectorisation and parallelism you don't really want explicit loops everywhere.

If you want an infinite loop then that's what 'while' is there for - not "for (;;)". Same goes for any other loop that would be a [do]-while in other languages - those constructs exist because they're clearer, semantically.

What is the correlation, in this case, between versatility/utility and being over-powered?

TBH I think I understand what you may have had in mind with your original statement. It's just that I find the wording fantastically misleading and your explanation actually further deviates from what I thought you meant. If anything, your elaboration works directly against it, hilighting how a for-loop is too specific/ill-suited for most cases, making it, effectively, underpowered. I personally find it simple and slick, and prefer it whenever possible.

Just to express what I had in mind with my reply: a for-loop is a flow control construct. It has nothing to do with power. Whether it is well or poorly suited for a task stems from the choice made by the programmer, not the construct itself. Misusing flow control is a different topic.

PS: while(1) { } generates a C4127 in VS, which needs to either be explicitly suppressed or replaced with a for(;;) { }.

That being said I digress - this discussion is only remotely related to the topic at hand.

Advertisement

Power = "the ability or capacity to do something". The for-statement in C, C++, etc., has the ability and capacity to do too many different things. That's overpowered.

Abominations like this, for example:

for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
puts(p);
}

Obviously you want your language to be powerful, but it doesn't follow that each statement in it needs to be.

Reducing it down to "a choice by the programmer" ignores the fact that the language shapes the code and the thought processes.

And MS docs disagree with you, saying "trivial constants such as 1 or true do not trigger the warning". (Besides which, I'm much happier to say that a warning is erroneous than to think that 'for ( ; ; )' is a sensible construct.)

If you have some sort of iterator, then arguably a while loop fits that semantic better. C++ wedges them into the for-statement via the help of hacks like the magical default-constructed input iterator that marks an end of a sequence. It's not a natural fit.


I wouldn't call it a hack. I believe the reason C++ does this is that it's the most natural way to iterate through an array. For continguous data structures like arrays, you can get away with using a simple pointer as your iterator - in fact, I've worked with production code that does just this, including using pointers with the standard algorithms! I imagine vectors are among the most common STL data structures used in C++, so it makes sense to me that dominant iterator pattern would match the most natural way to iterate through them.

Pointer-style iterators fit well into C++'s mindset of "don't pay for what you don't use." You could abstract the whole thing with a Java-style iterator, but it would have to store both a pointer to the current element and a pointer to the last/one past element, bloating the size of the iterator. Probably not a problem on modern hardware - very slightly increased chance of cache misses on the stack aside - but what about on embedded systems? Everyone would just use the pointer-style iterators, anyway.

edit: Though, if you wanted to get rid of the extra pointer I suppose you could do something like this:

for (auto itor = std::begin(container); itor.is_valid_in(container); ++itor)
But then
1) You can't make itor a raw pointer and that cuts down on the usefulness of the standard algorithms;
2) That doesn't actually match the Java-style idiom.

Power = "the ability or capacity to do something". The for-statement in C, C++, etc., has the ability and capacity to do too many different things. That's overpowered.

Abominations like this, for example:

for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
puts(p);
}

Obviously you want your language to be powerful, but it doesn't follow that each statement in it needs to be.

Reducing it down to "a choice by the programmer" ignores the fact that the language shapes the code and the thought processes.

And MS docs disagree with you, saying "trivial constants such as 1 or true do not trigger the warning". (Besides which, I'm much happier to say that a warning is erroneous than to think that 'for ( ; ; )' is a sensible construct.)

Perhaps it would be prudent to add a related footnote to your original post? If nothing else, your original statement is ambiguous - within the confines of the thread at hand I automatically related your use of the word power to execution speed. I still don't agree with you in terms of the versatility of the for-loop (IMO the misuse of anything is ultimately almost always the programmer's failing and the language should not coerce one to learn a crapton of syntax that can be easily generalized). But that's my opinion and a whole different matter.

As for the warning - it may have been fixed post-2013. Honestly I didn't check. In VS2013 the "official" solution is to use for(;;). Which, I agree, is silly at best.

Just kinda thinking out loud, but it seems to me like if you've worked yourself into some kind of corner where you have to optimize how many instructions get spit out for using for loops, then something else is already wrong

That really depends. For example, if you are checking for collisions of your object's body in a determined space (whether getting only the neighbors or not), you will use at least one loop. And if you have many objects that perform that operation, things take time. Same goes to "find object that has XYZ state".

I know these things emerge from beginners in their beginning years, trying to abstract things, because they are in the heat
I believe the real culprit here is not abstraction but over-generalization for future, currently unknown, extensions. It leads to lots of layers through interfaces, derived classes, etc, all because maybe you need to extend in such and such direction tomorrow.

(Things get worse if you try to build a framework or library, like a game engine, as you really don't know where people will extend it.)

I agree.

But be aware that they do make other programmer's lives easier and they have no runtime cost, so other programmers are going to use them whenever they make sense.

I get, abstraction is to make programmers' life easier, but to go to the point that you need to abstract a for loop? Look that it's the person that @Finalspace mentioned (in the first post) that said "normal for loops are too heavy" and teached that using a class abstraction (written in 40+ lines) in for loops is a magic bullet.

But be aware that they do make other programmer's lives easier and they have no runtime cost, so other programmers are going to use them whenever they make sense.

I get, abstraction is to make programmers' life easier, but to go to the point that you need to abstract a for loop? Look that it's the person that @Finalspace mentioned (in the first post) that said "normal for loops are too heavy" and teached that using a class abstraction (written in 40+ lines) in for loops is a magic bullet.

That's only because of the container class he created.

He could have used any other existing container, or used a simple array with the values in it. Arrays don't need any of it, and the built-in containers have their own iterators.

Look over the code a little more.

He created a new type, called "range", with a constructor taking two values to indicate the beginning and end. That's not much, but it is the creation of an entirely new type that didn't exist before.

Then he created an iterator for his type so it matches the standard interface. That means operations for getting the beginning and ending, and operations for increment (++), equality (==), inequality (!=) and dereference (*). Usually these are easy to implement, and for his new "range" type most are either one or two statements.

It isn't complex code, nor is it difficult for most programmers to understand. The nature of the language is that when you create a new data type you generally need to tell the compiler how it relates to basic operations.

If you have some sort of iterator, then arguably a while loop fits that semantic better. C++ wedges them into the for-statement via the help of hacks like the magical default-constructed input iterator that marks an end of a sequence. It's not a natural fit.


I wouldn't call it a hack. I believe the reason C++ does this is that it's the most natural way to iterate through an array. For continguous data structures like arrays, you can get away with using a simple pointer as your iterator - in fact, I've worked with production code that does just this, including using pointers with the standard algorithms! I imagine vectors are among the most common STL data structures used in C++, so it makes sense to me that dominant iterator pattern would match the most natural way to iterate through them.


This is a circular definition though. For C++, a for loop with an index is the natural way to iterate over an array because the arguably more natural way - foreach item in array - didn't exist in C or C++ until recently. Even the BASIC style loop of for counter = StartIdxValue to EndIdxValue feels more natural for the case of iterating over a whole indexed collection than having to explicitly write the termination condition yourself. In other languages explicit termination conditions are relegated to while loops, where they make more sense.

Given a whole community raised on (a) doing everything with for loops, and (b) preferring pointers to indices, it's not surprising that STL iterators got wedged into that philosophy as well. The library was developed to fit the language and the community at the time. If we'd had a foreach loop back in the 90s then I think we might have implemented them differently.

You could abstract the whole thing with a Java-style iterator, but it would have to store both a pointer to the current element and a pointer to the last/one past element, bloating the size of the iterator.


You need to store the termination condition somewhere, whether that's all in one iterator, in a temporary 'end' iterator, or as a hard-coded constant in the code - and a decent compiler will probably switch all array iteration to being the latter anyway. At that point the size of the iterator would most likely be irrelevant. This is what happens with istream_iterators - they contain an istreambuf_iterator which has a null streambuf pointer, which only exists so that it can compare equal to the other iterator once it's exhausted. Hopefully it gets optimised away!

But my point wasn't to argue that all iteration should be replaced with iterator objects. The point was that using one single statement to handle all these different iteration types is not ideal.

For example, if you are checking for collisions of your object's body in a determined space (whether getting only the neighbors or not), you will use at least one loop. And if you have many objects that perform that operation, things take time. Same goes to "find object that has XYZ state".

Even in those situations, IMO, there would be higher level optimizations to make first that would likely negate the need to shave tiny fractions of time off the specifics of how iterating is implemented (assuming you even gain anything that way at all after compiler optimizations are considered). Things like collisions or searches can take steps to reduce the number of things you need to iterate over in the first place.

Given a whole community raised on (a) doing everything with for loops, and (b) preferring pointers to indices, it's not surprising that STL iterators got wedged into that philosophy as well. The library was developed to fit the language and the community at the time. If we'd had a foreach loop back in the 90s then I think we might have implemented them differently.


That's probably true. I think that's a good thing, though. A language and its dominant idioms should play well together. Why shouldn't a language evolve to synergize with how it's used in practice?

foreach item in array - didn't exist in C or C++ until recently.
Euhm,

int data[100];
for (int *p = data; p < &data[sizeof(data)/sizeof(data[0])]; p++) { ... *p ... }

exists for a very long time, afaik. The only problem here could be that &data[100] may be out of range for int *, but afaik that got fixed in the mean time too.

Of course, any compiler that performs constant calculations movement away from the loop will rewrite the normal integer index iteration to something like this.

This topic is closed to new replies.

Advertisement