It is unlikely the compiler would actually choose an int16_t; that may have been a poor example. On the other hand, the optimizer is allowed to do quite a few things that might surprise you if you look at the generated assembly. The peephole optimizer is usually allowed, under verified-safe conditions, to modify types as it sees fit. A simple loop like the one described is an obvious case where it can decide whether changing the counter's size would produce smaller or faster code, depending on your compile settings. Typically the constraints are that the code never assigns to the loop variable inside the body and that the loop declaration has no side effects. In such a case, on a 64-bit target, it can change the counter from 64 to 32 bits or the other way around, whichever is more appropriate. Like I said, using 16 bit was probably a bad example.
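To make that concrete, here is a minimal sketch of the kind of loop I mean. Whether a given compiler actually changes the counter's width depends entirely on the target and flags, so treat this as illustrative, not a guarantee:

```c
#include <stdint.h>

/* The counter 'i' is never assigned in the body and never escapes the
 * loop, so under the as-if rule the optimizer may pick whatever register
 * width is cheapest. Because signed overflow is undefined behavior, a
 * compiler targeting a 64-bit machine may hold the 32-bit 'i' in a full
 * 64-bit register, avoiding a sign extension on every array index;
 * conversely, a 64-bit counter with a known-small bound may be narrowed. */
int64_t sum(const int32_t *a, int32_t n)
{
    int64_t total = 0;
    for (int32_t i = 0; i < n; ++i)  /* declared 32-bit, often widened */
        total += a[i];
    return total;
}
```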
I think you misunderstood me. Disclaimer: I have not actually looked at any compilers to verify my claims here, so I might be dead wrong, but the rest of this post describes my current understanding/impression/whatever.
I see your point now, and while I agree, I also think it is a bit of a strawman. When the architectural differences are that large, you end up having to write completely different code for each platform. You just can't use lowest-common-denominator code in that case because it will typically perform horribly on one platform or the other until you bite the bullet and write code for the target specifically. Some things will be shareable, but I just don't believe it would be enough to justify the added work beyond the trivial cases. A sketch of what I mean by per-target code follows.
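Here is a hypothetical illustration of that split: one shared declaration, with each platform getting its own body selected at compile time. The function name and the SSE2 path are my own inventions for the example, not anything from the discussion above:

```c
#include <stddef.h>

/* Shared interface: scale 'n' floats from 'src' into 'dst' by 'k'. */
void scale_buffer(float *dst, const float *src, size_t n, float k);

#if defined(__SSE2__)
#include <emmintrin.h>
/* Target-specific body: process four floats per iteration with SSE. */
void scale_buffer(float *dst, const float *src, size_t n, float k)
{
    const __m128 vk = _mm_set1_ps(k);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)  /* vector body */
        _mm_storeu_ps(dst + i, _mm_mul_ps(_mm_loadu_ps(src + i), vk));
    for (; i < n; ++i)          /* scalar tail */
        dst[i] = src[i] * k;
}
#else
/* Portable fallback: correct everywhere, fast nowhere in particular. */
void scale_buffer(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
#endif
```

The interface is the shareable part; the bodies are where the per-target work, and most of the added cost, lives.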