That simplistic model does not match modern out-of-order x86/x64 processors. Instruction fetch, decode, and branch prediction all happen before micro-ops are scheduled for dispatch on one of several out-of-order dispatch ports.
That is why I spoke only about cycles. The same pitfalls apply to both direct and indirect memory access, so they are not worth mentioning.
Specifically, JohnnyCode stated that adding an offset to get an address is necessarily slower than not adding one, which is the only claim I refuted. I didn't even mention data-cache fetches etc., because they (and everything else you mentioned) apply whether or not offsets are present in the address operands.
Those dispatch ports don't have a way of guessing what their input will be until it is assigned. So the trick of always precomputing a load offset doesn't work.
The load offset is never precomputed. It is computed after fetching and decoding, prior to the next cycle, at which point execution takes place. If no offset is provided, the instruction still waits until the next cycle to execute.
L. Spiro
The fact is, indexing an array can actually carry a 1-clock-cycle penalty compared to dereferencing a bare pointer or a pointer plus a constant offset, if the displacement is < 2048 (I think this is the offset from the memory segment). See Table 2-12 in the optimization reference manual referenced earlier.
The address is not computed prior to execution. It cannot be, because the input registers are not available until then. It is therefore presumably computed on the first cycle of execution of the load.
Also, the fact that addressing memory is part of the instruction is irrelevant. Firstly, because by the execution stage an instruction may have been split into micro-ops that are scheduled out of order, and secondly because the size of the data being fetched also imposes a latency penalty, even though that is not in any way apparent in x86/x64 assembly.
For completeness, it's also worth repeating/clarifying that computing an offset can itself cause stalls, but in comparison with linked lists this tends to matter less, because fetching the next pointer from memory in a linked list introduces the same kind of stalls or worse. In fact, the data dependency in computing successive node addresses can prevent or hinder the CPU from working on multiple values in the container at the same time, unlike for an array.