the i486 added superscalar pipelining.
Superscalar means issuing more than one instruction per cycle. So the Pentium is superscalar, but the 486 is just a pipelined scalar processor.
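A rough way to see the difference: an idealized pipeline that issues at most one instruction per cycle can approach an IPC of 1 but never exceed it, while a 2-wide design can. The sketch below is a toy model with assumptions of my own (no stalls, no dependencies, no pairing rules, 5-stage depth for both). The name `ideal_cycles` is hypothetical and it does not model either chip in detail.

```python
import math

def ideal_cycles(n_instructions: int, issue_width: int, depth: int) -> int:
    """Cycles to run n independent instructions on an idealized in-order pipeline.

    Toy assumptions: no stalls, no dependencies, and every cycle issues
    `issue_width` instructions (the last group may be partial).
    """
    issue_cycles = math.ceil(n_instructions / issue_width)
    # The last issued group still needs depth - 1 extra cycles to drain the pipeline.
    return issue_cycles + depth - 1

# Pipelined scalar (486-style): at most 1 instruction issued per cycle.
scalar = ideal_cycles(100, issue_width=1, depth=5)
# 2-wide superscalar (Pentium-style U/V pipes, heavily idealized): up to 2 per cycle.
superscalar = ideal_cycles(100, issue_width=2, depth=5)
print(scalar, superscalar)  # 104 54
```

Even in this best case the scalar pipeline stays below 1 instruction per cycle (100/104), while the 2-wide one goes above it (100/54).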
For most users the CPU’s OOO core was effectively idle much of the time, even under load.
If that were true, OoO execution would be useless. The execution resources mainly sit idle on branch mispredictions, cache misses, a lack of parallelism in the instruction stream, or a lopsided instruction mix (say, only integer instructions). A second thread can supply additional instructions to fill the compute resources that would otherwise go unused.
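To make the slot-filling argument concrete, here is a toy utilization model. The issue width and the per-cycle "ready instruction" counts are invented purely for illustration (zeros stand in for stall cycles such as waiting on a cache miss). This is not measured data, and real SMT cores also contend for shared queues and caches, which this sketch ignores.

```python
def utilization(ready_per_cycle: list[int], width: int) -> float:
    """Fraction of issue slots filled, given how many instructions are ready each cycle."""
    issued = sum(min(r, width) for r in ready_per_cycle)
    return issued / (width * len(ready_per_cycle))

WIDTH = 4  # hypothetical issue width

# Hypothetical per-cycle counts of ready instructions for two threads.
# Zeros model stall cycles (e.g. a cache miss or refilling after a mispredict).
thread_a = [4, 1, 0, 0, 2, 3, 0, 1]
thread_b = [0, 2, 3, 1, 0, 1, 4, 2]

single = utilization(thread_a, WIDTH)
# With SMT, both threads' ready instructions compete for the same issue slots.
smt = utilization([a + b for a, b in zip(thread_a, thread_b)], WIDTH)
print(f"{single:.2f} {smt:.2f}")  # 0.34 0.75
```

The gain comes entirely from the second thread having work ready in cycles where the first one stalls; if both threads stalled in the same cycles, SMT would help much less.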
To help give a steady flow of instructions to the OOO core they attached a second front-end.
No SMT processor duplicates the entire front end.
What gets duplicated is mainly queues, TLBs, and tags here and there. The decoder is usually shared between threads and accessed in alternating cycles, or at a coarser granularity.
In the case of the P4, both the decoder and the trace cache are shared.
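A minimal sketch of what "shared, accessed in alternating cycles, or at a coarser granularity" means for decoder ownership. `decoder_owner` is a hypothetical helper with a plain round-robin policy. Real designs typically also hand the slot to the other thread when the owning one is stalled, which is not modeled here.

```python
def decoder_owner(cycle: int, n_threads: int = 2, granularity: int = 1) -> int:
    """Thread that owns the shared decoder in a given cycle (hypothetical round-robin policy).

    granularity=1 alternates every cycle; larger values give one thread the
    decoder for `granularity` consecutive cycles before switching.
    """
    return (cycle // granularity) % n_threads

print([decoder_owner(c) for c in range(6)])                 # [0, 1, 0, 1, 0, 1]
print([decoder_owner(c, granularity=3) for c in range(6)])  # [0, 0, 0, 1, 1, 1]
```

Either way there is only one decoder; the threads take turns using it rather than each getting its own copy.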