About SIMD processors

3 comments, last by NightCreature83 11 years, 4 months ago
Do SIMD processors have bigger registers? I mean, they still have 8, but are they bigger than 32 bits? I can't manage to understand it, because even on Wikipedia it shows that a normal processor puts one number in each register and then does calculations on them. But a SIMD processor might store even 3 numbers in a single register, 3 in another and so on, and do calculations on them.

So are SIMD registers bigger? What does that mean? Are they bigger than 64 bits?! What does that say about 64-bit processors?
I guess it means that a SIMD unit can hold, let's say, a 3-component vector in one register and another vector in a second register, then add them and put the result in, let's say, the first one.

I think it goes like that.
Edit: Oh, and if it's 64 bits, it might treat it as 2 ints or, let's say, 4 shorts, etc., so it's flexible.
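To make the "same 64 bits, different lane widths" idea concrete, here is a minimal C++ sketch. The helper names `as_lanes` and `add_4x16` are made up for illustration, and this is plain scalar code that only mimics what a packed-add instruction does; it is not how the hardware is actually driven.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: view the same 64 bits as several smaller lanes,
// e.g. 2 x 32-bit ints or 4 x 16-bit shorts.
template <typename Lane>
std::array<Lane, sizeof(std::uint64_t) / sizeof(Lane)> as_lanes(std::uint64_t bits)
{
    std::array<Lane, sizeof(std::uint64_t) / sizeof(Lane)> lanes{};
    static_assert(sizeof(lanes) == sizeof(bits), "lanes must cover exactly 64 bits");
    std::memcpy(lanes.data(), &bits, sizeof(bits));
    return lanes;
}

// Hypothetical helper: add two 64-bit values as four independent 16-bit lanes.
// Each lane wraps around on its own; no carry leaks into the neighbouring lane,
// which is the key difference from an ordinary 64-bit add.
std::uint64_t add_4x16(std::uint64_t a, std::uint64_t b)
{
    auto la = as_lanes<std::uint16_t>(a);
    const auto lb = as_lanes<std::uint16_t>(b);
    for (std::size_t i = 0; i < la.size(); ++i) {
        la[i] = static_cast<std::uint16_t>(la[i] + lb[i]);
    }
    std::uint64_t out = 0;
    std::memcpy(&out, la.data(), sizeof(out));
    return out;
}
```

For example, `add_4x16(0xFFFF, 0x1)` gives 0, whereas a normal 64-bit add would give 0x10000, because the overflow stays inside its own 16-bit lane.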

o3o

Typically they're 128-bit registers, so they can operate on four single-precision floating point numbers at a time, or on two double-precision numbers. Link: http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
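As a rough illustration of "four floats at a time", here is a small C++ sketch using the SSE intrinsics from `<xmmintrin.h>`. The function name `add4` and the `SIMD_DEMO_HAVE_SSE` macro are just names picked for this example, and the scalar fallback is only there so it still builds on a target without SSE.

```cpp
#include <array>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SIMD_DEMO_HAVE_SSE 1
#endif

// Hypothetical helper: add two 4-float vectors.
std::array<float, 4> add4(const std::array<float, 4>& a, const std::array<float, 4>& b)
{
    std::array<float, 4> out{};
#ifdef SIMD_DEMO_HAVE_SSE
    // Each __m128 value fits one 128-bit XMM register holding four floats.
    const __m128 va = _mm_loadu_ps(a.data());   // unaligned load of 4 floats
    const __m128 vb = _mm_loadu_ps(b.data());
    const __m128 sum = _mm_add_ps(va, vb);      // one instruction, four additions
    _mm_storeu_ps(out.data(), sum);
#else
    // Fallback for targets without SSE: same result, one lane at a time.
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
    return out;
}
```

Calling `add4({1, 2, 3, 4}, {10, 20, 30, 40})` yields `{11, 22, 33, 44}` on either path; the SSE path just gets there with a single add instruction.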
32-bit x86 processors have 8 general-purpose 32-bit registers.
When 64-bit processors came out, the number of registers was increased to 16 and their size was increased to 64 bits.
Processors also come with SSE, which gives an extra 8 128-bit registers (16 registers on a 64-bit CPU).
Since Sandy Bridge (or maybe a little earlier than that) AVX is available, which widens all those 128-bit registers to 256 bits.
On a 64-bit CPU the registers look something like the following:
64 bits : RAX, RBX, RCX, RDX (16 of these)
32 bits : EAX, EBX, ECX, EDX (there are 8 of these)
16 bits : AX, BX, CX, DX
8 bits : AH, AL, BH, BL, CH, CL, DH, DL
All of the 32-bit registers map onto the lower 32 bits of the corresponding 64-bit registers, and so on downwards until you hit AH and AL, which combined form AX. So RAX = upper 32 bits + EAX, EAX = upper 16 bits + AX, and AX = AH + AL. ( http://www.eecg.toro...es/x86regs.html )
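If it helps to see the overlap spelled out, here is a small C++ model of how RAX, EAX, AX, AH and AL share the same bits. The `RaxModel` struct is purely hypothetical and only imitates the aliasing with shifts and masks. The one behaviour that is not obvious from the picture is that on x86-64 a write to the 32-bit name zeroes the upper 32 bits, while writes to the 16-bit and 8-bit names leave the rest of the register alone.

```cpp
#include <cstdint>

// Hypothetical model of one x86-64 general-purpose register and its sub-registers.
struct RaxModel {
    std::uint64_t rax = 0;

    // Reads: each narrower name is just the low bits of the wider one.
    std::uint32_t eax() const { return static_cast<std::uint32_t>(rax); }
    std::uint16_t ax() const { return static_cast<std::uint16_t>(rax); }
    std::uint8_t al() const { return static_cast<std::uint8_t>(rax); }
    std::uint8_t ah() const { return static_cast<std::uint8_t>(rax >> 8); }

    // Writes to AX, AH and AL replace only their own bits.
    void set_ax(std::uint16_t v) { rax = (rax & ~0xFFFFull) | v; }
    void set_al(std::uint8_t v) { rax = (rax & ~0xFFull) | v; }
    void set_ah(std::uint8_t v) { rax = (rax & ~0xFF00ull) | (static_cast<std::uint64_t>(v) << 8); }

    // Writing the 32-bit name zero-extends into the full 64-bit register on x86-64.
    void set_eax(std::uint32_t v) { rax = v; }
};
```

With `rax = 0x1122334455667788`, the model reports `eax() == 0x55667788`, `ax() == 0x7788`, `ah() == 0x77` and `al() == 0x88`, which matches the "RAX = upper 32 bits + EAX" breakdown above.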
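The same goes for the AVX/SSE SIMD registers, so YMM0 = upper 128 bits + XMM0. (The older MMX registers are a separate case: they alias the x87 FPU registers rather than the XMM ones.)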

Here is a chart.

The reason it is useful to know this is when you have to debug a release build, which most of the time means you are staring at assembly. In optimised builds, changing only the lower x bits of a previously set register can be achieved by writing to the corresponding sub-register (with the exception that on x86-64 a write to a 32-bit register also zeroes the upper 32 bits). Also bear in mind that most of the time an optimizing C++ compiler will avoid using FPU instructions as the SIMD ones are faster, even if it has to move values between FPU and SIMD registers.

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion
