nuclear123

Out-of-order execution


I'm curious as to why I see assembly code like this:

mov eax, DWORD PTR [0x3434] ; <--- receive value from memory and store in register
add eax, eax                ; <--- add value and store result in eax

Is this inefficient code when people write it this way? For example, the CPU will need to stall to cover the latency of moving the data from memory into the register. Either that, or this code will produce incorrect results because the value isn't in the register yet. Anyway, would it be more efficient to write this code out of order, for example:


mov eax, DWORD PTR [0x3434] ; <--- receive value from memory and store in register
; ...
; ... other instructions ...
; ...
add eax, eax                ; <--- add value and store result in eax

Would this make more efficient use of the CPU? By the time you get to the add instruction, the value will already be loaded into the register and ready to go. Thanks.

It entirely depends on the processor.
A regular desktop processor does out-of-order execution. It keeps a large window of pending instructions and, wherever dependencies allow, issues later independent instructions ahead of earlier stalled ones, hiding as many stalls as it can.
Something like a netbook Atom processor is in-order and won't reschedule the code, resulting in a stall. (The result is still correct: the add simply waits until the load has delivered its value.)
Then there are other technologies, like hyper-threading: your thread may stall at the mov, but the other active thread on the same core may have instructions that can run.

Quote:
Original post by nuclear123
Would this make more efficient use of the CPU?
Finding the optimal instruction order is, in general, a superset of an NP-complete problem.

Compilers try to optimize this for small cases, but no general algorithm that efficiently finds the optimal schedule is known. Even simpler subsets of the problem are very hard to solve.

As explained above, out-of-order execution is managed by the hardware, not by the programmer.

If you're programming in a higher-level language, any standard, well-established compiler will do instruction scheduling in software to expose instruction-level parallelism (ILP), taking potential dependencies and hazards into account.

If you don't use a higher-level language but actually hand-code in assembly, then what you suggested may improve pipeline throughput. Keep in mind, though, that you would have to be very aware of the instructions you insert in between, taking into account the various stall penalties between different types of instructions.

It's also worth noting that it's generally impossible to schedule instructions far enough apart to hide the stall you get from actually reading main memory, which can be hundreds of clock cycles. To avoid those stalls you need prefetching, either manual (with prefetch instructions) or automatic (via the CPU's built-in prefetch logic, if it has any). Instruction schedulers generally assume that the data is already in the CPU's L1 cache.
