Hello,
I've read a lot about optimization, especially about instruction pairing.
So I've written a little test program to measure the real effect of this technique:
With pairing:
cli
mov ecx,1000000000
lp:
mov eax,5
add ebx,1
sub edx,100
inc eax
dec ebx
mov edx,5
add eax,1
sub ebx,100
inc edx
dec eax
mov ebx,5
add edx,1
sub eax,100
inc ebx
dec edx
dec ecx
jnz lp
sti
For comparison, I've also tried this loop with every instruction working on the same register:
mov eax,5
add eax,1
sub eax,100
inc eax
dec eax
In theory, all the instructions in the first loop can be paired, so I expected the optimized code to be a lot faster.
However, my test program, which reads the time with the rdtsc instruction, shows a performance gain of only 3%.
The test machine is a Celeron 433 running Win98.
Why is the code not as much faster as expected?
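A rough estimate of what "a lot faster" would mean here. This is a sketch under stated assumptions: the original Pentium (P5) model, where two simple instructions can issue per clock in the U and V pipes and each takes one clock. It counts the 15 loop-body instructions plus `dec ecx` and `jnz` as 17 simple instructions per iteration, and compares perfect pairing against no pairing at all:

```latex
T_{\text{unpaired}} \approx 17 \ \text{clocks/iteration}, \qquad
T_{\text{paired}} \approx \left\lceil \tfrac{17}{2} \right\rceil = 9 \ \text{clocks/iteration}, \qquad
\frac{T_{\text{unpaired}}}{T_{\text{paired}}} \approx \frac{17}{9} \approx 1.9
```

Under those assumptions the paired loop would take roughly 8/17, or about 47%, less time. That is far more than the 3% measured.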