# optimization

This topic is 5186 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hello, I've read a lot about optimization, especially about instruction pairing, so I wrote a little test program to measure the real effect of this method. With pairing:

    cli                     ; disable interrupts during the measurement
    mov ecx,1000000000      ; loop counter

    lp:
    mov eax,5
    sub edx,100
    inc eax
    dec ebx

    mov edx,5
    sub ebx,100
    inc edx
    dec eax

    mov ebx,5
    sub eax,100
    inc ebx
    dec edx

    dec ecx
    jnz lp

    sti                     ; re-enable interrupts


I've also tried this loop with every instruction working on the same register:

    mov eax,5
    sub eax,100
    inc eax
    dec eax

In theory, you would expect ALL the instructions in the first version to pair, so that it runs a lot faster than the second. In my test program, which reads the time with the rdtsc instruction, the performance gain is only 3%. The test machine is a Celeron 433 with Win98. Why isn't the paired code as much faster as expected?
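The timing code itself isn't shown in the post. A rough sketch of how such an rdtsc measurement might be wrapped around the loop, assuming esi and edi are free (the loop above doesn't touch them) and that your assembler accepts ';' comments:

```asm
    rdtsc                   ; EDX:EAX = time-stamp counter at start
    mov esi, eax            ; save low 32 bits
    mov edi, edx            ; save high 32 bits

    ; ... loop under test goes here ...

    rdtsc                   ; EDX:EAX = time-stamp counter at end
    sub eax, esi            ; elapsed cycles, low 32 bits
    sbb edx, edi            ; elapsed cycles, high 32 bits (borrow carried over)
```

The 64-bit subtract matters here: with a billion iterations at a few cycles each, the elapsed count can exceed what fits in 32 bits.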

##### Share on other sites
> with Win98
> cli
heh, Win9x is useful in some respects

Try getting rid of the movs to eax, edx, and ebx. You are effectively cutting short your 'dependency chains' by overwriting the registers' values. To understand these effects, you need to get out of the mindset that the CPU just executes instructions as they come, with a fixed latency per op. Your use of the word 'pairing' suggests you've been reading Pentium optimization docs - those are long out of date for a P6-family chip like your Celeron.

Once the processor 'sees' the mov, it is allowed to discard all previous writes to this reg, as long as no other instructions depend on them. I have not dealt with PIIs in depth, so I don't know how large the window is, nor how clever its data flow analysis is.
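For illustration, a rough sketch of what the two test loops might look like with the movs moved out of the loop, so the chains are no longer cut short. The label names lpA/lpB and the ';' comments are my own assumptions, and each variant is meant to be timed separately:

```asm
    ; Variant A: one long dependency chain - every op needs the previous result
    mov eax, 5              ; initialised once, outside the loop
    mov ecx, 1000000000
lpA:
    sub eax, 100
    inc eax
    dec eax
    sub eax, 100
    inc eax
    dec eax
    sub eax, 100
    inc eax
    dec eax
    dec ecx
    jnz lpA

    ; Variant B: same instruction mix, split into three independent chains
    mov eax, 5
    mov ebx, 5
    mov edx, 5
    mov ecx, 1000000000
lpB:
    sub eax, 100
    sub ebx, 100
    sub edx, 100
    inc eax
    inc ebx
    inc edx
    dec eax
    dec ebx
    dec edx
    dec ecx
    jnz lpB
```

In A the values carry over from one iteration to the next, so the CPU has nothing independent to overlap; in B it can work on three chains at once. I'd expect a much clearer gap between these two than the 3% above, but I haven't timed them.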

BTW, registers aren't 'fixed' - there are many more hidden registers that are transparently renamed. Example: add ebx, 1 translates to: 1) take the value of architectural reg ebx (more on that later); 2) add 1; 3) put the result in another temp register 'x' next clock; 4) update the mapping of arch regs - the current value of 'ebx' is now in 'x'.
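To make the renaming concrete, here is one way the single-register sequence from the question could look after renaming. The names p0..p3 are made-up labels for hidden registers, not anything the CPU exposes, and the real allocation will differ:

```asm
    ; what you wrote        ; roughly what gets executed
    mov eax, 5              ; p0 <- 5            'eax' now lives in p0
    sub eax, 100            ; p1 <- p0 - 100     'eax' now lives in p1
    inc eax                 ; p2 <- p1 + 1       'eax' now lives in p2
    mov eax, 5              ; p3 <- 5            'eax' now lives in p3
```

The last mov reads nothing from p0..p2, so it (and whatever follows it) can start without waiting for the sub/inc before it. That is the sense in which a mov cuts a dependency chain short.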

Have a look at the code with your favorite profiler (not just some cheap timer thingy - something that does pipeline simulation); that should clear things up.

    .align 16
    lp:
    mov eax, 5
    ...
    dec ecx
    jnz lp