quasar3d

optimization


Hello, I've read a lot about optimizations, and especially about pairing, so I've written a little test program to measure the real effect of this method.

With pairing:
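	cli                     ; disable interrupts while the test runs
	mov ecx,1000000000      ; loop counter

lp:
	; each group of five rotates through eax, ebx and edx so that
	; neighbouring instructions work on different registers
	mov eax,5
	add ebx,1
	sub edx,100
	inc eax
	dec ebx

	mov edx,5
	add eax,1
	sub ebx,100
	inc edx
	dec eax

	mov ebx,5
	add edx,1
	sub eax,100
	inc ebx
	dec edx

	dec ecx
	jnz lp

	sti                     ; re-enable interrupts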

I've also tried this loop with every instruction working on the same register:
	mov eax,5
	add eax,1
	sub eax,100
	inc eax
	dec eax

In theory you would expect that ALL the instructions in the first version can be paired, so that it runs a lot faster than the second. In my test program, which reads the time with the rdtsc instruction, the performance gain is only 3%. The test machine is a Celeron 433 with Win98. Why is the code not sped up as much as expected?

> with Win98
> cli
heh, Win9x is useful in some respects

Try getting rid of the movs to eax, edx, and ebx. You are effectively cutting short your 'dependency chains' by overwriting the registers' values. To understand these effects, you need to get out of the mindset that the CPU just executes instructions as they come, with fixed latency per op. Your use of the word 'pairing' suggests you've been reading Pentium optimization docs; those are long out of date.

Once the processor 'sees' the mov, it is allowed to discard all previous writes to this reg, as long as no other instructions depend on them. I have not dealt with PIIs in depth, so I don't know how large the window is, nor how clever its data flow analysis is.
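To make the point about the movs concrete, here is a rough Python sketch of a dependency-chain model. It is a toy, not a simulation of any real CPU: it assumes every instruction takes one cycle, ignores how many instructions the core can issue per cycle, and treats the same-register variant as three copies of the five-instruction group (the original post only shows one). The names critical_path, block and drop_movs are made up for this illustration.

```python
# Toy model (assumption): latency 1 per instruction, unlimited execution
# units, perfect register renaming. Only true data dependencies count.

def critical_path(instructions):
    """Length in cycles of the longest dependency chain.

    Each instruction is (dest, sources): the register it writes and the
    registers it reads. `mov reg, imm` reads nothing, so it starts a new chain.
    """
    ready = {}      # register -> cycle at which its newest value is available
    longest = 0
    for dest, sources in instructions:
        start = max((ready.get(reg, 0) for reg in sources), default=0)
        done = start + 1
        ready[dest] = done      # the write does not wait for older values of dest
        longest = max(longest, done)
    return longest


def block(a, b, c):
    """One group from the post: mov a,5 / add b,1 / sub c,100 / inc a / dec b."""
    return [(a, ()), (b, (b,)), (c, (c,)), (a, (a,)), (b, (b,))]


def drop_movs(instructions):
    """Remove the instructions that read no register (the movs)."""
    return [(dest, sources) for dest, sources in instructions if sources]


# One loop iteration of each variant (dec ecx / jnz left out).
INTERLEAVED = (block("eax", "ebx", "edx")
               + block("edx", "eax", "ebx")
               + block("ebx", "edx", "eax"))
SAME_REG = block("eax", "eax", "eax") * 3   # assumption: group repeated 3 times

if __name__ == "__main__":
    for name, code in [("interleaved", INTERLEAVED), ("same register", SAME_REG)]:
        print(name, critical_path(code), critical_path(drop_movs(code)))
```

In this model both variants have a longest chain of 5 per iteration while the movs are present, because each mov starts a fresh chain. With the movs removed, the same-register variant becomes one chain of 12 while the interleaved one stays at 4, which is the kind of difference the test was presumably meant to show.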

BTW, registers aren't 'fixed': there are many more hidden registers that are transparently renamed. Example: add ebx, 1 translates to:

1) take the value of architectural reg ebx (more on that later)
2) add 1
3) put the result in another temp register 'x' next clock
4) update the mapping of arch regs: the current value of 'ebx' is now in 'x'.
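The renaming steps above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, assuming an endless supply of temp registers; real hardware has a finite pool and recycles entries, which is not modelled. RenameTable and its methods are hypothetical names, not anything from a real tool.

```python
class RenameTable:
    """Maps architectural registers (eax, ebx, ...) to hidden temp registers."""

    def __init__(self, arch_regs):
        # Initially each architectural register lives in its own temp register.
        self.mapping = {reg: index for index, reg in enumerate(arch_regs)}
        self.next_free = len(arch_regs)     # assumption: never runs out

    def rename(self, dest, sources):
        """Rename one instruction; returns (new temp for dest, temps read)."""
        # Steps 1-2: look up where the current source values live.
        read_from = tuple(self.mapping[reg] for reg in sources)
        # Step 3: the result goes into a fresh temp register.
        temp = self.next_free
        self.next_free += 1
        # Step 4: from now on, dest refers to that temp register.
        self.mapping[dest] = temp
        return temp, read_from


if __name__ == "__main__":
    table = RenameTable(["eax", "ebx", "edx"])
    print(table.rename("ebx", ["ebx"]))   # add ebx,1: reads the old ebx temp
    print(table.rename("ebx", []))        # mov ebx,5: reads nothing at all
```

Because a mov with an immediate reads no temp register, the instruction after it depends only on the new temp, so nothing ties it to the older writes to the same architectural register.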

Have a look at the code with your favorite profiler (not just some cheap timer thingy, but something that does pipeline simulation); that should clear things up.

Guest Anonymous Poster
Try aligning the loop label to a 16-byte boundary; this can make a difference on some machines.

.align 16
lp:
mov eax, 5
...
dec ecx
jnz lp
