Snatch

__allshr and bitboard


I am currently rewriting my chess engine, and I use bitboards to get fast move generation. I have just profiled my code, and it appears that 40% of the run time is spent in __allshr. Is this just because of the right shifts? I use Visual Studio 2005 (Beta 2), and the engine generates nearly 16 000 000 moves/sec, which is not bad but nothing compared to the 40 000 000 moves/sec of GNU Chess. Is there something I can do about these __allshr calls?
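For context, a common source of variable-count 64-bit right shifts in a bitboard engine is extracting one rank's occupancy, for example to index a precomputed attack table. Below is a minimal sketch. It assumes the square mapping a1 = bit 0 ... h8 = bit 63 and uses a hypothetical helper named `rank_occupancy`. It is written with C99 `uint64_t`, which Visual C++ 2005 spells `unsigned __int64`.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per square; assumed mapping: a1 = bit 0, b1 = bit 1, ..., h8 = bit 63. */
typedef uint64_t Bitboard;

/* Hypothetical helper: the 8 occupancy bits of rank 0..7 (rank 0 = first rank).
   The count 8 * rank is only known at run time, so a 32-bit x86 compiler has to
   emit a general 64-bit right shift here, often as a call to a runtime helper. */
static inline unsigned rank_occupancy(Bitboard occupied, unsigned rank)
{
    assert(rank < 8);
    return (unsigned)((occupied >> (8u * rank)) & 0xFFu);
}
```

Because the count is not a compile-time constant, 32-bit MSVC typically turns this `>>` into a call to `_aullshr`, or to `_allshr` if the bitboard type is signed. For the starting position (`0xFFFF00000000FFFF`), `rank_occupancy(start, 6)` returns `0xFF`.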

Here is an assembler listing for __allshr. The instructions match a disassembly of the same function exported by ntoskrnl.exe on XP. I doubt your code imports the function from there; I mention it only to show that the code in the link accurately reflects the function.

At any rate, I don't think a faster version of that function can be crafted. If you think that's the bottleneck, you might want to reorganize your code to be less reliant on that function. Maybe you don't need to rely on right shifts so much?
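One way to depend less on the helper is to keep each bitboard as two 32-bit halves. This is a sketch under that assumption: `lo` holds ranks 1-4, `hi` holds ranks 5-8, and a1 is bit 0 of `lo`. The code chooses the right half first, so the remaining shift (at most 24 bits) is an ordinary 32-bit shift that needs no runtime helper. `rank_occupancy_split` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of a rank lookup for a bitboard stored as two 32-bit
   halves (assumed layout: lo = ranks 1-4 with a1 at bit 0, hi = ranks 5-8).
   Only native 32-bit shifts are used, so no 64-bit shift helper is needed. */
static inline unsigned rank_occupancy_split(uint32_t lo, uint32_t hi, unsigned rank)
{
    assert(rank < 8);
    uint32_t half = (rank < 4) ? lo : hi;         /* pick the word holding the rank */
    return (half >> (8u * (rank & 3u))) & 0xFFu;  /* shift count is at most 24 */
}
```

The extra comparison is cheap compared with a call. Whether it pays off in a real move generator is something only the profiler can tell.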

Do you really need an arithmetic (signed) right shift for a bitboard? A simple unsigned shift should be much faster.
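Signedness also matters for correctness, not only for speed. The sketch below spells out both kinds of shift. It uses hypothetical function names, C99 `uint64_t`, and assumes the mapping a1 = bit 0 ... h8 = bit 63. With only h8 set, the arithmetic version copies the sign bit into every vacated square. Declaring bitboards as an unsigned 64-bit type (`unsigned __int64` in Visual C++) makes `>>` a logical shift.

```c
#include <assert.h>
#include <stdint.h>

/* Logical (unsigned) right shift: vacated high bits are filled with zeros.
   This is what bitboard code normally wants. */
static inline uint64_t shr_logical(uint64_t x, unsigned n)
{
    assert(n < 64);
    return x >> n;
}

/* Arithmetic right shift, written out on unsigned values because shifting a
   negative signed integer right is implementation-defined in C: the sign bit
   (bit 63) is copied into every vacated position. */
static inline uint64_t shr_arithmetic(uint64_t x, unsigned n)
{
    assert(n < 64);
    uint64_t fill = (x >> 63) ? ~(UINT64_MAX >> n) : 0;
    return (x >> n) | fill;
}
```

For example, shifting the h8 bit down by 56 gives `0x80` with the logical shift but `0xFFFFFFFFFFFFFF80` with the arithmetic one.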
Quote:
Original post by LessBread
At any rate, I don't think a faster version of that function can be crafted.
That may be true for the general case, but does the OP really need to cover all three cases in the code?
If all that's required is a variable shift of 0-31 steps, then two (albeit slow) instructions would be enough.
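In C terms, the two-instruction case can be modelled on the two 32-bit halves of the value (EAX = low, EDX = high). This is a sketch of a signed shift restricted to counts 0..31, which corresponds to the fast path at the top of `_allshr`. `sar64_small` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the two-instruction fast path for counts 0..31:
     SHRD EAX,EDX,CL  -> low half takes its vacated top bits from the high half
     SAR  EDX,CL      -> high half is shifted arithmetically (sign-filled)
   The arithmetic shift is spelled out on unsigned values because right-shifting
   a negative signed integer is implementation-defined in C. */
static inline void sar64_small(uint32_t *lo, uint32_t *hi, unsigned n)
{
    assert(n < 32);
    uint32_t sign_fill = (*hi & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    /* (x << 1) << (31 - n) equals x << (32 - n) but is still defined for n == 0 */
    *lo = (*lo >> n) | ((*hi << 1) << (31u - n));
    *hi = (*hi >> n) | sign_fill;
}
```

The restriction to counts below 32 is what lets the range checks disappear; the caller has to guarantee it.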

I don't know enough about the code to say whether the general case applies or not. If it doesn't, then there's the code to use to craft a specialized version of the function that doesn't contend with all three cases.

Quote:
Original post by doynax
Do you really need an arithmetic (signed) right shift for a bitboard? A simple unsigned shift should be much faster.


The disassembly of aullshr isn't very different from allshr.



;********************************************************************************
; _allshr (1404)
0x4026AC: 80F940 CMP CL,0x40
0x4026AF: 7316 JAE 0x4026C7
0x4026B1: 80F920 CMP CL,0x20
0x4026B4: 7306 JAE 0x4026BC
0x4026B6: 0FADD0 SHRD EAX,EDX,CL
0x4026B9: D3FA SAR EDX,CL
0x4026BB: C3 RET
;********************************************************************************
0x4026BC: 8BC2 MOV EAX,EDX ; <==0x004026B4(*-0x8)
0x4026BE: C1FA1F SAR EDX,0x1F
0x4026C1: 80E11F AND CL,0x1F
0x4026C4: D3F8 SAR EAX,CL
0x4026C6: C3 RET
;********************************************************************************
0x4026C7: C1FA1F SAR EDX,0x1F ; <==0x004026AF(*-0x18)
0x4026CA: 8BC2 MOV EAX,EDX
0x4026CC: C3 RET
;********************************************************************************

;********************************************************************************
; _aullshr (1408)
0x40283F: 80F940 CMP CL,0x40
0x402842: 7315 JAE 0x402859
0x402844: 80F920 CMP CL,0x20
0x402847: 7306 JAE 0x40284F
0x402849: 0FADD0 SHRD EAX,EDX,CL
0x40284C: D3EA SHR EDX,CL
0x40284E: C3 RET
;********************************************************************************
0x40284F: 8BC2 MOV EAX,EDX ; <==0x00402847(*-0x8)
0x402851: 33D2 XOR EDX,EDX
0x402853: 80E11F AND CL,0x1F
0x402856: D3E8 SHR EAX,CL
0x402858: C3 RET
;********************************************************************************
0x402859: 33C0 XOR EAX,EAX ; <==0x00402842(*-0x17)
0x40285B: 33D2 XOR EDX,EDX
0x40285D: C3 RET
;********************************************************************************
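Read as C, the `_aullshr` listing above is a three-way split on the count in CL. The following model is only an aid for reading the assembly, not a replacement for the helper. The function name is hypothetical, and the EDX:EAX register pair is modelled as separate `hi`/`lo` parameters with a `uint64_t` result. `_allshr` has the same structure, but it uses SAR instead of SHR and fills with copies of the sign bit (SAR EDX,0x1F) where `_aullshr` writes zeros.

```c
#include <stdint.h>

/* Hypothetical C model of the _aullshr listing. The helper receives the 64-bit
   value in EDX:EAX (hi:lo) and the count in CL; here those are parameters and
   the EDX:EAX result is returned as one uint64_t. */
static inline uint64_t aullshr_model(uint32_t lo, uint32_t hi, uint8_t cl)
{
    uint32_t new_lo, new_hi;

    if (cl >= 64) {                  /* CMP CL,0x40 / JAE: every bit shifted out */
        new_lo = 0;                  /* XOR EAX,EAX */
        new_hi = 0;                  /* XOR EDX,EDX */
    } else if (cl >= 32) {           /* CMP CL,0x20 / JAE */
        new_lo = hi >> (cl & 31);    /* MOV EAX,EDX; AND CL,0x1F; SHR EAX,CL */
        new_hi = 0;                  /* XOR EDX,EDX */
    } else {
        /* SHRD EAX,EDX,CL: the low half takes its vacated top bits from hi.
           cl == 0 is handled separately because hi << 32 is undefined in C,
           while SHRD with a zero count simply leaves EAX unchanged. */
        new_lo = (cl == 0) ? lo : (lo >> cl) | (hi << (32 - cl));
        new_hi = hi >> cl;           /* SHR EDX,CL */
    }
    return ((uint64_t)new_hi << 32) | new_lo;
}
```

Only the last branch is needed when the count is known to be below 32, which is the specialization discussed earlier in the thread.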



Quote:
Original post by LessBread
Quote:
Original post by doynax
Do you really need an arithmetic (signed) right shift for a bitboard? A simple unsigned shift should be much faster.

The disassembly of aullshr isn't very different from allshr.
Too bad. I suppose you could at least get rid of the >= 64 case.
The MMX instruction set has a PSRLQ instruction for logical 64-bit right shifts.
But just getting an inlined version should help a lot (no need to preserve any registers, among other things).

edit: If an unsigned shift works, then the new bits shifted in obviously don't matter. So, for counts below 32, a simple SHRD for the lower word and SHR for the higher word should work too.
