How to understand this code section

General and Gameplay Programming

Started by xinsan June 19, 2008 10:55 AM

9 comments, last by xinsan 15 years, 10 months ago

xinsan

100

Author

June 19, 2008 10:55 AM

The following code optimizes the ddot operation from the BLAS library. Without any optimization, the DDOT function is:

#define DDOT(A,n,x,y,rd) { int _n_=n; double *_x_=x,*_y_=y; rd=0; while(_n_--) rd+=*(_y_++) * *(_x_++); }

With SSE2 instructions, the DDOT function is written as follows. The part I don't understand is:

"andl $0x07,%%edx"
"je jj1"#X""

The full code is:

#define DDOT_SSE2(X,nv,xv,yv,res) asm volatile ( \
" xorps %%xmm7, %%xmm7 \n\t" \
" mov %%ecx,%%edx /* %%ecx = %%edx = n */ \n\t" \
" movapd %%xmm7,%%xmm0 \n\t" \
" movapd %%xmm7,%%xmm1 \n\t" \
" movapd %%xmm7,%%xmm2 \n\t" \
" movapd %%xmm7,%%xmm3 \n\t" \
" movapd %%xmm7,%%xmm4 \n\t" \
" movapd %%xmm7,%%xmm5 \n\t" \
" movapd %%xmm7,%%xmm6 \n\t" \
" shrl $1,%%edx \n\t" \
" andl $0x07,%%edx \n\t" \
" je jj1"#X" \n\t" \
" movapd (%%eax),%%xmm6 \n\t" \
" mulpd (%%ebx),%%xmm6 \n\t" \
" cmp $3,%%edx \n\t" \
" jg jjg3"#X" \n\t" \
" jne jjg9"#X" \n\t" \
" movapd 0x10(%%eax),%%xmm5 \n\t" \
" mulpd 0x10(%%ebx),%%xmm5 \n\t" \
" movapd 0x20(%%eax),%%xmm4 \n\t" \
" mulpd 0x20(%%ebx),%%xmm4 \n\t" \
" jmp jj0"#X" \n\t" \
" jjg9"#X": cmp $2,%%edx \n\t" \
" jl jj0"#X" \n\t" \
" movapd 0x10(%%eax),%%xmm5 \n\t" \
" mulpd 0x10(%%ebx),%%xmm5 \n\t" \
" jmp jj0"#X" \n\t" \
" jjg3"#X": movapd 0x30(%%eax),%%xmm3 \n\t" \
" movapd 0x20(%%eax),%%xmm4 \n\t" \
" mulpd 0x30(%%ebx),%%xmm3 \n\t" \
" movapd 0x10(%%eax),%%xmm5 \n\t" \
" mulpd 0x20(%%ebx),%%xmm4 \n\t" \
" mulpd 0x10(%%ebx),%%xmm5 \n\t" \
" cmp $5,%%edx \n\t" \
" jg jjg5"#X" \n\t" \
" jne jj0"#X" \n\t" \
" movapd 0x40(%%eax),%%xmm2 \n\t" \
" mulpd 0x40(%%ebx),%%xmm2 \n\t" \
" jmp jj0"#X" \n\t" \
" jjg5"#X": movapd 0x50(%%eax),%%xmm1 \n\t" \
" mulpd 0x50(%%ebx),%%xmm1 \n\t" \
" movapd 0x40(%%eax),%%xmm2 \n\t" \
" mulpd 0x40(%%ebx),%%xmm2 \n\t" \
" addpd %%xmm1,%%xmm3 \n\t" \
" cmp $6,%%edx \n\t" \
" je jj0"#X" \n\t" \
" movapd 0x60(%%eax),%%xmm0 \n\t" \
" mulpd 0x60(%%ebx),%%xmm0 \n\t" \
" addpd %%xmm0,%%xmm2 \n\t" \
" jj0"#X": shll $4,%%edx \n\t" \
" addl %%edx,%%ebx \n\t" \
" addl %%edx,%%eax \n\t" \
" jj1"#X": mov %%ecx,%%edx \n\t" \
" shrl $4,%%ecx \n\t" \
" je jip3"#X" \n\t" \
" movapd (%%eax),%%xmm0 \n\t" \
" movapd 0x10(%%eax),%%xmm1 \n\t" \
" .p2align 4 /* each loop does 16 add+multiply */ \n\t" \
" jip2"#X": mulpd (%%ebx),%%xmm0 \n\t" \
" addpd %%xmm2,%%xmm6 \n\t" \
" movapd 0x20(%%eax),%%xmm2 \n\t" \
" mulpd 0x10(%%ebx),%%xmm1 \n\t" \
" addpd %%xmm3,%%xmm7 \n\t" \
" movapd 0x30(%%eax),%%xmm3 \n\t" \
" mulpd 0x20(%%ebx),%%xmm2 \n\t" \
" addpd %%xmm0,%%xmm4 \n\t" \
" movapd 0x40(%%eax),%%xmm0 \n\t" \
" mulpd 0x30(%%ebx),%%xmm3 \n\t" \
" addpd %%xmm1,%%xmm5 \n\t" \
" movapd 0x50(%%eax),%%xmm1 \n\t" \
" mulpd 0x40(%%ebx),%%xmm0 \n\t" \
" addpd %%xmm2,%%xmm6 \n\t" \
" movapd 0x60(%%eax),%%xmm2 \n\t" \
" mulpd 0x50(%%ebx),%%xmm1 \n\t" \
" addpd %%xmm3,%%xmm7 \n\t" \
" movapd 0x70(%%eax),%%xmm3 \n\t" \
" mulpd 0x60(%%ebx),%%xmm2 \n\t" \
" addpd %%xmm0,%%xmm4 \n\t" \
" movapd 0x80(%%eax),%%xmm0 \n\t" \
" mulpd 0x70(%%ebx),%%xmm3 \n\t" \
" addpd %%xmm1,%%xmm4 \n\t" \
" movapd 0x90(%%eax),%%xmm1 \n\t" \
" add $0x80,%%ebx \n\t" \
" add $0x80,%%eax \n\t" \
" dec %%ecx \n\t" \
" jne jip2"#X" \n\t" \
" jip3"#X": addpd %%xmm2,%%xmm6 \n\t" \
" addpd %%xmm3,%%xmm7 \n\t" \
" addpd %%xmm5,%%xmm4 \n\t" \
" addpd %%xmm6,%%xmm7 \n\t" \
" addpd %%xmm7,%%xmm4 \n\t" \
" movapd %%xmm4,%%xmm0 \n\t" \
" shufpd $1,%%xmm4,%%xmm4 \n\t" \
" addsd %%xmm4,%%xmm0 \n\t" \
" andl $1,%%edx \n\t" \
" je jip4"#X" \n\t" \
" movsd (%%eax),%%xmm1 \n\t" \
" mulsd (%%ebx),%%xmm1 \n\t" \
" addsd %%xmm1,%%xmm0 \n\t" \
" jip4"#X": movsd %%xmm0,(%0) \n\t" \
: : "g" (&res), "a" (xv), "b" (yv), "c" (nv) \
: "xmm0","xmm1","xmm2","xmm3","xmm4","xmm5","xmm6","xmm7","edx","cc" )

[Edited by - xinsan on June 19, 2008 9:48:23 PM]

dave

2,188

June 19, 2008 10:57 AM

Is that a virus?

xinsan

100

Author

June 19, 2008 11:07 AM

Quote:Original post by Dave
Is that a virus?

No. It is an excerpt from miniSSEL1BLAS.hpp in the miniSSEL1BLAS library.

http://iridia.ulb.ac.be/~fvandenb/miniSSEL1BLAS/miniSSEL1BLAS.html

miniSSEL1BLAS optimizes Level 1 BLAS routines such as DDOT using inline assembly.
The assembly above is the DDOT operation. I don't understand it.

Promit

13,404

June 19, 2008 11:10 AM

So what is your actual question?


RDragon1

1,205

June 19, 2008 11:11 AM

Why do you have so many jumps in vector code? Awful.

Wyrframe

2,489

June 19, 2008 12:11 PM

His question appears to be...

I don't understand
andl $0x07,%%edx
and
je jj1"#X"


Bregma

9,461

June 19, 2008 12:20 PM

Quote:Original post by xinsan
I don't understand "andl $0x07,%%edx" "je jj1"#X""

The first instruction masks edx down to its 3 least-significant bits (the other bits are cleared) and sets the zero flag according to the result. The second instruction jumps to the label jj1 (with a suffix taken from the macro argument X so the label is unique; it is found a little further down in the listing) if those three least-significant bits were all clear (that is, the result of the bitwise AND was all zeros).
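
For context, the "shrl $1,%%edx" just before the "andl" halves n, so edx counts pairs of doubles (one 16-byte SSE2 register each). Here is a rough C sketch of the bookkeeping as I read the listing. The names ddot_split, split_counts and ddot_phased are made up for illustration and are not part of miniSSEL1BLAS, and the scalar loops only stand in for the packed instructions:

```c
/* Hypothetical helper (not from miniSSEL1BLAS): mirrors how the listing
   appears to carve up n around the "andl $0x07" test. */
struct ddot_split {
    unsigned head_pairs; /* (n >> 1) & 7: pairs handled before the main loop */
    unsigned blocks;     /* n >> 4: main-loop iterations, 16 doubles each    */
    unsigned tail;       /* n & 1: one odd element, handled with movsd/mulsd */
};

struct ddot_split split_counts(unsigned n)
{
    struct ddot_split s;
    s.head_pairs = (n >> 1) & 0x07u; /* shrl $1 ; andl $0x07 */
    s.blocks     = n >> 4;           /* shrl $4              */
    s.tail       = n & 1u;           /* andl $1              */
    return s;
}

/* Scalar dot product laid out in the same three phases. */
double ddot_phased(unsigned n, const double *x, const double *y)
{
    struct ddot_split s = split_counts(n);
    double sum = 0.0;
    unsigned i, b;

    /* Head: does nothing when head_pairs == 0, which is the case where
       "je jj1" jumps straight past this phase. */
    for (i = 0; i < 2u * s.head_pairs; ++i)
        sum += x[i] * y[i];
    x += 2u * s.head_pairs; /* shll $4 ; addl: 16 bytes per pair */
    y += 2u * s.head_pairs;

    /* Main loop: 16 multiply-adds per iteration (the jip2 loop). */
    for (b = 0; b < s.blocks; ++b) {
        for (i = 0; i < 16u; ++i)
            sum += x[i] * y[i];
        x += 16;
        y += 16;
    }

    /* Tail: the single leftover element when n is odd. */
    if (s.tail)
        sum += x[0] * y[0];
    return sum;
}
```

For n = 37 this gives head_pairs = 2, blocks = 2 and tail = 1, that is 4 + 32 + 1 elements. When n is a multiple of 16 the head count is 0, and that is the case the "je" skips.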

Does that help?

Stephen M. Webb
Professional Free Software Developer

fpsgamer

856

June 19, 2008 12:25 PM

You should give this thread a better title so that people who actually know about this topic are inclined to read it.

xinsan

100

Author

June 19, 2008 09:49 PM

Quote:Original post by Promit
So what is your actual question?

My actual question is I don't understand these two instructions.
"andl $0x07,%%edx \n\t"
" je jj1"#X" \n\t"

alvaro

21,604

June 20, 2008 07:51 AM

Quote:Original post by xinsan
Quote:Original post by Promit
So what is your actual question?

My actual question is I don't understand these two instructions.
"andl $0x07,%%edx \n\t"
" je jj1"#X" \n\t"

The first instruction zeroes out all the bits in edx except for the last three (see Wikipedia for details).

The second instruction will jump to a label which is "jj1" concatenated with X (the first macro parameter) if the result of the previous operation was 0 (that is, if the last three bits of edx were 0).
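
The label part is plain preprocessor stringification: #X turns the macro argument into a string literal, and the compiler joins adjacent string literals. A minimal sketch of both pieces in C follows. JJ1_LABEL, label_for_7 and low_three_bits are made-up names for illustration, not anything from the library:

```c
/* Hypothetical macro: builds the same kind of label text that
   " je jj1"#X" \n\t" produces inside DDOT_SSE2. */
#define JJ1_LABEL(X) "jj1" #X

/* Label text for an expansion tagged 7: "jj1" followed by "7". */
const char *label_for_7(void)
{
    return JJ1_LABEL(7);
}

/* The masking step in C: keep only the low three bits, as andl $0x07 does.
   The je is taken exactly when this returns 0. */
unsigned low_three_bits(unsigned v)
{
    return v & 0x07u;
}
```

Presumably each use of DDOT_SSE2 is given a different X because two expansions with the same X in one file would define the same assembler label twice.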
