dxCUDA

OpenGL ASM Experiment


I'm experimenting with x64 asm to learn how code operates at a low level, and trying to work out where projects gain speed when asm is used appropriately.

My project is x64 with MASM in Visual Studio 2010.

Currently I am trying to replace the basic glClear, glLoadIdentity etc. calls with asm to understand how to interface between C++ code and asm.

I'm following the asm examples over at NeHe, but I keep getting an access violation error as soon as I introduce 'push' into the code.

This is the code asm-side:

include gl.inc
include glu.inc

.data
_45d0 equ 40468000h ;45.0
_45d1 equ 0
_01d0 equ 1069128089
_01d1 equ -1717986918 ;0.1
_100d0 equ 1079574528
_100d1 equ 0 ;100.0
_1d0 equ 1072693248
_1d1 equ 0 ;1.0
_05 equ 1056964608 ; 0.5
_1 equ 1065353216 ; 1.0
_m1 equ -1082130432 ;-1.0
_3 equ 1077936128 ; 3.0
_m15 equ -1077936128 ;-1.5
_m6 equ -1061158912 ;-6.0

.code

ASMrender proc
display:

push GL_COLOR_BUFFER_BIT
call glClear
call glLoadIdentity
call glEnd

push _m15
push 0
push _m6
call glTranslatef

xor eax,eax

ret

ASMrender endp
end



Over in C++ I am just doing the standard 'extern "C" void ASMrender();' declaration
and then calling ASMrender() in the main render loop.

Here's a screenshot of some of the action where it all goes wrong.
2db1fll.png

Replacing OpenGL calls from C with OpenGL calls from assembler will gain you nothing in terms of performance.

I believe the 64-bit Windows calling convention passes the first four arguments in registers by position, and any of those that are floating point go in SSE registers (XMM0-XMM3), not on the stack: http://msdn.microsof...y/zthk2dkh.aspx
The same goes for integer arguments: you should pass the argument to glClear in RCX, not on the stack.
You would be better off writing the code in C and checking how it looks in the disassembler (put a breakpoint on the function call and select the Disassembly view from the Debug menu, Alt+8). You'll see that Visual Studio generates pretty optimal code in this case.

Also, you cannot call glEnd without glBegin.
Edited by Martins Mozeiko
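
To make the register rule above concrete, here is a minimal C++ sketch that maps a parameter list to the locations the Microsoft x64 convention assigns. The names ArgKind and win64_arg_locations are made up for this illustration and are not part of any library; the sketch only covers plain integer and float/double parameters, not structs or varargs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Kind of each parameter, in declaration order (hypothetical helper type).
enum class ArgKind { Integer, Float };

// Returns where each argument goes under the Microsoft x64 convention.
// Slots are assigned by position: argument i uses either the i-th integer
// register or the i-th XMM register, and the fifth argument onwards goes
// on the stack.
std::vector<std::string> win64_arg_locations(const std::vector<ArgKind>& args) {
    static const char* const int_regs[4] = {"rcx", "rdx", "r8", "r9"};
    static const char* const float_regs[4] = {"xmm0", "xmm1", "xmm2", "xmm3"};
    std::vector<std::string> out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i >= 4) {
            out.push_back("stack");
        } else {
            out.push_back(args[i] == ArgKind::Float ? float_regs[i] : int_regs[i]);
        }
    }
    return out;
}
```

For glTranslatef(x, y, z) this gives xmm0, xmm1, xmm2, and for glClear(mask) it gives rcx, so nothing needs to be pushed for either call.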

Also, from http://msdn.microsof...y/ms235286.aspx:

"The caller is responsible for allocating space for parameters to the callee, and must always allocate sufficient space for the 4 register parameters, even if the callee doesn't have that many parameters."

So something like this:

mov ecx, GL_COLOR_BUFFER_BIT ; first integer parameter goes in rcx
sub rsp, 40 ; 32 bytes of shadow space for 4 registers, plus 8 to keep rsp 16-byte aligned at the call
call glClear
add rsp, 40 ; release the shadow space
Edited by Erik Rufelt
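
The size of that stack adjustment can be worked out mechanically. Below is a small C++14 sketch of the arithmetic. It assumes a procedure that was entered with a plain call (so rsp is 8 bytes off a 16-byte boundary on entry) and that pushes no registers itself. The function name win64_frame_bytes is invented for this example.

```cpp
#include <cstddef>

// Bytes to subtract from rsp in a procedure that was entered with `call`,
// saves no registers, and calls functions taking at most `max_args`
// arguments (hypothetical helper, not a library function).
constexpr std::size_t win64_frame_bytes(std::size_t max_args) {
    // At least four 8-byte slots of shadow space, more if a callee takes more.
    std::size_t bytes = 8 * (max_args < 4 ? 4 : max_args);
    // The return address already occupies 8 bytes, so pad until
    // (bytes + 8) is a multiple of 16 at the point of the next call.
    if ((bytes + 8) % 16 != 0) {
        bytes += 8;
    }
    return bytes;
}

// glClear takes one argument: 32 bytes of shadow space + 8 of padding = 28h.
static_assert(win64_frame_bytes(1) == 40, "expected sub rsp, 28h");
```

One adjustment at the top of the procedure, undone once before ret, is enough to cover every call made in between.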

Thanks Rufelt,

I realize this is a somewhat pointless effort in terms of optimization. I'd rather just learn how asm works first; once I am familiar with it, I can focus on some decent optimizations with SSE (or so I've heard).

So the asm now looks something like this; however, the screen is black, with no white triangle:

sub rsp, 32h
mov ecx, GL_COLOR_BUFFER_BIT or GL_DEPTH_BUFFER_BIT
call glClear
add rsp, 32h

call glLoadIdentity

sub rsp, 20h
mov rdx, _m15
mov rcx, 0
mov r8d, _m6
call glTranslatef
add rsp, 20h

sub rsp, 18h
mov edx, GL_TRIANGLES
call glBegin
add rsp, 18h

sub rsp, 20h
mov edx, 0
mov ecx, _1
mov r8d, 0
call glVertex3f
add rsp, 20h

sub rsp, 20h
mov edx, _m1
mov ecx, _m1
mov r8d, 0
call glVertex3f
add rsp, 20h

sub rsp, 20h
mov edx, _1
mov ecx, _m1
mov r8d, 0
call glVertex3f
add rsp, 20h

call glEnd

ret
Edited by dxCUDA

Cheers for that. It still doesn't seem to be working: all I get is a black screen. I have a feeling it might be the data from _m1, _m15 etc. I'll keep digging.

Well, I checked the value of _m15, which came out as 0BFC00000h, so I created _neg15 dd -1.5f to be clearer about the value, and this was also 0BFC00000h.

I'm pretty certain that everything else is working, because if I keep only glClear and glLoadIdentity in the asm function and draw the triangle on the C++ side, it works. So it must be incorrect data being passed from asm to the gl procedures.
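
The same check on the constants can be done from the C++ side. This is a minimal sketch, assuming 32-bit IEEE 754 floats (true for MSVC on x64). The helper name float_bits is made up for the example.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw IEEE 754 single-precision bit pattern of a float
// (hypothetical helper for inspecting constants).
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to reinterpret the bytes
    return bits;
}
```

float_bits(-1.5f) gives 0xBFC00000, which is the same 32-bit pattern as the signed decimal -1077936128 used for _m15, so the constants themselves are correct.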

The glTranslatef arguments are floats. Float arguments among the first four parameters of a function are passed in SSE registers (XMM0-XMM3), not in rcx/rdx/r8, as described in the links Erik and I posted above.

And if all you want is to optimize using SSE instructions, then there is no need to write assembly. You can simply use intrinsic functions. That will greatly simplify your life and give the compiler a better chance to optimize the code (inlining and so on). Another advantage is that the same code will work for a 32-bit target, so there is no need to write the assembly twice (for 32-bit and 64-bit).
http://msdn.microsof...y/y0dh78ez.aspx

And I'll repeat myself: to spot mistakes more easily in assembly for such simple code, write the same code in C, then inspect the generated assembly (press Alt+8 while debugging) and compare it with your hand-written assembly.
Edited by Martins Mozeiko
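
As a rough idea of what the intrinsics route looks like, here is a minimal sketch that adds four pairs of floats with one SSE addition. It assumes an x86 or x64 target where <xmmintrin.h> is available. The function name add4 is invented for the example. The intrinsics themselves (_mm_loadu_ps, _mm_add_ps, _mm_storeu_ps) are the standard SSE ones.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Adds four pairs of floats with a single SSE addition.
// a, b and out must each point to at least four floats; no alignment
// is required because the unaligned load/store variants are used.
void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);             // load a[0..3]
    __m128 vb = _mm_loadu_ps(b);             // load b[0..3]
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // out[i] = a[i] + b[i]
}
```

The same source builds for both 32-bit and 64-bit targets, and the compiler takes care of register allocation and the calling convention.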

Thanks Mozeiko

That's a good direction to take after this learning exercise. I didn't quite understand from the other post initially about the SSE2 registers, but I do now and it works.

Would asm be viable in a runtime situation where we have an if-else statement? We could do this in asm and avoid the doubled call each frame, as it would remove half the calling. I could see it being better than C++ in such a situation, or would I still be wrong?

It works now, thanks for the help. Here's the code for any future searches. Note that x goes in xmm0, y in xmm1 and z in xmm2; loading them in the reverse order gives a distorted triangle.


include gl.inc
include glu.inc


.data

_neg15 dd -1.5f ;
_neg6 dd -6.0f
_pos1 dd 1.0f
_neg1 dd -1.0f
_20 dd 20.0f


.code

ASMrender proc

; Reserve the stack once for every call below:
; 32 bytes of shadow space + 8 bytes to keep rsp 16-byte aligned.
sub rsp, 28h

; glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
mov ecx, GL_COLOR_BUFFER_BIT or GL_DEPTH_BUFFER_BIT
call glClear

call glLoadIdentity

; glTranslatef(-1.5, 0.0, -6.0)
movss xmm0, dword ptr [_neg15]
xorps xmm1,xmm1
movss xmm2, dword ptr [_neg6]
call glTranslatef

; glBegin(GL_TRIANGLES)
mov ecx, GL_TRIANGLES
call glBegin

; glVertex3f(0.0, 1.0, 0.0)
xorps xmm0,xmm0
movss xmm1, dword ptr [_pos1]
xorps xmm2,xmm2
call glVertex3f

; glVertex3f(-1.0, -1.0, 0.0)
movss xmm0, dword ptr [_neg1]
movss xmm1, dword ptr [_neg1]
xorps xmm2,xmm2
call glVertex3f

; glVertex3f(1.0, -1.0, 0.0)
movss xmm0, dword ptr [_pos1]
movss xmm1, dword ptr [_neg1]
xorps xmm2,xmm2
call glVertex3f

call glEnd

; Release the stack space reserved at the top.
add rsp, 28h
ret

ASMrender endp
end
Edited by dxCUDA
