I'm experimenting with x64 asm to learn how code operates at a low level, and trying to work out where ASM, used appropriately, actually speeds up a project.
My project is x64, using MASM in Visual Studio 2010.
Currently I am trying to replace the basic glClear, glLoadIdentity, etc. calls with asm to understand how to interface between code and asm.
I'm following the asm examples over at NeHe, but I keep getting an access violation error as soon as I introduce 'push' into the code.
This is the code asm-side:
include gl.inc
include glu.inc
Replacing OpenGL calls from C with OpenGL calls from assembler will gain you nothing in terms of performance.
I believe the 64-bit Windows calling convention requires the first four floating-point arguments to go in SSE2 registers (XMM0 to XMM3), not on the stack: http://msdn.microsof...y/zthk2dkh.aspx
The same goes for integer arguments: you should pass the argument to glClear in the RCX register, not on the stack.
You would be better off writing the code in C and checking how it looks in the disassembler (put a breakpoint on the function call and select the Disassembly view from the Debug menu, Alt+8). You'll see that Visual Studio generates pretty optimal code in this case.
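A minimal sketch of the kind of reference code you could compile and step through in the Disassembly view. `clear_stub` and `translate_stub` are hypothetical stand-ins for glClear and glTranslatef so that the snippet builds without the OpenGL headers, and the translate values are only illustrative. The two constants use the values from the standard gl.h.

```cpp
#include <cstdint>

// Values as defined in the standard OpenGL header gl.h.
constexpr std::uint32_t kColorBufferBit = 0x00004000; // GL_COLOR_BUFFER_BIT
constexpr std::uint32_t kDepthBufferBit = 0x00000100; // GL_DEPTH_BUFFER_BIT

// Hypothetical stand-ins: they only record what they were called with.
struct Recorded {
    std::uint32_t mask;
    float x, y, z;
};
static Recorded g_recorded = {0, 0.0f, 0.0f, 0.0f};

void clear_stub(std::uint32_t mask) { g_recorded.mask = mask; }

void translate_stub(float x, float y, float z) {
    g_recorded.x = x;
    g_recorded.y = y;
    g_recorded.z = z;
}

// Put a breakpoint on either call and open the Disassembly view:
// the integer mask should be loaded into ECX, the floats into XMM0..XMM2.
void draw_reference() {
    clear_stub(kColorBufferBit | kDepthBufferBit);
    translate_stub(-1.5f, 0.0f, -6.0f);
}
```

In a Debug build the calls are not inlined, so the argument setup is easy to read next to your own asm.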
[quote]The caller is responsible for allocating space for parameters to the callee, and must always allocate sufficient space for the 4 register parameters, even if the callee doesn’t have that many parameters.
[/quote]
So something like this:
mov rcx, GL_COLOR_BUFFER_BIT ; parameter
sub rsp, 32 ; shadow space for 4 registers
call glClear
add rsp, 32 ; pop register shadows
I realize this is a somewhat pointless effort in terms of optimizations and yadda yadda. I'd rather just learn how asm works, and once I am familiar with it I can focus on some decent optimizations with SSE (or so I've heard).
So the asm now looks something like this, however the screen is black, no white triangle:
sub rsp, 32h
mov ecx, GL_COLOR_BUFFER_BIT or GL_DEPTH_BUFFER_BIT
call glClear
add rsp, 32h
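One thing worth checking in the snippet above: MASM reads `32h` as hexadecimal, which is 50 decimal, not 32. The sketch below works out how many bytes to subtract from RSP before a call. It assumes the Microsoft x64 convention and a routine that has pushed nothing since it was entered, so RSP sits 8 bytes past a 16-byte boundary because of the return address. The function names are made up for illustration.

```cpp
#include <cstddef>

// Bytes to subtract from RSP before a call, under the assumptions above.
constexpr std::size_t stack_adjust(std::size_t num_args) {
    const std::size_t shadow = 32;  // home space for RCX, RDX, R8, R9
    // Fifth and later arguments take one 8-byte stack slot each.
    const std::size_t extra = num_args > 4 ? (num_args - 4) * 8 : 0;
    std::size_t bytes = shadow + extra;
    // RSP must be a multiple of 16 at the call instruction,
    // so (bytes + 8) has to be a multiple of 16.
    if ((bytes + 8) % 16 != 0) {
        bytes += 8;
    }
    return bytes;
}

// True if subtracting `bytes` at function entry leaves RSP 16-byte aligned.
constexpr bool keeps_alignment(std::size_t bytes) {
    return (bytes + 8) % 16 == 0;
}
```

Under those assumptions a call with up to four arguments needs `sub rsp, 40` (28h). A plain 32 is only aligned if the routine has already pushed an odd number of 8-byte values, for example a single `push rbp`.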
Cheers for that. It still doesn't seem to be working; all I get is a black screen. I have a feeling it might be the data from _m1, _m15 etc. I'll keep digging.
Well, I checked the value of _m15, which equated to 0BFC00000h, so I created _neg15 dd -1.5f to be more clear on the value, and this was also 0BFC00000h.
I'm entirely certain that everything else is working, because if I keep only glClear and glLoadIdentity within the asm function and do the triangle drawing on the C++ side, it works. So it must be incorrect data passed from asm to the gl procedures.
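As a quick cross-check of that value, here is a small sketch that prints the raw bits of a float from the C++ side. It assumes 32-bit IEEE 754 floats, which is what MSVC uses on x64. `float_bits` is a helper name invented for this example.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw 32-bit pattern of a float without changing any bits.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

`float_bits(-1.5f)` gives 0xBFC00000, so the constant itself is encoded correctly, and the problem is more likely in how it is handed to the gl function.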
The glTranslatef arguments are floats. The first 4 float arguments of a function are passed in SSE2 registers (not rdx/rcx/r8d), as described in the link Erik and I posted above.
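To make the rule concrete, here is a rough lookup of where each of the first arguments goes under the Microsoft x64 convention for ordinary (non-variadic) functions. `ArgKind` and `arg_location` are names invented for this sketch, not part of any API.

```cpp
#include <cstddef>
#include <string>

enum class ArgKind { Integer, Float };

// Where the argument at a zero-based position is passed.
// The slot is chosen by position, so a float in position 2 uses XMM2
// even if the arguments before it were integers.
std::string arg_location(std::size_t position, ArgKind kind) {
    static const char* const int_regs[] = {"RCX", "RDX", "R8", "R9"};
    static const char* const float_regs[] = {"XMM0", "XMM1", "XMM2", "XMM3"};
    if (position >= 4) {
        return "stack";  // fifth and later arguments go on the stack
    }
    return kind == ArgKind::Float ? float_regs[position] : int_regs[position];
}
```

So for glTranslatef(x, y, z) the three floats belong in XMM0, XMM1 and XMM2, while the single integer argument of glClear belongs in RCX.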
And if all you want is to optimize using SSE instructions, then there is no need to write assembly. You can simply use intrinsic functions. It will greatly simplify your life and give the compiler more chance to optimize the code (inlining and other stuff). Another advantage is that the same code will work for a 32-bit target, so there is no need to write the assembly twice (for 32-bit and 64-bit). http://msdn.microsof...y/y0dh78ez.aspx
And I'll repeat myself: to spot mistakes in your assembly more easily for such simple code, write the same code in C, inspect the generated assembly (press Alt+8 while debugging), and compare it with the assembly you wrote.
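A minimal sketch of what the intrinsics route looks like, assuming an x86 or x64 target with SSE available. `add4` is a made-up example function; the intrinsics themselves come from `<xmmintrin.h>`.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (__m128, _mm_add_ps, ...)

// Adds four pairs of floats at once. The pointers do not need to be
// 16-byte aligned because the unaligned load/store forms are used.
void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);             // load a[0..3]
    __m128 vb = _mm_loadu_ps(b);             // load b[0..3]
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // out[i] = a[i] + b[i]
}
```

The same source builds for 32-bit and 64-bit targets, and the compiler is free to inline `add4` into its caller.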
That's a good direction to take after this learning exercise. I didn't quite understand the other post about the SSE2 registers initially, but I do now, and it works.
Would asm be viable in a runtime situation where we have an if-else statement? We could do this in asm and avoid the doubled call each frame, as it would remove half the calling. I could see it being better than C++ in such a situation, or would I still be wrong?