Sign in to follow this  
Programmer16

[solved]asm problem (C++)

Recommended Posts

I got a code snippet out of "Tricks of the 3D Game Programming Gurus: Advanced 3D Graphics and Rasterization". It replaces memset(), but when I use it I access violations (I believe that I'm using it incorrectly). Its supposed to clear by QUADs instead of BYTEs (or something like that.) Here are the snippet and a second one that I made:
inline void SetMemory(void* pDest, UINT nData, int iSize)
{
	_asm
	{
		mov edi, pDest;
		mov ecx, iSize;
		mov eax, nData;
		rep stosd;
	}
}

inline void ClearMemory(void* pDest, int iSize)
{
	SetMemory(pDest, 0, iSize);
}


I then tested it with
WNDCLASS WndClass;
dftCommon::ClearMemory(&WndClass, sizeof(WNDCLASS));
Which clears the memory perfectly (just like ZeroMemory(), but after the app returns from main(), I get a large amount of access violations. After switching to ZeroMemory(), they go away. Any ideas? Thanks! [Edited by - Programmer16 on July 4, 2005 1:14:40 PM]

Share this post


Link to post
Share on other sites
Pardon my ignorance of asm, but what exactly is the difference? I suppose the brackets do something, but what is it that they do?

Share this post


Link to post
Share on other sites
As far as I remember, you need brackets around the variable names in assembler because the variable names WITHOUT brackets correspond to their address, not to the variables themselves. Don't know if it is the same in inline assembler.
Also, sizeof gives you the number of bytes, wheras you are trying to write that much dwords. Perhaps that is the error.

Share this post


Link to post
Share on other sites
is iSize in Bytes, or Words? if it is length in bytes try dividing it be 4 before clearing it, or use stosb instead (though stosd will be faster)

Share this post


Link to post
Share on other sites
I'm not sure but I think you should give the size in quad words. You use sizeof which returns the number of bytes so for example if WNDCLASS is 32 bytes it'll clear 32*4(128) bytes and therefore you get an access violation (access 96 bytes beyond the object).

Share this post


Link to post
Share on other sites
im not that capable with x86 these days, but have you tried saving the ecx,edi,asx registers on the stack before using them and then restoring them before you return?

Cheers
-Danu

Share this post


Link to post
Share on other sites
Yes, that fixed it. Thank you good people!

PS: In the appendix of the book, he defines a function like so:

_asm
{
mov edi, x;
mov ecx, 1000/4;
mov eax, 0;
rep stosd;
}



But when I tried to '/ 4', I get an error. Is this just a typo in the book?

The error is:
c:\documents and settings\programmer16\my documents\projects\dragonforge
technology\dftcommon.h(35) : error C2425: '/' : non-constant expression in
'second operand'


Again, thanks!

Share this post


Link to post
Share on other sites
If you're getting a compiler error complaining about that assembly statement, with 1000/4 in it, then something's amiss. Did you say iSize/4? If so, then yes, you will get an error. However, you can accomplish this with:

mov ecx, iSize
shr ecx, 2


As for the compiler saving registers, eax, ecx, and edx are caller saved, which means you can use them as you please, but remember to save them (the stack is the standard place) before you call any other functions if you want to be guaranteed to get your old value back. However, edi is a callee save register. I'd push it before changing the value and then pop it before you return.

Share this post


Link to post
Share on other sites
Well, here's the finished function. If anybody has any corrections, please say so (I don't want to go on using something thats not working right.)

inline void SetMemory(void* pDest, UINT nData, int iSize)
{
_asm
{
mov edi, pDest;
mov ecx, iSize;
shr ecx, 2;
mov eax, nData;
rep stosd;
}
}


Thanks for everybodie's help, it seems to be working right.

Share this post


Link to post
Share on other sites
Maybe i am missing something, but why are you replacing memset with your own implementation?

If you replace it with anything, it should be with std::uninitialized_fill or std::fill (or the postfixed _n equivalents).

Share this post


Link to post
Share on other sites
Quote:
Original post by Programmer16
Thanks for everybodie's help, it seems to be working right.


and It will continue with that until you try to set a block that's not a multiple of four bytes long then the last size & 3 bytes will end up having their old values

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Depending on the compiler you might want to consider enabling intrinsics if they are supported. With intrinsics enabled the compiler will replace calls such as strcpy, memcpy, memset, etc. with inline versions using 'rep??' CPU instructions. This is more beneficial because the compiler will handle target CPU specifics including 32bit vs 64bit register usage and usage of SSE and MMX instructions for certain tasks.

For MS compilers check the MSDN for the appropriate #pragma declarations.

Share this post


Link to post
Share on other sites
trust me, just use memset. i doubt there's a compiler out there that will do it slower unless the size is a variable and you do NOTHING but memset. in that case the compiler has to add a couple instructions and you won't notice.

edit: that's the kind of stuff i used to do nearly a decade ago and we don't need to anymore :)

Share this post


Link to post
Share on other sites
Quote:
Original post by Jingo
Maybe i am missing something, but why are you replacing memset with your own implementation?

If you replace it with anything, it should be with std::uninitialized_fill or std::fill (or the postfixed _n equivalents).


Quoted for truth. If this kind of trickery is useful on your system, then you can count on std::fill to do it for you automatically when possible (and default to something else if it isn't for some reason, through the magic of templates).

Share this post


Link to post
Share on other sites
Quote:
and It will continue with that until you try to set a block that's not a multiple of four bytes long then the last size & 3 bytes will end up having their old values


Shouldn't my compiler pad it upto 4 bits (if its 13, would it go upto 16)? I was told at a different forum that is was a lot better since it clears QUADwords.

If I shouldn't use it I won't, but I just thought that it'd be better.

Thanks guys!

Share this post


Link to post
Share on other sites
You're probably better off without writing a custom memset() with inline assembly. It's non-portable and can potentially be buggy if things aren't aligned properly. If you're using a Microsoft compiler you're probably better off enabling intrinsics (/Oi), which will (most likely) replace the memset() function with the fastest possible code. This is much more portable and less bug-prone.

EDIT: AP beat me to it.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this