| x86 assembly language question |
|
i1977 Member since: 5/2/2002 From: Canada |
||||
|
|
||||
| My assembly language is rusty and I can't remember how to copy a 32-bit value multiple times into an array. I vaguely remember that there is a specific instruction for that: you put the count in cx (I think), the destination address in some other register, and then issue the instruction to do the work. Can someone please refresh my memory? Example:
const int nTimesToCopy = 512;
const DWORD dwValue = 0xFF00FF00;
__asm
{
    // what goes here?
}
|
||||
|
||||
Ra Member since: 11/9/2002 From: NY, United States |
||||
|
|
||||
| I believe rep stosd is what you're looking for. It'll take whatever value is in eax and put it at es:edi, add 4 to edi (assuming the direction flag is clear), and repeat that ecx times. |
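For reference, the behaviour described above can be modelled in plain C. This is a minimal sketch, assuming the direction flag is clear; `rep_stosd_model` is a hypothetical helper name, and its parameters are simply named after the registers they stand in for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a plain-C model of `rep stosd` with the
   direction flag clear. Returns the final value of edi. */
uint32_t *rep_stosd_model(uint32_t *edi, uint32_t eax, size_t ecx)
{
    assert(edi != NULL || ecx == 0);
    while (ecx != 0) {
        *edi = eax;  /* store eax at [edi] */
        edi += 1;    /* advance edi by one dword (4 bytes) */
        ecx -= 1;    /* rep counts ecx down until it reaches 0 */
    }
    return edi;
}
```

After the call, exactly `ecx` dwords starting at the original `edi` hold the value, and the returned pointer sits one past the last dword written.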
||||
|
||||
Washu Curiously Tentacled Community Manager Member since: 3/24/2001 From: Kanemitsu |
||||
|
|
||||
| Well, first make sure that EDI points to the destination array, load ECX with the length of the array (in DWORDs), then load EAX with the value to store into the array. Then just REP STOSD. |
||||
|
||||
Nuget5555 Member since: 6/15/2004 From: San Diego, CA, United States |
||||
|
|
||||
| rep will work, but there is actually a string copy instruction... but I can't remember it either... sorry :( |
||||
|
||||
Washu Curiously Tentacled Community Manager Member since: 3/24/2001 From: Kanemitsu |
||||
|
|
||||
| Of course, the real question is: Why not just use memset? |
||||
|
||||
Jx Member since: 11/10/2000 From: United Kingdom |
||||
|
|
||||
| Something along the lines of:
const int nTimesToCopy = 512;
const DWORD dwValue = 0xFF00FF00;
unsigned char array[ nTimesToCopy * sizeof( dwValue ) ];
__asm
{
    cld                    // make sure EDI counts upward
    lea edi, array         // EDI = address of the array (mov edi, [array] would load its contents instead)
    mov ecx, nTimesToCopy  // ECX = number of DWORDs to store
    mov eax, dwValue       // EAX = value to store
    rep stosd              // store EAX at [EDI], ECX times
}
No guarantees that it will work as it's from memory, but it gives you the general idea. |
||||
|
||||
i1977 Member since: 5/2/2002 From: Canada |
||||
|
|
||||
| Thanks! I'll try that. Washu, the reason I can't use memset is that even though it takes an int as a parameter, it only copies 1 byte repeatedly, not 4 bytes like what I need to do. |
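To make the limitation concrete: memset converts its int argument to unsigned char, so only the low byte of the value is replicated. The following is a minimal illustration; `memset_first_dword` is a hypothetical helper name that does not come from the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical demo helper: fills `count` dwords using memset and
   returns the first dword so the resulting pattern can be inspected. */
uint32_t memset_first_dword(uint32_t *dst, size_t count, int value)
{
    assert(dst != NULL && count > 0);
    /* memset only uses (unsigned char)value, i.e. the low byte. */
    memset(dst, value, count * sizeof *dst);
    return dst[0];
}
```

Passing 0x1234 leaves every dword equal to 0x34343434, not 0x00001234. With the value from the original question, 0xFF00FF00, the low byte is 0x00, so the whole buffer would simply be zeroed.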
||||
|
||||
Emmanuel Deloget GDNet News Lead Member since: 8/27/2003 From: France |
||||
|
|
||||
| Hello! I'm not sure you'll do better than a C version of your copy algorithm. Compilers generally do a very good job with loops. If you still want to go the assembly way, you may want to check the memcpy() code. It is heavily optimized. Maybe you just need to modify it so that it writes 32-bit ints instead of bytes. HTH, -- Emmanuel D. [blog, in French] [blog, very bad googlized translation] [NEW: English version of teh blog! (WIP)] |
||||
|
||||
smr GDNet+ Member since: 7/16/2004 From: Pekin, IL, United States |
||||
|
|
||||
Quote: Are you sure? I was under the impression that it would copy the largest blocks possible until the remaining number of bytes is smaller than said block. Then it would copy any remaining bytes the slow way. |
||||
|
||||
Emmanuel Deloget GDNet News Lead Member since: 8/27/2003 From: France |
||||
|
|
||||
| Hi smr, memset() fills a block of memory with a single byte. Whether it writes 4 bytes at a time or not is simply a matter of optimisation. The OP wants to initialise a block of memory with a repeating pattern of 4 different bytes. This is completely different, and memset does not allow that. Truly, -- Emmanuel D. |
||||
|
||||
Dmytry Member since: 12/9/2003 From: M 104 .... |
||||
|
|
||||
| Also, I'd compare the performance of rep with the performance of a loop... rep might actually be slower :( |
||||
|
||||
Ra Member since: 11/9/2002 From: NY, United States |
||||
|
|
||||
Quote: Quote:I'm not entirely sure how long a loop (using either a jcc and sub or dec OR loop with simple pairable instructions (mov, sub) or complex instructions (stosd)) would take, but I'm very sure rep stosd is the fastest way to do it in this case. |
||||
|
||||
LessBread Moderator Member since: 12/19/2001 From: Fresno, CA, United States |
||||
|
|
||||
| As the quote Ra provided shows, the registers used with rep movsd are esi and edi. Here's an example that uses this construct to copy arguments to a stack (Source).
DWORD Call_cdecl( const void* args, size_t sz, DWORD func )
{
    DWORD rc; // here's our return value...
    __asm
    {
        mov ecx, sz // get size of buffer
        mov esi, args // get buffer
        sub esp, ecx // allocate stack space
        mov edi, esp // start of destination stack frame
        shr ecx, 2 // make it dwords
        rep movsd // copy params to real stack
        call [func] // call the function
        mov rc, eax // save the return value
        add esp, sz // restore the stack pointer
    }
    return ( rc );
}
|
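The copy step in the listing above (`shr ecx, 2` followed by `rep movsd`) can be modelled in plain C. This is a minimal sketch assuming the direction flag is clear; `rep_movsd_model` is a hypothetical helper name, and it returns the number of bytes actually copied so the rounding is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper modelling `shr ecx, 2` followed by `rep movsd`.
   Returns the number of bytes that were actually copied. */
size_t rep_movsd_model(uint32_t *edi, const uint32_t *esi, size_t size_in_bytes)
{
    size_t ecx = size_in_bytes >> 2;  /* shr ecx, 2: bytes -> whole dwords */
    size_t copied = ecx * 4;
    assert((edi != NULL && esi != NULL) || ecx == 0);
    while (ecx != 0) {
        *edi++ = *esi++;  /* movsd: copy one dword, advance esi and edi */
        ecx--;
    }
    return copied;
}
```

Because the shift discards the low two bits of the size, a byte count that is not a multiple of 4 leaves up to 3 trailing bytes uncopied, so a routine like this implicitly expects the argument buffer size to be a multiple of 4.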
||||
|
||||
Washu Curiously Tentacled Community Manager Member since: 3/24/2001 From: Kanemitsu |
||||
|
|
||||
Then write it in C...
void* memset4(void* t, unsigned int val, size_t count) {
    void *dst = t;
    while(count--) {
        *(unsigned int*)dst = val;
        dst = (unsigned int*)dst + 1;
    }
    return t;
}
and since i just KNOW you're going to say: "Well that's not optimized." Oh?
004014BD B8 34 12 00 00 mov eax,1234h
004014C2 B9 64 00 00 00 mov ecx,64h
004014C7 8D 7C 24 08 lea edi,[esp+8]
004014CB F3 AB rep stos dword ptr [edi]
That's what the above code generates. |
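The same idea can be written with a fixed-width type, which avoids relying on `unsigned int` being exactly 32 bits. This is a minimal sketch; `fill_u32` is a hypothetical name, and whether a given compiler turns the loop into `rep stos` depends on the compiler and optimisation settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typed variant of the fill loop: writes `value` into
   `count` consecutive 32-bit slots starting at `dst`. */
uint32_t *fill_u32(uint32_t *dst, uint32_t value, size_t count)
{
    assert(dst != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        dst[i] = value;
    }
    return dst;
}
```

For the case in the original question, that would be a call such as `fill_u32(buffer, 0xFF00FF00u, 512)` on a buffer of at least 512 dwords.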
||||
|
||||
i1977 Member since: 5/2/2002 From: Canada |
||||
|
|
||||
| Isn't that very compiler specific though? What compiler did you use for this? |
||||
|
||||
Washu Curiously Tentacled Community Manager Member since: 3/24/2001 From: Kanemitsu |
||||
|
|
||||
| Oh...and i suppose __asm ISN'T? HRM???? I used Visual Studio .Net 2003 Enterprise Architect. If your compiler generates anything but code that is VERY VERY similar to that...it's a piece of shit and you should probably upgrade. The VC++ Toolkit IS free. |
||||
|
||||
CGameProgrammer Member since: 7/30/1999 From: San Diego, CA, United States |
||||
|
|
||||
| I believe memset is "rep stosb" and memcpy is "rep movsb". In other words they are both single-byte only (what the last 'b' stands for). ~CGameProgrammer( ); |
||||
|
||||
i1977 Member since: 5/2/2002 From: Canada |
||||
|
|
||||
| Washu, when I said "compiler specific", I wasn't referring to Microsoft extensions such as __asm. I know that's not portable and I don't really care since I won't be compiling my code with another compiler anyway. What I meant is that your C function might not compile to the same optimized asm code on another compiler such as gcc, Borland or whatever. I tried compiling your C function and it does indeed produce optimized assembly code for the Release build target. I must admit that I am very surprised about this, but nonetheless, I will use inline assembly code anyway because the code generated for the Debug build target is much less optimized. Oh, and what does "HRM????" mean by the way? |
||||
|
||||
petewood Member since: 3/5/2002 From: Bristol, United Kingdom |
||||
|
|
||||
Quote: Why do you want to optimise your debug build? |
||||
|
||||
Washu Curiously Tentacled Community Manager Member since: 3/24/2001 From: Kanemitsu |
||||
|
|
||||
Quote: ROFL! You think that the few nanoseconds gained by having an optimized debug build memset is going to help in debug mode? Well, it won't. All debug mode code is unoptimized. So your memset is not going to do anything to give you a speed up. In fact, it will make debugging harder because it won't have certain things built into it that my C function does. |
||||
|
||||
Emmanuel Deloget GDNet News Lead Member since: 8/27/2003 From: France |
||||
|
|
||||
Quote: Hello, this is stolen from VC6 memcpy
CopyUp:
test edi,11b ;U - destination dword aligned?
jnz short CopyLeadUp ;V - if we are not dword aligned already, align
shr ecx,2 ;U - shift down to dword count
and edx,11b ;V - trailing byte count
cmp ecx,8 ;U - test if small enough for unwind copy
jb short CopyUnwindUp ;V - if so, then jump
rep movsd ;N - move all of our dwords
jmp dword ptr TrailUpVec[edx*4] ;N - process trailing bytes
You can find it in ${Vs6Dir}\vc98\crt\intel\memcpy.asm. In a similar fashion, memset also uses "rep stosd". Both functions are heavily optimized for Intel processors and take some neat features into account: pointer alignment, instruction pairing, and so on. This is why I think that an optimized version of a 4-byte memset should use these as base code. HTH, -- Emmanuel D. |
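The overall shape visible in that listing (align the destination, process whole dwords, then finish the trailing bytes) can be sketched in portable C for a byte fill. This is only an illustration under stated assumptions, not the CRT source: `byte_fill_sketch` is a hypothetical name, the unrolled small-size path (`cmp ecx,8`) is omitted, and the 4-byte store is written with memcpy to stay within C's aliasing rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the "align, fill dwords, finish the tail"
   layout. Fills `count` bytes at `dst` with `value`. */
void *byte_fill_sketch(void *dst, unsigned char value, size_t count)
{
    unsigned char *p = (unsigned char *)dst;
    uint32_t pattern = value * 0x01010101u;  /* same byte in all 4 lanes */

    assert(dst != NULL || count == 0);

    /* Leading bytes: advance until p is dword aligned. */
    while (count != 0 && ((uintptr_t)p & 3u) != 0) {
        *p++ = value;
        count--;
    }
    /* Body: count >> 2 whole dwords (where rep stosd would run). */
    for (size_t dwords = count >> 2; dwords != 0; dwords--) {
        memcpy(p, &pattern, sizeof pattern);  /* one aligned 4-byte store */
        p += sizeof pattern;
    }
    /* Tail: the remaining count & 3 bytes. */
    for (size_t tail = count & 3u; tail != 0; tail--) {
        *p++ = value;
    }
    return dst;
}
```

The `count >> 2` and `count & 3` split mirrors the `shr ecx,2` and `and edx,11b` pair in the assembly above.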
||||
|
||||
i1977 Member since: 5/2/2002 From: Canada |
||||
|
|
||||
| Thanks for your help guys! Even Washu, who apparently knows more about what I need and am trying to do. ;) |
||||
|
||||
Washu Curiously Tentacled Community Manager Member since: 3/24/2001 From: Kanemitsu |
||||
|
|
||||
Quote: I do. My years of experience tell me that dropping to assembly language for something as simple as a memset4 is a silly idea, especially when it will cost you in debuggability. (Of course, if you don't use the debugger then you have far worse problems.) :) Not to mention: premature optimization is the devil. Profile first. |
||||
|
||||