Archived

This topic is now archived and is closed to further replies.

drarem

faster way to copy...

Below is the function I am working with.. a simple strcpy(). It uses pointer math and qwords.. I don't worry about alignment because the basic rule of the msvcrt lib is that strings need to be '0'-terminated.. or do I need to worry about that? And is there a faster way, does it need to be asm? Using array math, i.e. ptrout[retg++]=ptrin[retg], is slightly slower, probably due to use of the stack.
  int StrCpyA(void *OUTDATA, void *INDATA) {
    unsigned long long *ptrout;
    unsigned long long *ptrin;

    ptrout = (unsigned long long *) OUTDATA;
    ptrin = (unsigned long long *) INDATA;
    
//    ptrout[0] = ptrin[0];

    while (*ptrin) *ptrout++ = *ptrin++;
    
}
  
I fseek, therefore I fam.

just out of curiosity (no speed difference) why do this?

unsigned long long *ptrout;
unsigned long long *ptrin;
ptrout = (unsigned long long *) OUTDATA;
ptrin = (unsigned long long *) INDATA;

instead of this:

unsigned long long *ptrout = (unsigned long long *) OUTDATA;
unsigned long long *ptrin = (unsigned long long *) INDATA;

?

I didn't think of that, but I am trying to keep the stack out of this, that is why it is so 'open', but..

I just tried that and it seemed to run negligibly slower according to the timer results on four test runs, so I put it back..



I fseek, therefore I fam.

you do have to take care of alignment issues, and as it stands now, your code is incorrect. before trying to optimize strcpy, i suggest checking if your favorite compiler can generate intrinsic rep movs code for it - msvc for example does so since version 6, if not before, producing code that's both fast and correct.
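
For reference, a minimal byte-wise sketch of what a correct copy has to do. The loop in the first post compares a whole 8-byte word against zero, so it stops only when all eight bytes are zero rather than at the first '\0', and it can read and write past the end of both buffers. The name str_copy_bytes is made up for illustration; it is not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise copy: stops at the first '\0' byte and copies it too.
   Returns the number of characters copied, excluding the terminator.
   str_copy_bytes is an illustrative name, not a library function.
   Assumes dst is large enough to hold src plus its terminator. */
size_t str_copy_bytes(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = 0;
    while ((dst[n] = src[n]) != '\0')
        n++;
    return n;
}
```

This is a correctness baseline, not a speed claim; the library strcpy is still the thing to measure against.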

intrinsic, you mean such as:

__rep__movs; ?

or inline assembly.. what does the syntax look like, kind of inlined stuff? I searched the include lib and found no rep or movs.. If the former, I don't think mingw/gcc supports it.

To align, I could either call strlen(), divide up the qwords and bytes and copy that way, or

check for *ptrin until null, then go to the byte level.
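
The second idea needs a way to tell whether a qword contains a zero byte somewhere, because *ptrin is zero only when all eight bytes are zero. A rough sketch of that check, followed by the drop to byte level. Assumptions: the names has_zero_byte and str_len_words are hypothetical, and the buffer size is a multiple of 8 with the terminator inside it, so no read leaves the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero if at least one of the eight bytes in v is zero. */
int has_zero_byte(uint64_t v)
{
    return ((v - ONES) & ~v & HIGHS) != 0;
}

/* Length of the string in buf, scanning 8 bytes at a time.
   Assumes buf_size is a multiple of 8; returns buf_size if no
   terminator is found. */
size_t str_len_words(const char *buf, size_t buf_size)
{
    assert(buf != NULL && buf_size % sizeof(uint64_t) == 0);
    size_t i = 0;

    /* Qword level: skip words that contain no zero byte.
       memcpy into a local keeps the read legal at any alignment. */
    for (; i < buf_size; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w);
        if (has_zero_byte(w))
            break;
    }

    /* Byte level: locate the terminator inside that word. */
    while (i < buf_size && buf[i] != '\0')
        i++;
    return i;
}
```

The size restriction is what keeps this sketch inside the rules of C; a general-purpose version has to deal with strings that end close to the end of their buffer.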

I fseek, therefore I fam.

Here's my new StrCpy(), it works, what the heck is going on here?
ptrout[0] = ptrin[0]... my pointer knowledge sucks..



int StrCpyA(void *OUTDATA, void *INDATA) {
    unsigned long long *ptrout;
    unsigned long long *ptrin;

    ptrout = (unsigned long long *) OUTDATA;
    ptrin = (unsigned long long *) INDATA;

    ptrout[0] = ptrin[0];

}


I fseek, therefore I fam.

Otay I was on drugs, sorry ~:\ I didn't examine my test code to see I copied the string back in.. so ptrout[0]=ptrin[0] does absolutely nothing.. desirable



I fseek, therefore I fam.

One more time, I think this will do it.. 32% faster than the msvcrt version, two questions:

1) Can it be optimized more?

2) Why do I have to subtract 16 from the stringlength? The regular strlen() returns the same number of chars..

Not to go on about this, but how can strlen be optimized via DWORD or QWORD reads and not go past the end without knowing the length to begin with.. I guess reading a bad address is ok but writing to it is not.. such as:

while (*ptr++) slen++; where ptr is counting in qwords.. OK? or NOK?

ok so I answered my own question kinda, POINTING to a null address is ok but WRITING to it is not, am I on the right track?


int StrCpyA(void *OUTDATA, void *INDATA) {
    unsigned long long *ptrout;
    unsigned long long *ptrin;
    unsigned char *ptroutc;
    unsigned char *ptrinc;
    unsigned int ret = StrLenA(INDATA) - 16;
    unsigned int cnt = 0;

    ptrout = (unsigned long long *) OUTDATA;
    ptrin = (unsigned long long *) INDATA;

    ptroutc = (unsigned char *) OUTDATA;
    ptrinc = (unsigned char *) INDATA;

    for (; cnt < ret; cnt += sizeof(unsigned long long)) *ptrout++ = *ptrin++;
    for (; ptrinc[cnt]; cnt++) ptroutc[cnt] = *(ptrinc + cnt);
}
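
On question 2, a likely explanation (read off the code above, not tested against the original setup): the qword loop runs while cnt < ret and steps by 8, so it rounds up to the next multiple of 8. With ret equal to the full length it can copy past the terminator, and the byte loop then starts in whatever follows the string. Subtracting 16 hides that for long strings, but ret is unsigned, so anything shorter than 16 characters wraps around to a huge count; the byte loop also never copies the terminating 0. A sketch that bounds the loop directly, assuming the destination is large enough and using the hypothetical name str_copy_chunks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy src to dst in 8-byte chunks, then finish byte by byte.
   Returns the string length, excluding the terminator.
   str_copy_chunks is an illustrative name, not a library function. */
size_t str_copy_chunks(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t total = strlen(src) + 1;   /* include the terminator */
    size_t i = 0;

    /* Whole chunks only: i + 8 <= total never overshoots the string.
       Going through memcpy and a local avoids alignment and aliasing
       problems; compilers commonly reduce it to one load and one store. */
    for (; i + sizeof(uint64_t) <= total; i += sizeof(uint64_t)) {
        uint64_t chunk;
        memcpy(&chunk, src + i, sizeof chunk);
        memcpy(dst + i, &chunk, sizeof chunk);
    }

    /* Remaining 0..7 bytes, terminator included. */
    for (; i < total; i++)
        dst[i] = src[i];

    return total - 1;
}
```

memcpy(dst, src, strlen(src) + 1) does the same job in one call, and is probably the fairer thing to time this against.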


I fseek, therefore I fam.

[edited by - drarem on May 3, 2003 5:19:02 AM]

[edited by - drarem on May 3, 2003 5:19:50 AM]

[edited by - drarem on May 3, 2003 5:21:44 AM]

I'm sorry, but that code doesn't work - and there's something seriously wrong with your testing if you have found it to be faster than the MSVC or GCC ones with full optimization.

An intrinsic function is basically a function the compiler knows about and is able to optimize using low-level knowledge about the operation performed. You can think of it as an inline assembly function that gets optimized by the compiler for every call. Any modern compiler will have an intrinsic for strcpy and you are not going to beat the compiler-generated code for copying short strings unless you have very specific knowledge about the strings being copied (ie "all my strings are 16 bytes long and properly aligned").
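
A small sketch of the two cases described above, with hypothetical names (NAME_LEN, copy_name16, copy_string). When the size is a compile-time constant, a plain memcpy gives the compiler everything it needs to expand the copy inline; for ordinary strings, calling strcpy and turning optimization on leaves the choice of implementation to the compiler. What code actually comes out depends on the compiler and its flags.

```c
#include <assert.h>
#include <string.h>

enum { NAME_LEN = 16 };   /* hypothetical fixed record size */

/* "All my strings are 16 bytes long": the length is known at compile
   time, so no terminator search is needed at all. */
void copy_name16(char dst[NAME_LEN], const char src[NAME_LEN])
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, NAME_LEN);
}

/* General case: just use the library call and let the compiler decide
   whether to expand it inline. */
char *copy_string(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    return strcpy(dst, src);
}
```

Comparing the generated assembly for both functions at full optimization is a quick way to see what a particular compiler does with them.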

[edited by - spock on May 3, 2003 6:34:36 AM]
