Bruno

fastest way to clean an array

Bruno    155
Hi guys. Do you know a faster way to clear a bidimensional array like array[200][200] than a "for" loop that sets every element to 0? I've been looking at the function ZeroMemory, but I'm afraid I don't know how to use it. Thanks, Bruno

Knarkles    271
Like this:

        
memset(array[0], 0, 40000 * sizeof(array[0][0])); // 200 * 200 elements, each sizeof(array[0][0]) bytes


Of course, you can replace the sizeof with the size of the data type you are using.
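
For a built-in array, the byte count can also come straight from sizeof on the whole array, which avoids hard-coding 40000. A minimal sketch, assuming an int array; the names grid, ROWS, COLS, clear_grid and grid_is_clear are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t ROWS = 200;   // assumed dimensions, matching array[200][200]
const std::size_t COLS = 200;

int grid[ROWS][COLS];           // hypothetical name; a built-in 2D array is one contiguous block

// Clears the whole array with a single call. sizeof(grid) is the size in
// bytes of all ROWS * COLS elements, so no element count is hard-coded.
void clear_grid()
{
    std::memset(grid, 0, sizeof(grid));
}

// Helper for checking the result: true if every element is zero.
bool grid_is_clear()
{
    for (std::size_t r = 0; r < ROWS; ++r)
        for (std::size_t c = 0; c < COLS; ++c)
            if (grid[r][c] != 0)
                return false;
    return true;
}
```

This only works when sizeof sees the real array; once the array has been passed to a function as a pointer, sizeof gives the size of the pointer instead.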

-Jussi

"I want a place to hide, somewhere far from your side"


Edited by - Selkrank on October 6, 2000 6:55:22 AM

Ridcully    122
that code you posted shouldn't even compile.

//either this way

memset(array, 0, 40000 * sizeof(array[0][0]));

//or like this

memset(&array[0], 0, 40000 * sizeof(array[0][0]));

don't call me picky

Knarkles    271
quote:
Original post by Ridcully

that code you posted shouldn't even compile.


Yes it should. If the array was declared as char array[200][200], array itself would decay to char (*)[200] (a pointer to a row), and array[0] would be a char[200] that decays to char*, so array[0] is right.

So, either array[0] or &(array[0][0]).
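
The types involved can be checked at compile time. A small sketch, assuming a built-in char array[200][200]; the name board and the function same_start_address are made up for illustration. Note that the whole array decays to a pointer to a row, char (*)[200], not to char**:

```cpp
#include <cassert>
#include <type_traits>

char board[200][200];   // hypothetical name standing in for char array[200][200]

// board[0] is one row: an array of 200 chars, which decays to char*.
static_assert(std::is_same<decltype(board[0]), char (&)[200]>::value,
              "board[0] is an lvalue of type char[200]");
static_assert(std::is_same<std::decay<decltype(board[0])>::type, char*>::value,
              "board[0] decays to char*");

// The whole array decays to a pointer to a row.
static_assert(std::is_same<std::decay<decltype(board)>::type, char (*)[200]>::value,
              "board decays to char (*)[200]");

// board, board[0] and &board[0][0] all refer to the same starting address,
// which is why memset accepts any of them (it takes a void*).
bool same_start_address()
{
    const void* a = board;
    const void* b = board[0];
    const void* c = &board[0][0];
    return a == b && b == c;
}
```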

-Jussi

"What have we done?
Who killed the sun?"

Confused    122
You use ZeroMemory like this:

ZeroMemory(pointertomemory, lengthtoset);

ex.

int numbers[10];
ZeroMemory(&numbers, sizeof(numbers));

It is just like memset without the second parameter of what to set the memory to (it is actually just a macro for memset).

And for a bidimensional array it would be the same (I'm pretty sure)
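
A sketch of the bidimensional case, assuming a built-in int array. The names table, clear_table and table_is_clear are made up for illustration, and the #else branch is only a stand-in macro so the example also builds where windows.h is not available:

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>            // provides the real ZeroMemory macro
#else
#include <cstring>
// Assumption: stand-in for non-Windows builds, mirroring ZeroMemory's arguments.
#define ZeroMemory(dest, length) std::memset((dest), 0, (length))
#endif

int table[200][200];            // hypothetical name

// One call clears all 200 * 200 elements, because sizeof(table) covers
// the whole contiguous array.
void clear_table()
{
    ZeroMemory(table, sizeof(table));
}

// Helper for checking the result: true if every element is zero.
bool table_is_clear()
{
    for (std::size_t r = 0; r < 200; ++r)
        for (std::size_t c = 0; c < 200; ++c)
            if (table[r][c] != 0)
                return false;
    return true;
}
```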

-------------
cOnfuSed

Edited by - Confused on October 6, 2000 8:48:53 AM

Guest Anonymous Poster   
Fastest way? MMX, baby.

memset() is indeed a very fast function in that it doesn't copy byte by byte (at least, the Visual C version doesn't). It sets all dword-aligned dwords (dword-aligned so that two writes are not needed in order to write one dword) and then goes to work on the leading and trailing bytes.

With the FPU you can set 8 bytes at a time instead of 4 but this is only faster if the data is cache-warmed.

You can also set 8 bytes of data at a time with the MMX and this is your target if you use the same alignment strategies of memset().
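
The alignment strategy described above can be sketched in portable C++. This is only an illustration of the idea, not the Visual C implementation; the function name zero_dword_aligned is made up, and the fixed-size memcpy is used for the 4-byte store so the code stays well-defined for any buffer type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Zeroes `count` bytes at `dest`: leading bytes up to a 4-byte boundary,
// then whole aligned dwords, then whatever bytes are left over.
void zero_dword_aligned(void* dest, std::size_t count)
{
    assert(dest != nullptr || count == 0);
    unsigned char* p = static_cast<unsigned char*>(dest);

    // Leading bytes: one at a time until p sits on a 4-byte boundary.
    while (count > 0 && reinterpret_cast<std::uintptr_t>(p) % 4 != 0) {
        *p++ = 0;
        --count;
    }

    // Aligned middle: 4 bytes per iteration.
    const std::uint32_t zero = 0;
    while (count >= 4) {
        std::memcpy(p, &zero, sizeof(zero));  // fixed-size copy of one dword
        p += 4;
        count -= 4;
    }

    // Trailing bytes: fewer than 4 remain.
    while (count > 0) {
        *p++ = 0;
        --count;
    }
}
```

An 8-byte variant follows the same shape with a boundary of 8 and a 64-bit zero.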

Guest Anonymous Poster   
Most implementations of memset are just a for-loop, and it is generally one of the _slowest_ ways to clear memory. The fastest way really depends on what system you're talking about. Ideally you'd have some sort of vector unit that could write 4 or 8 bytes at a time, or a DMA controller that can write constant values to memory. On the N64 it was faster to have the graphics system render a black quad to main memory. I guess for Intel systems, MMX would be best....

Ridcully    122
argh!
how could i overlook that?
well, must be that damn headache i've got at the moment. probably i shouldn't visit the board while i am ill.

sorry anyways
rid

Mithrandir    607
try this:

memset compiled to ASM is usually just a for loop MOVing 0 into every DWORD.

MOV is an expensive operation. Most ASM gurus use X xor X to clear a register/memory position.

not sure how well this compiles in VC, but doing a large for loop and xor'ing every element with itself should be pretty fast.

actually, now that I think of it, I'm not sure how well this performs when accessing memory, because I've only used it on registers before, and C doesn't let you touch the registers.

heh... might actually turn out slower.

pseudo ASM:
mov ax, memory
xor ax, ax
mov memory, ax

-or-
mov memory, 0


oh bah. ignore this entire post.

===============================================
If there is a witness to my little life,
To my tiny throes and struggles,
He sees a fool;
And it is not fine for gods to menace fools.

Bruno    155
Thanks for the replies guys..., but

Either with ZeroMemory or memset, the cleaning is sloooowwww...
I lose around 80 fps when I do the cleaning..
Is there any other way, so that I don't lose so much performance??
How can I do this with MMX instructions???

thanks guys

Bruno
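
Before switching to MMX it may help to measure what one clear actually costs. A rough timing sketch, assuming a 200x200 int buffer (160,000 bytes); the names pixels and average_clear_microseconds are made up for illustration, and an optimizing compiler is free to shorten or merge the loop, so the number is only indicative:

```cpp
#include <cassert>
#include <chrono>
#include <cstring>

int pixels[200][200];   // hypothetical buffer; int elements are an assumption

// Returns the average time of one memset over `runs` clears, in microseconds.
double average_clear_microseconds(int runs)
{
    assert(runs > 0);
    typedef std::chrono::steady_clock clock_type;

    const clock_type::time_point start = clock_type::now();
    for (int i = 0; i < runs; ++i) {
        pixels[i % 200][0] = i + 1;               // dirty the buffer before each clear
        std::memset(pixels, 0, sizeof(pixels));   // the operation being timed
    }
    const std::chrono::duration<double, std::micro> elapsed =
        clock_type::now() - start;

    return elapsed.count() / runs;
}
```

Comparing that figure with the frame time shows whether the clear itself is the bottleneck or whether something else in the frame is.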

Tooon    122
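Ok, let's say you already detected MMX.
then :

__asm
{
    mov edi, _arr_ptr   // load edi with the address of the array
    mov ecx, num_bytes  // num_bytes: hypothetical variable, array size in bytes (multiple of 8)
    shr ecx, 3          // divide by 8: the loop clears 8 bytes per pass
    pxor mm1, mm1       // reset MMX register mm1 to 0

clear_loop:             // not named Loop, since LOOP is an instruction mnemonic
    movq [edi], mm1     // store 8 zero bytes
    add edi, 8          // inc. ptr to next quadword
    dec ecx
    jnz clear_loop

    emms                // leave MMX state so FPU code works afterwards
}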

Sorry if I have forgotten anything, but hope this helps for a start.

/ Tooon
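
The same loop can be sketched with the MMX intrinsics from mmintrin.h, which avoids inline assembler altogether. This is an illustration under assumptions, not a tuned routine: the function name clear_quadwords and the macro HAVE_MMX_SKETCH are made up, the destination is assumed to be 8-byte aligned, and builds without MMX support fall back to memset:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__MMX__) || defined(_M_IX86)
#include <mmintrin.h>
#define HAVE_MMX_SKETCH 1
#else
#define HAVE_MMX_SKETCH 0
#endif

// Zeroes `count` bytes at `dest`, 8 bytes per iteration where MMX is available.
// Assumption: `dest` is 8-byte aligned (checked with assert).
void clear_quadwords(void* dest, std::size_t count)
{
    assert(reinterpret_cast<std::uintptr_t>(dest) % 8 == 0);

#if HAVE_MMX_SKETCH
    __m64* q = static_cast<__m64*>(dest);
    const __m64 zero = _mm_setzero_si64();   // corresponds to pxor mm, mm
    for (std::size_t i = 0; i < count / 8; ++i)
        q[i] = zero;                         // corresponds to movq [mem], mm
    _mm_empty();                             // corresponds to emms

    // Tail: the last count % 8 bytes are cleared one at a time.
    unsigned char* tail = static_cast<unsigned char*>(dest) + (count / 8) * 8;
    for (std::size_t i = 0; i < count % 8; ++i)
        tail[i] = 0;
#else
    std::memset(dest, 0, count);             // fallback where MMX is not available
#endif
}
```

Whether this beats the library memset depends on the compiler and CPU, so it is worth timing both.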

Guest Anonymous Poster   
Why are you clearing that much memory every frame anyway? I would suggest you redesign your system.

mr BiCEPS    140
quote:
Original post by Mithrandir
MOV is an expensive operation. Most ASM gurus use X xor X to clear a register/memory position.

Hmm... for a 486 it is:

xor mem, immediate value (or mem, register) - 3 clocks.
mov mem, immediate value (or mem, register) - 1 clock.

Doesn't look faster to me :-)



Bruno    155
Well, to understand what I'm doing, download this:
www.geocities.com/brunomtc/test.zip

It's a c-buffer, and the array holds the pixels that were used. I still don't know how to use inline assembler; whatever assembler I put there, I always get an error message.

Do any of you guys know how to link assembler made by MASM with VC???

goir    122
mr BiCEPS >> I think xor mem,mem is faster than mov mem,0 but when setting mem to zero the xor method can't be faster. Not in my world anyway

mr BiCEPS    140
Wait... xor mem, mem - that isn't possible at all, is it?
One of the operands has to be either a register or an immediate.

At least that's what my opcodes manual says.

And all mov operations involving a register, immediate value or a memory location only take one clock, so it doesn't actually get faster than that.

Shannon Barber    1681
I'd like to point out 4 things:

as usual, a better method is needed, not better code, but here goes anyway...

you can't xor a memory location, xor is used to clear regs only, and I think it saves you two ticks over mov.

an array indexed as array[200][200] is not necessarily contiguous RAM when it is built from separately allocated rows, so memset(array, 0x0, 200*200*sizeof(array[0][0])) is not a good idea unless you've forced the array to be contiguous. (A statically declared array[200][200] is always contiguous.)

If you use MMX (64-bit blocks, so 8 bytes per write), the array size needs to be a multiple of 8 bytes, which 200x200 is, so I guess you're OK. If you use a different-sized array, you have to stop short and use normal regs for the last few bytes.

        
int x = 200;
int y = 200;
int** array = 0;
array = new int*[x];
for (int i = 0; i < x; i++)
    array[i] = new int[y];
// that's how a dynamically allocated array[][] is laid out:
// every row is a separate allocation

// so doing stuff with it requires a for loop


// buUuUuUut you can do this instead:

int x = 200;
int y = 200;
int** array = 0;

int* temp = new int[x * y];   // one contiguous block for all elements
array = new int*[x];
for (int i = 0; i < x; i++)
    array[i] = &temp[y * i];  // row i points into the block
// so you can cheat and use temp to treat the memory space of array linearly
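
Put together as something that compiles, the contiguous layout might look like the sketch below. The struct Grid2D and the functions make_grid, clear_cells and free_grid are made-up names for illustration; the element type int follows the snippet above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: row pointers into one contiguous block of cells.
struct Grid2D {
    int*        cells;  // contiguous storage for x * y elements
    int**       rows;   // rows[i] points at the start of row i inside cells
    std::size_t x;      // number of rows
    std::size_t y;      // elements per row
};

Grid2D make_grid(std::size_t x, std::size_t y)
{
    assert(x > 0 && y > 0);
    Grid2D g;
    g.x = x;
    g.y = y;
    g.cells = new int[x * y];
    g.rows = new int*[x];
    for (std::size_t i = 0; i < x; ++i)
        g.rows[i] = &g.cells[y * i];
    return g;
}

// Because the cells are contiguous, one memset clears every row.
void clear_cells(Grid2D& g)
{
    std::memset(g.cells, 0, g.x * g.y * sizeof(int));
}

void free_grid(Grid2D& g)
{
    delete[] g.rows;
    delete[] g.cells;
    g.rows = nullptr;
    g.cells = nullptr;
}
```

Element access keeps the usual g.rows[i][j] syntax, while clearing goes through g.cells.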



Edited by - Magmai Kai Holmlor on October 9, 2000 11:15:59 PM

Bruno    155
Thanks a lot Magmai Kai..

Whatever.., my screen buffer is an array that is exactly a copy of the opengl screen, so I can't clean it with gl commands.., but thanks anyway. :O

Bruno

mr BiCEPS    140
quote:
Original post by Magmai Kai Holmlor
you can't xor a memory location, xor is used to clear regs only, and I think it saves you two ticks over mov.

Great!! This means xor reg, reg takes minus one tick. I'm gonna fill my programs with that, to compensate for every single clock it would take otherwise, until the program runs at zero clock cycles...

No, but seriously. They both only take one tick. The difference lies in code size. A mov operation on, say, a 32-bit register requires a full DWORD to mov in there, 0x00000000, plus the actual opcode... Well, I guess you see where it's heading. That makes 5 bytes for the instruction (B8 00 00 00 00 for mov eax, 0).

The xor reg, reg, however, will compile to just two bytes: 33 for the xor, plus one byte to tell the registers. Like 33C0 for xor eax, eax
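
The size difference can be written down as data. A tiny sketch holding the 32-bit x86 encodings of the two instructions as byte arrays; the array names and the function saved_bytes are made up for illustration, and the short form of mov eax, 0 is taken to be B8 followed by a 4-byte immediate:

```cpp
#include <cassert>
#include <cstddef>

// Machine-code bytes for two ways of zeroing eax on 32-bit x86.
const unsigned char XOR_EAX_EAX[] = { 0x33, 0xC0 };                    // xor eax, eax
const unsigned char MOV_EAX_0[]   = { 0xB8, 0x00, 0x00, 0x00, 0x00 };  // mov eax, 0

// Bytes saved per use by choosing the xor form.
std::size_t saved_bytes()
{
    return sizeof(MOV_EAX_0) - sizeof(XOR_EAX_EAX);
}
```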

Lucasdg    122
Why do you want to clear it? Why not just overwrite it and then peek at it? And as for inline asm in VC:

__asm
{
    // ASM GOES HERE!!
}

this really should work.



-=[ Lucas ]=-

Cyberdrek    100
Biceps: to respond to your question, in the days of TASM (in DOS), xor ax, ax would set ax to 0, AX being a register. Of course, that was possible in DOS. In Windows assembler I really have no clue, but as far as DOS goes, it worked...



Cyberdrek
Headhunter Soft
DLC Multimedia

Guest Anonymous Poster   
quote:
Original post by Bruno

Thanks a lot Magmai Kai..

Whatever.., my screen buffer is an array that is exactly a copy of the opengl screen, so I can't clean it with gl commands.., but thanks anyway. :O

Bruno

Why are you copying the frame buffer to memory every frame? And then clearing it??? You must realize by now that this is an extremely "bad" thing to do (if you care about framerate)... Tell us what you are trying to accomplish and I'm sure we can find you a better way to do it.



null_pointer    289
Adding to what Magmai Kai Holmlor said, a bi-dimensional array built from row pointers is an array of arrays, so you need to loop through the first dimension and clear the single-dimensional array of bytes for each row. With 64 rows of 512 bytes allocated that way, the whole thing takes a total of (64 * 4) + (64 * 512) bytes in 32-bit Windows: the first dimension is an array of 64 pointers, each of which is 4 bytes, and the second dimension is a set of 64 arrays that are each 512 bytes in size.

A plain declaration is different:

byte mybuffer[64][512];

It holds no pointers, it is one contiguous block, and sizeof(mybuffer) returns 64 * 512. Just in case you were wondering how it works...

Now, for the code using for() and memset():


byte mybuffer[64][512];

for( int x=0; x < 64; x++ )
memset(mybuffer[x], 0, sizeof(byte) * 512);



The for() loop just gets each block of actual data (512 bytes) from the bi-dimensional array, so you could write your own fast_memset() function containing the inline assembly and just replace the call to memset().
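
A sketch of that arrangement, with the row clear routed through a replaceable function. Here fast_memset is only a placeholder that forwards to std::memset (the inline assembly version is not shown), the typedef for byte is an assumption, and clear_rows is a made-up name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char byte;   // assumption: byte is an 8-bit unsigned type

// Placeholder for a hand-written routine; it has the same parameters as
// memset so the two can be swapped without touching the caller.
void fast_memset(void* dest, int value, std::size_t count)
{
    assert(dest != nullptr);
    std::memset(dest, value, count);
}

byte mybuffer[64][512];

// Clears the buffer one 512-byte row at a time.
void clear_rows()
{
    for (int x = 0; x < 64; x++)
        fast_memset(mybuffer[x], 0, sizeof(byte) * 512);
}
```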


Magmai Kai Holmlor:

The original poster could, however, use one large single dimensional array of size 64*512, and use the indexing method:

mybuffer[x + y * width]

to get the element you desire. The whole thing is also contiguous and can be cleared with a single call to memset or a function that you create using inline assembly. I've heard that this method is faster. Can you tell me which is faster:

1 integer multiplication, 1 integer addition, 1 pointer addition, and 1 pointer dereference (single dimensional array used as bi-dimensional array)

or

2 pointer additions and 2 pointer dereferences (bi-dimensional array built from row pointers)
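
The two access patterns can be put side by side. This sketch does not settle which one is faster, since that depends on the compiler and CPU and is best measured; it only shows that both reach the same element. The names flat, rows, build_row_table, at_flat and at_rows are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>

const std::size_t WIDTH  = 512;
const std::size_t HEIGHT = 64;

unsigned char  flat[WIDTH * HEIGHT];  // single-dimensional array used as bi-dimensional
unsigned char* rows[HEIGHT];          // row pointers into the same storage

// Fills the row table so that rows[y] points at the start of row y in flat.
void build_row_table()
{
    for (std::size_t y = 0; y < HEIGHT; ++y)
        rows[y] = &flat[y * WIDTH];
}

// One multiplication, one addition, one indexed access.
unsigned char& at_flat(std::size_t x, std::size_t y)
{
    assert(x < WIDTH && y < HEIGHT);
    return flat[x + y * WIDTH];
}

// Two indexed accesses: first the row pointer, then the element.
unsigned char& at_rows(std::size_t x, std::size_t y)
{
    assert(x < WIDTH && y < HEIGHT);
    return rows[y][x];
}
```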




- null_pointer
Sabre Multimedia
