Ului

faster than memset?

This topic is 5334 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Does anyone know of a routine that is faster than memset, or just better for that matter? ZeroMemory, as far as I know, uses the same thing. Not that I'd be calling this a million times, but it's always good to know you have the fastest thing out there.

Dun mes wit me!

Memset is pretty fast. It is fast enough to use everywhere, and if you profile and determine that it's a problem somewhere, you can try to replace it with something else.
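
For anyone who wants to measure before replacing anything, here is a rough sketch of timing memset with the standard clock() function. The helper name time_memset and its parameters are made up for illustration, and a real profiler will give a more trustworthy picture than a loop like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: clears `buf` `repeats` times with memset and
   returns the elapsed CPU time in seconds as reported by clock(). */
double time_memset(void *buf, size_t size, int repeats)
{
    assert(size > 0);
    volatile unsigned char *sink = buf;

    clock_t start = clock();
    for (int i = 0; i < repeats; ++i) {
        memset(buf, 0, size);
        /* Reading one byte back through a volatile pointer discourages
           the compiler from dropping the memset as a dead store. */
        (void)sink[size - 1];
    }
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with a buffer of the size your program actually clears, and a repeat count large enough to take a measurable amount of time, gives a baseline to compare any replacement against.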

Never mind, I worked it out. I coded a faster one for what I need to do; as long as the size can be divided by 64 it's fine.




mov eax,dword ptr [nSize]; // byte count; the loop counts down from here
mov edx, dword ptr [buffer]; // pointer to beginning of array
myloop: // start loop
sub eax,64; // decrease counter by one 64-byte block

mov dword ptr [edx+eax],0;
mov dword ptr [edx+eax+4],0;
mov dword ptr [edx+eax+8],0;
mov dword ptr [edx+eax+12],0;
mov dword ptr [edx+eax+16],0;
mov dword ptr [edx+eax+20],0;
mov dword ptr [edx+eax+24],0;
mov dword ptr [edx+eax+28],0;
mov dword ptr [edx+eax+32],0;
mov dword ptr [edx+eax+36],0;
mov dword ptr [edx+eax+40],0;
mov dword ptr [edx+eax+44],0;
mov dword ptr [edx+eax+48],0;
mov dword ptr [edx+eax+52],0;
mov dword ptr [edx+eax+56],0;
mov dword ptr [edx+eax+60],0;

cmp eax,0; // has the counter reached 0?
jne myloop; // if not, loop again


Is there any way to optimize this further?

Dun mes wit me!
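
For readers who do not read x86 assembly, the loop above can be sketched in C roughly as follows. The function name zero_fill_64 is made up for illustration, and the sketch assumes the buffer is 4-byte aligned. One difference is deliberate: the assembly decrements before it tests, so a size of 0 would wrap around, whereas this version tests first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zeroes `size` bytes at `buffer`, 64 bytes per iteration, walking from
   the end of the buffer toward the start like the assembly loop.
   Assumption: `size` is a multiple of 64. */
void zero_fill_64(uint32_t *buffer, size_t size)
{
    assert(size % 64 == 0);

    size_t offset = size / 4;      /* offset in 32-bit words, starting at the end */
    while (offset != 0) {          /* testing first makes size == 0 a no-op */
        offset -= 16;              /* step back one 64-byte block */
        uint32_t *p = buffer + offset;
        p[0]  = 0; p[1]  = 0; p[2]  = 0; p[3]  = 0;
        p[4]  = 0; p[5]  = 0; p[6]  = 0; p[7]  = 0;
        p[8]  = 0; p[9]  = 0; p[10] = 0; p[11] = 0;
        p[12] = 0; p[13] = 0; p[14] = 0; p[15] = 0;
    }
}
```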

Guest Anonymous Poster
Yes. First, you must decide whether or not you want the area you are filling to be loaded into the memory caches. Typically when you are filling memory you do NOT want to also fill the caches, but what you are doing there is most certainly filling the caches.

When you desire/need cache pollution, the destination is 32-bit aligned, and the fill is 512 bytes or less:

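sub eax, eax             ; eax = 0, the value to store
mov edi, StartAddress    ; edi = destination pointer
mov ecx, BytesToFill/4   ; ecx = number of 32-bit dwords to write
rep stosd                ; store eax at [edi], add 4 to edi, repeat ecx times (direction flag clear)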


In all other cases, use MMX instructions. The AMD optimisation guide (available online) has a nice memory fill routine for MMX that is also near optimal on Intel machines.

- Rockoon
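
As a rough model of what the rep stosd sequence above does, here is the same operation written as a plain C loop. The function name store_dwords is made up for illustration; its parameters stand in for the registers, and the real instruction does all of this in hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Models `rep stosd`: `dst` plays the role of edi, `value` of eax,
   and `count` of ecx. */
uint32_t *store_dwords(uint32_t *dst, uint32_t value, size_t count)
{
    while (count != 0) {   /* rep: keep going until ecx reaches 0 */
        *dst = value;      /* stosd: store eax at [edi] */
        ++dst;             /* stosd: advance edi by 4 bytes */
        --count;           /* rep: decrement ecx */
    }
    return dst;            /* edi ends just past the last dword written */
}
```

Filling BytesToFill bytes with zero is then store_dwords(start, 0, BytesToFill / 4), which is why the byte count is divided by 4 before it goes into ecx.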

Sweet. Could you say what each instruction in your code does? I haven't learnt all of ASM yet.
I can't find the Microsoft MASM documentation.


Dun mes wit me!

I tried compiling code which calls memset with g++ on Intel.

If you call memset() without optimisation enabled, it calls memset from the C library.

If you enable optimisation, it plants code similar to Anonymous Poster's; that is to say, it uses rep stosl (the AT&T-syntax name for rep stosd).

So just call memset. If you enable optimisation, it should be as fast as any. I fail to see how it can be any faster than a rep stosl instruction.

Mark
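
To repeat this kind of experiment yourself, a tiny function like the one below is enough. The struct Grid and the function clear_grid are made-up names, and 256 ints is an arbitrary size chosen only so the compiler knows the length at compile time.

```c
#include <string.h>

/* Hypothetical example type; the size is arbitrary. */
struct Grid {
    int cells[256];
};

/* Clears the whole struct. Because sizeof *g is a compile-time constant,
   an optimising compiler is free to expand the memset inline. */
void clear_grid(struct Grid *g)
{
    memset(g, 0, sizeof *g);
}
```

Compiling with something like gcc -O2 -S and reading the generated assembly shows what your compiler picks. Depending on the compiler version, target, and flags, that may be an inline rep stos, a run of wider stores, or still a call into the C library.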

Thanks. As far as cache pollution goes, what is that?
How does it work? Was I filling both the caches and memory, and in which part?
This way I can learn and avoid possible disaster.
Also, is cache pollution a bad thing?

Dun mes wit me!
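
In short, ordinary stores bring the destination memory into the cache, which can push out data the program was about to reuse; that displacement is what "cache pollution" refers to. Below is a hedged sketch of a fill that tries to avoid it using non-temporal stores. It uses SSE2 intrinsics rather than the MMX routine from the AMD guide mentioned earlier (which is not reproduced here), the function name zero_fill_nocache is made up, and it falls back to memset when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Zeroes `size` bytes at `dst`, asking the CPU not to cache the data.
   Assumptions: `dst` is 16-byte aligned and `size` is a multiple of 16. */
void zero_fill_nocache(void *dst, size_t size)
{
    assert((uintptr_t)dst % 16 == 0);
    assert(size % 16 == 0);

#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < size / 16; ++i)
        _mm_stream_si128(p + i, zero);  /* non-temporal 16-byte store */
    _mm_sfence();                       /* order the streamed stores before later stores */
#else
    memset(dst, 0, size);               /* fallback: ordinary cached stores */
#endif
}
```

Whether this beats a plain memset depends on the buffer size and on whether the data is read again soon afterwards, so it is worth measuring both.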

When we first start thinking about it, unrolling loops so we don't have the extra increment, compare, and jump every iteration sounds like it'd be faster (fewer ops have to be done). But in most cases this doesn't work.

The CPU has two caches, a data cache and a program (instruction) cache. As our program is run, the CPU loads chunks of it into the program cache and then runs the instructions from there. We get a speed increase from this because reading from cache is much faster than reading from RAM.

When we unroll a loop we may lower the number of ops the CPU has to perform, but we also greatly increase the size of the program code. With increased program code we can end up with a much higher number of cache misses, which is when the code needed isn't in cache.

Every time there is a cache miss, the CPU has to fetch the missing code from RAM, which we know is quite slow, and while it's waiting for the cache to be refilled the CPU has little choice but to sit and do nothing. To make matters worse, the increase in program size means there is more code that must be loaded into cache, so there are more misses to pay for.

We know that we are going to have cache misses. A couple dozen cache misses per frame isn't a big deal, but it isn't hard to imagine how bad it would be if we caused a cache miss for every pixel we plot, every polygon we render, or anything else we do thousands of times per frame. Typically games are made of a few small pieces of code that are run thousands of times in a row each frame, and with a little care we can get them to fit in the cache, giving us some great performance.

Now back to unrolling loops... The only time we tend to gain anything from unrolling a loop is when we have a small loop (only a few lines of code and few iterations) that gets run a large number of times (such as a loop that does some simple op for each vertex in a tri, and gets run for every tri in the scene). Large loops, or loops with a large number of iterations (such as copying memory byte by byte), almost always cost more in cache misses and code size than they could ever hope to gain by unrolling.
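
To make the "small loop, few iterations" case concrete, here is a minimal sketch of a per-triangle loop written both ways. The Vec3 and Tri types and both function names are made up for illustration. With optimisation enabled, compilers often unroll a fixed three-iteration loop like this on their own, so the hand-unrolled version is shown only to illustrate the idea.

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 v[3]; } Tri;

/* Rolled: three iterations, each with an increment, compare, and jump. */
float tri_sum_x_rolled(const Tri *t)
{
    float sum = 0.0f;
    for (int i = 0; i < 3; ++i)
        sum += t->v[i].x;
    return sum;
}

/* Unrolled by hand: same result, no loop overhead, slightly more code. */
float tri_sum_x_unrolled(const Tri *t)
{
    return t->v[0].x + t->v[1].x + t->v[2].x;
}
```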


And a bit off topic... Optimizing by unrolling loops is one of the last things you want to do. If you find you are spending a lot of time in a particular loop, changing to a better algorithm, cleaning up the code inside the loop, and minimizing the number of times the loop is run will have a much greater effect.



Drakonite

Shoot Pixels Not People

Maybe a better optimization would be to not call memset so much. The quickest operation is the one you don't perform.
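
One way to act on that, sketched under the assumption that the buffer is filled from the front: remember how much of it was written and clear only that part. The Scratch type, its capacity, and both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SCRATCH_CAPACITY 4096

/* Hypothetical scratch buffer that remembers how many bytes are in use. */
typedef struct {
    unsigned char data[SCRATCH_CAPACITY];
    size_t used;
} Scratch;

/* Appends up to `n` bytes and returns how many were actually copied. */
size_t scratch_write(Scratch *s, const void *src, size_t n)
{
    size_t room = SCRATCH_CAPACITY - s->used;
    if (n > room)
        n = room;
    memcpy(s->data + s->used, src, n);
    s->used += n;
    return n;
}

/* Clears only the bytes written since the last reset, not the whole buffer. */
void scratch_reset(Scratch *s)
{
    memset(s->data, 0, s->used);
    s->used = 0;
}
```

If only a few bytes were written since the last reset, the reset touches only those bytes instead of all 4096.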
