Archived

This topic is now archived and is closed to further replies.

d000hg

Unrelated Qs: profilers & memcpy()

1) Can anyone recommend a free profiler and its address? I don't know exactly what I want - something that tells me how much time I spend in each function, and maybe in each line. I only have the trial version of MSVC++ 6, which has no profiler.

2) How does memcpy work? Is it byte by byte or DWORD by DWORD? If the former, I suppose that with a couple of lines of inline assembly I could make a DWORD copier that would be considerably quicker. Or how about just using MMX to do 64 bits at a time? Could I do that with doubles and casting? Or even use the fact that the FPU and CPU can act simultaneously - copy from the front with DWORDs and from the back with doubles?

1) AMD CodeAnalyst, from their page (reg required).
2) The memcpy in VC7 is a fixup + rep movsd. byte copy vs. dword actually hardly makes a difference anymore - what counts is making good use of the cache.
if data in L1:
    if size > 64:
        rep movsd
    else:
        mov instructions, maybe in a loop
else:
    block prefetch, write with movntq

Again, memory is definitely not random access - take advantage of the cache.
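
The decision tree above can be approximated in C. This is only a rough sketch under stated assumptions: the helper name `copy_dispatch` and the 256 KB cutoff are made up for illustration, copy size stands in for "is the data in L1" (which code cannot query directly), the SSE2 intrinsic `_mm_stream_si128` is swapped in for the MMX `movntq` mentioned above, and the block-prefetch step is left out. Below the cutoff, and on targets without SSE2, it simply falls back to the library `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_STREAMING 1
#else
#define HAVE_SSE2_STREAMING 0
#endif

/* Hypothetical cutoff: copies at least this large are assumed not to be
   cache resident. A real value would have to be measured per CPU. */
#define STREAM_THRESHOLD ((size_t)256 * 1024)

/* Hypothetical helper, not part of any library. */
void *copy_dispatch(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);

#if HAVE_SSE2_STREAMING
    if (n >= STREAM_THRESHOLD) {
        unsigned char *d = (unsigned char *)dst;
        const unsigned char *s = (const unsigned char *)src;

        /* Copy 0..15 leading bytes so the destination is 16-byte aligned,
           which _mm_stream_si128 requires. */
        size_t head = (size_t)(-(uintptr_t)d & 15u);
        memcpy(d, s, head);
        d += head;
        s += head;
        n -= head;

        /* Non-temporal stores: write 16 bytes at a time without pulling
           the destination lines into the cache. */
        while (n >= 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)s);
            _mm_stream_si128((__m128i *)d, v);
            d += 16;
            s += 16;
            n -= 16;
        }
        _mm_sfence();     /* make the streamed stores globally visible */
        memcpy(d, s, n);  /* remaining 0..15 bytes */
        return dst;
    }
#endif

    /* Small copy, or no SSE2: let the library routine handle it. */
    return memcpy(dst, src, n);
}
```

Whether the streaming path wins at all depends on the CPU and on whether the destination is read again soon, so the cutoff should be treated as a tuning knob and measured.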

quote:
Here is a free profiler for Visual Studio: Compuware DevPartner Profiler Community Edition
It looks nice but is only for .NET managed code unfortunately.
quote:
1) AMD CodeAnalyst, from their page (reg required).
I already have this, but it won't work with my current project (it was fine with all previous ones!), and I can't figure out what to do with it. I just seem to have about three different types of breakpoint, but I don't know what each one is or how to use them!

quote:

How does memcpy work?



I believe it depends upon your OS, but under MS Windows the call is passed off to a kernel-level routine (hmemcpy maybe, but don't quote me on that), which copies the maximum number of DWORDs it can and then finishes off by copying the last few outstanding bytes, if there are any.

It being the most efficient way of doing it I would have a hard time believing other implementations do it any differently.
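
The "as many DWORDs as possible, then the leftover bytes" scheme described in this post could look roughly like the following in portable C. It is a sketch of the idea only, not the actual CRT or Windows routine; the name `copy_dwords_then_bytes` is hypothetical. Each 32-bit word is moved through a `uint32_t` temporary with a 4-byte `memcpy`, which avoids alignment and aliasing problems; compilers typically reduce that to a single load and store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper illustrating the scheme; not the CRT's memcpy. */
void *copy_dwords_then_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t dwords = n / 4;  /* number of whole 32-bit words */
    size_t tail = n % 4;    /* 0..3 leftover bytes */

    assert(dst != NULL && src != NULL);

    /* Bulk of the copy: one 32-bit word per iteration. */
    while (dwords-- > 0) {
        uint32_t word;
        memcpy(&word, s, sizeof word);
        memcpy(d, &word, sizeof word);
        d += 4;
        s += 4;
    }

    /* Finish with the outstanding bytes, one at a time. */
    while (tail-- > 0)
        *d++ = *s++;

    return dst;
}
```

For example, a 37-byte copy performs nine word moves followed by one byte move.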

quote:

byte copy vs. dword actually hardly makes a difference anymore


I would have to disagree on that - unless you are just copying very small arrays, movsd will be four times faster than movsb (all movsX instructions take the same number of clock cycles regardless of size).

I agree that for small random copies of memory, cache misses will take a large toll compared to the time the instructions take to complete, but that's something to manage in your code.


[edited by - xf00f on July 27, 2003 11:33:23 AM]

quote:
Original post by d000hg
quote:
Here is a free profiler for Visual Studio: Compuware DevPartner Profiler Community Edition
It looks nice but is only for .NET managed code unfortunately.



Nope, it works fine for other code too. It even says so on their features page:
"profiles managed VB .NET, VC++, C#, JScript 7 and unmanaged VC++"




quote:
Original post by Burning_Ice
quote:
Original post by d000hg
quote:
Here is a free profiler for Visual Studio: Compuware DevPartner Profiler Community Edition
It looks nice but is only for .NET managed code unfortunately.



Nope, it works fine for other code too. It even says so on their features page:
"profiles managed VB .NET, VC++, C#, JScript 7 and unmanaged VC++ "







Yeah, but it doesn't install unless you have Visual Studio .NET installed.

You can make your own. I have one that renders the profiling data right on the screen, and you can turn it on and off with the keyboard.

There was an article in GameDeveloper magazine a while ago (Feb 2002, I believe) that explains how to make one. You can order a back issue off their site (gdmag.com) or just get the source from their online archive.
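
As a rough idea of what a home-made profiler involves, here is a minimal sketch in C that accumulates time per named section. It is not the code from the magazine article, and it only prints a report rather than drawing on screen; all names (`prof_begin`, `prof_end`, `prof_calls`, `prof_report`) are hypothetical. It uses the standard `clock()`, which is coarse; on Windows a high-resolution timer such as `QueryPerformanceCounter` would probably be a better fit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_SECTIONS 16

typedef struct {
    const char *name;     /* must outlive the profiler, e.g. a string literal */
    clock_t     total;    /* accumulated ticks over all calls */
    clock_t     started;  /* tick count at the most recent prof_begin */
    unsigned    calls;    /* number of completed begin/end pairs */
} ProfSection;

static ProfSection g_sections[MAX_SECTIONS];
static int g_section_count = 0;

/* Look a section up by name, creating it on first use. */
static ProfSection *prof_find(const char *name)
{
    int i;
    for (i = 0; i < g_section_count; ++i)
        if (strcmp(g_sections[i].name, name) == 0)
            return &g_sections[i];

    assert(g_section_count < MAX_SECTIONS);
    g_sections[g_section_count].name = name;
    g_sections[g_section_count].total = 0;
    g_sections[g_section_count].started = 0;
    g_sections[g_section_count].calls = 0;
    return &g_sections[g_section_count++];
}

void prof_begin(const char *name)
{
    prof_find(name)->started = clock();
}

void prof_end(const char *name)
{
    ProfSection *s = prof_find(name);
    s->total += clock() - s->started;
    s->calls++;
}

unsigned prof_calls(const char *name)
{
    return prof_find(name)->calls;
}

/* Print one line per section: name, call count, total milliseconds. */
void prof_report(FILE *out)
{
    int i;
    for (i = 0; i < g_section_count; ++i)
        fprintf(out, "%-16s %8u calls %10.3f ms\n",
                g_sections[i].name, g_sections[i].calls,
                1000.0 * (double)g_sections[i].total / CLOCKS_PER_SEC);
}
```

Typical use would be to wrap a block in `prof_begin("render")` and `prof_end("render")` each frame, then call `prof_report(stdout)` on exit or draw the same numbers as an overlay.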

I'm sure that would be very interesting to do and very useful in the long run, but I don't have the time!
I looked again at AMD's CodeAnalyst and got it to at least open a project. I tried setting some breakpoints on a function and running, but most of the time my app closed immediately (it's running D3D in a window). The only time I got it to run, the rendering window was black for a few seconds, and then it just told me that the majority of the time was spent in the kernel.dll process, which isn't a whole load of help! Why's it keep throwing me out, and how do I actually get some relevant info?

quote:
I believe it depends upon your OS, but under MS Windows the call is passed off to a kernel-level routine (hmemcpy maybe, but don't quote me on that), which copies the maximum number of DWORDs it can and then finishes off by copying the last few outstanding bytes, if there are any.
So by doing the same process but with 64bit MMX copies I could get a boost? Or even partial loop unrolling by copying two DWORDS per loop?
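
For reference, the "two DWORDs per loop" idea would look something like this in C. It is a sketch only, restricted to `uint32_t` arrays to keep it simple, and the function name `copy_u32_unrolled2` is hypothetical. Whether it is actually faster than the library `memcpy` is not something this code shows; that would have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies `count` 32-bit words, two per iteration. */
void copy_u32_unrolled2(uint32_t *dst, const uint32_t *src, size_t count)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);

    /* Unrolled body: handles words in pairs. */
    for (; i + 1 < count; i += 2) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
    }

    /* If count is odd, one word is left over. */
    if (i < count)
        dst[i] = src[i];
}
```

The same pattern extends to four or eight words per iteration, at the cost of a longer cleanup step.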



[edited by - d000hg on July 27, 2003 7:43:29 PM]

quote:
Original post by xf00f
quote:

How does memcpy work?


I believe it depends upon your OS, but under MS Windows the call is passed off to a kernel-level routine (hmemcpy maybe, but don't quote me on that), which copies the maximum number of DWORDs it can and then finishes off by copying the last few outstanding bytes, if there are any.


Look at memcpy.asm (if you installed the CRT source). Actually, I don't understand what it's doing, but it seems to copy 4 bytes (1 DWORD) at a time. It's certainly not a kernel routine.

quote:
Original post by Burning_Ice
nope, works fine for other code too..even says so on their features page:
"profiles managed VB .NET, VC++, C#, JScript 7 and unmanaged


But he is using VC 6.

P("post" | "don't know what you are talking about") = 0, right?

> It being the most efficient way of doing it I would have a hard time believing other implementations do it any differently. <
rep movsd is certainly not the most efficient way of copying memory on anything > 486.

quote:

> byte copy vs. dword actually hardly makes a difference anymore
I would have to disagree on that - unless you are just copying very small arrays, movsd will be four times faster than movsb (all movsX instructions take the same number of clock cycles regardless of size).

"Disagree"? This isn't opinion - try it out. I'd call the difference 3% on my Athlon XP.
Also, rep movs is not a series of copy instructions - if aligned and size > 64, the CPU does a burst transfer.
The point is moot, since there are faster ways to copy memory: block prefetch for transfers > working set, and an unrolled copy loop for smaller sizes.
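
For anyone who wants to try it out, a timing harness along these lines may be enough to compare copy routines. This is a sketch with hypothetical names (`copy_fn`, `byte_copy`, `libc_copy`, `time_copy`); it uses the coarse standard `clock()`, and an optimizing compiler may rewrite the byte loop into something much faster, so results are only indicative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Common signature so different copy routines can be timed the same way. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Naive byte-at-a-time copy, used as the baseline. */
void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

/* Thin wrapper so the library memcpy matches copy_fn exactly. */
void *libc_copy(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Run `fn` `reps` times and return the elapsed CPU time in seconds. */
double time_copy(copy_fn fn, void *dst, const void *src, size_t n, int reps)
{
    clock_t start;
    int r;

    assert(fn != NULL && reps > 0);

    start = clock();
    for (r = 0; r < reps; ++r)
        fn(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_copy(byte_copy, ...)` and `time_copy(libc_copy, ...)` on buffers both smaller and larger than the cache should show how much of the difference comes from the instructions and how much from memory.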

> I agree that for small random copies of memory, cache misses will take a large toll compared to the time the instructions take to complete, but that's something to manage in your code. <
Indeed. The copy code must take this into account.


> Why's it keep throwing me out and how do I actually get some relevant info?
Check project options - stop on app exit, don't terminate app.
