Unrelated Qs: profilers & memcpy()

12 comments, last by d000hg 20 years, 9 months ago
1) Can anyone recommend a free profiler and where to get it? I don't know exactly what I want: something that tells me how much time I spend in each function, and maybe each line. I only have the trial version of MSVC++ 6, which has no profiler.
2) How does memcpy work? Is it byte by byte or DWORD by DWORD? If the former, I suppose that with a couple of lines of inline assembly I could make a DWORD copier that would be considerably quicker. Or how about just using MMX to do 64 bits at a time? Could I do that with doubles and casting? Or even use the fact that the FPU and CPU can act simultaneously: copy from the front with DWORDs and from the back with doubles?
1) AMD CodeAnalyst, from their page (reg required).
2) The memcpy in VC7 is a fixup + rep movsd. byte copy vs. dword actually hardly makes a difference anymore - what counts is making good use of the cache.
if data in L1:
    if size > 64
        rep movsd
    else
        mov instructions, maybe in a loop
else
    block prefetch, write with movntq

Again, memory is definitely not random access - take advantage of the cache.
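A rough C sketch of the "write around the cache for big copies" branch, for anyone who wants to experiment. Plain C cannot ask whether data is in L1, so this version dispatches on size alone, which is an assumption of the sketch and not what the VC7 runtime does. `copy_by_size` and `STREAM_THRESHOLD` are made-up names, the 256 KB cutoff is an arbitrary guess, and `_mm_stream_si128` is the 128-bit SSE2 relative of movntq. The block prefetch step is left out, and on compilers that do not advertise SSE2 the function just calls memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#define HAVE_SSE2_STREAM 1
#endif

/* Hypothetical cutoff: at or above this size, assume the copy should bypass the cache. */
#define STREAM_THRESHOLD ((size_t)256 * 1024)

void *copy_by_size(void *dst, const void *src, size_t n)
{
#ifdef HAVE_SSE2_STREAM
    if (n >= STREAM_THRESHOLD) {
        unsigned char *d = (unsigned char *)dst;
        const unsigned char *s = (const unsigned char *)src;

        /* Streaming stores need a 16-byte-aligned destination, so copy
           the 0-15 bytes in front of the first aligned address normally. */
        size_t head = (size_t)(-(uintptr_t)d & 15u);
        memcpy(d, s, head);
        d += head; s += head; n -= head;

        for (; n >= 16; d += 16, s += 16, n -= 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)s);  /* unaligned load */
            _mm_stream_si128((__m128i *)d, v);                /* non-temporal store */
        }
        _mm_sfence();     /* order the streamed stores before later stores */
        memcpy(d, s, n);  /* tail of fewer than 16 bytes */
        return dst;
    }
#endif
    /* Small copies, or no SSE2: let the runtime's memcpy handle it. */
    return memcpy(dst, src, n);
}
```

Whether this beats the stock memcpy depends on the CPU, the buffer sizes and whether the destination is read again soon afterwards, so time it before relying on it.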
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3
Here is a free profiler for Visual Studio: Compuware DevPartner Profiler Community Edition
John Bolton, Locomotive Games (THQ). Current Project: Destroy All Humans (Wii). IN STORES NOW!
quote:Here is a free profiler for Visual Studio: Compuware DevPartner Profiler Community Edition
It looks nice but is only for .NET managed code unfortunately.
quote:1) AMD CodeAnalyst, from their page (reg required).
I already have this, but it won't work with my current project (fine with all previous ones!), plus I can't figure out what to do with it. I just seem to have about 3 different types of breakpoint, but I don't know what each one is or how to use them!
quote:
How does memcpy work?


I believe it depends upon your OS, but under MS Windows the call is passed off to a kernel-level routine (hmemcpy maybe, but don't quote me on that), which copies the maximum number of DWORDs it can and then finishes off by copying the last few outstanding bytes, if there are any.

Since that is the most efficient way of doing it, I would have a hard time believing other implementations do it any differently.

quote:
byte copy vs. dword actually hardly makes a difference anymore

I would have to disagree on that. Unless you are just copying very small arrays, movsd will be four times faster than movsb (all movsX instructions take the same number of clock cycles regardless of operand size).

I agree that for small, random copies of memory, cache misses will take a large toll compared to the time the instructions take to complete, but that's something to manage in your code.


[edited by - xf00f on July 27, 2003 11:33:23 AM]
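The "as many DWORDs as possible, then the leftover bytes" scheme described above can be sketched in portable C roughly as follows. This is only an illustration of the idea, not the code of any actual runtime or OS routine; `copy_dwords_then_bytes` is a made-up name. The fixed-size memcpy calls are the usual way to do a 4-byte load and store without alignment or aliasing trouble, and optimizing compilers typically reduce each one to a single move.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void *copy_dwords_then_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* Bulk: as many whole 4-byte DWORDs as fit. */
    for (; n >= 4; d += 4, s += 4, n -= 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);  /* 4-byte load, safe at any alignment */
        memcpy(d, &w, sizeof w);  /* 4-byte store */
    }

    /* Tail: the 0-3 bytes that did not fill a DWORD. */
    for (; n > 0; --n)
        *d++ = *s++;

    return dst;
}
```

Like memcpy, this assumes the source and destination do not overlap.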
quote:Original post by d000hg
quote:Here is a free profiler for Visual Studio: Compuware DevPartner Profiler Community Edition
It looks nice but is only for .NET managed code unfortunately.


Nope, it works fine for other code too. It even says so on their features page:
"profiles managed VB .NET, VC++, C#, JScript 7 and unmanaged VC++"




quote:Original post by Burning_Ice
quote:Original post by d000hg
quote:Here is a free profiler for Visual Studio: Compuware DevPartner Profiler Community Edition
It looks nice but is only for .NET managed code unfortunately.


Nope, it works fine for other code too. It even says so on their features page:
"profiles managed VB .NET, VC++, C#, JScript 7 and unmanaged VC++"

Yeah, but it doesn't install unless you have Visual Studio .NET installed.

"Skepticism.... that great rot of the intellect." - V.H.
Bah, what does HE know?


Albekerky Software
You can make your own. I have one that renders the profiling data right on the screen, and you can turn it on and off with the keyboard.

There was an article in Game Developer magazine a while ago (Feb 2002, I believe) that explains how to make one. You can order a back issue from their site (gdmag.com) or just get the source from their online archive.
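For a rough idea of what a home-made profiler involves, here is a minimal sketch in C. It is not the code from the magazine article and not the poster's profiler: every name in it (`prof_begin`, `prof_end`, `prof_report` and so on) is invented for illustration. It accumulates time per named section with clock(), which measures CPU time at coarse resolution, and prints a text report instead of drawing on screen. `prof_set_enabled` stands in for the keyboard toggle.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define PROF_MAX_SECTIONS 32   /* hypothetical fixed capacity */

typedef struct {
    const char   *name;     /* caller-owned string, e.g. a literal */
    clock_t       started;  /* clock() value at the last prof_begin */
    clock_t       total;    /* accumulated clock ticks */
    unsigned long calls;    /* completed begin/end pairs */
} ProfSection;

static ProfSection g_sections[PROF_MAX_SECTIONS];
static int g_section_count = 0;
static int g_prof_enabled = 1;

/* Turn measurement on or off at run time. */
void prof_set_enabled(int on) { g_prof_enabled = on; }

/* Find a section by name; optionally create it if there is room. */
static ProfSection *prof_lookup(const char *name, int create)
{
    int i;
    for (i = 0; i < g_section_count; ++i)
        if (strcmp(g_sections[i].name, name) == 0)
            return &g_sections[i];
    if (!create || g_section_count == PROF_MAX_SECTIONS)
        return NULL;
    g_sections[g_section_count].name = name;
    return &g_sections[g_section_count++];
}

void prof_begin(const char *name)
{
    ProfSection *s = g_prof_enabled ? prof_lookup(name, 1) : NULL;
    if (s) s->started = clock();
}

void prof_end(const char *name)
{
    ProfSection *s = g_prof_enabled ? prof_lookup(name, 0) : NULL;
    if (s) { s->total += clock() - s->started; s->calls++; }
}

unsigned long prof_calls(const char *name)
{
    ProfSection *s = prof_lookup(name, 0);
    return s ? s->calls : 0;
}

/* Print one line per section: name, call count, total milliseconds. */
void prof_report(FILE *out)
{
    int i;
    for (i = 0; i < g_section_count; ++i)
        fprintf(out, "%-16s %8lu calls %10.3f ms\n",
                g_sections[i].name, g_sections[i].calls,
                1000.0 * (double)g_sections[i].total / CLOCKS_PER_SEC);
}
```

Usage would be to wrap the code of interest in `prof_begin("render")` and `prof_end("render")` each frame and call `prof_report(stdout)` at exit. For per-frame numbers a higher-resolution timer than clock() would be needed.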
I'm sure that would be very interesting to do and very useful in the long run, but I don't have the time!
I looked again at AMD's CodeAnalyst and got it to at least open a project. I tried setting some breakpoints on a function and running, but most of the time my app closed immediately (it's running D3D in a window). The only time I got it to run, the rendering window was black for a few seconds, then it just told me the majority of the time was spent in the kernel.dll process, which isn't a whole load of help! Why does it keep throwing me out, and how do I actually get some relevant info?
quote:I believe it depends upon your OS, but under MS Windows the call is passed off to a kernel-level routine (hmemcpy maybe, but don't quote me on that), which copies the maximum number of DWORDs it can and then finishes off by copying the last few outstanding bytes, if there are any.
So by doing the same process but with 64-bit MMX copies I could get a boost? Or even partial loop unrolling by copying two DWORDs per loop?



[edited by - d000hg on July 27, 2003 7:43:29 PM]
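Whether the unrolled version gives a boost is something to measure, not assume. A sketch of how that could be tried in C: `copy_two_dwords_per_iter` moves two DWORDs per loop iteration, and `time_copy` is a crude timing helper. Both names are invented for this example, and clock() is coarse, so use large buffers and many repetitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Copies two 32-bit DWORDs per loop iteration, then any leftover bytes. */
void *copy_two_dwords_per_iter(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    for (; n >= 8; d += 8, s += 8, n -= 8) {
        uint32_t lo, hi;
        memcpy(&lo, s, 4);       /* first DWORD */
        memcpy(&hi, s + 4, 4);   /* second DWORD */
        memcpy(d, &lo, 4);
        memcpy(d + 4, &hi, 4);
    }
    for (; n > 0; --n)           /* 0-7 trailing bytes */
        *d++ = *s++;
    return dst;
}

/* Returns the CPU seconds spent doing `reps` copies of n bytes with fn. */
double time_copy(copy_fn fn, void *dst, const void *src, size_t n, int reps)
{
    volatile unsigned char sink = 0;  /* keeps the copies from being optimized away */
    clock_t t0 = clock();
    int i;
    for (i = 0; i < reps; ++i) {
        fn(dst, src, n);
        if (n) sink = ((unsigned char *)dst)[n - 1];
    }
    (void)sink;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling `time_copy(memcpy, ...)` and `time_copy(copy_two_dwords_per_iter, ...)` on the same buffers gives a like-for-like comparison. For buffers much larger than the cache, expect the result to be dominated by memory bandwidth more than by the loop body.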
