memcpy

Hi, I am interested in the virtual texture algorithm for rendering large terrain. I am running some tests that update a Direct3D texture every frame, and I see something strange: on my PC, it is faster to read the texture data directly from the hard drive than from memory.
D3DLOCKED_RECT lockedRect;
m_TextureId->LockRect(0, &lockedRect, 0, D3DLOCK_DISCARD);
m_HDTextures[0][0].read((char*) lockedRect.pBits  , imageSize*imageSize*4);
m_TextureId->UnlockRect(0);
// Where m_HDTextures[0][0] is std::fstream
Is faster than:
D3DLOCKED_RECT lockedRect;
m_TextureId->LockRect(0, &lockedRect, 0, D3DLOCK_DISCARD);
memcpy(lockedRect.pBits, m_FinalImage, imageSize*imageSize*4); 		
m_TextureId->UnlockRect(0);
// Where m_FinalImage has been read in the beginning of the application
Why? A copy from memory should be faster than a read from the hard drive. Is there something faster than memcpy?

1. Is this release or debug mode?
2. Is m_FinalImage aligned properly? (Is the pointer address a multiple of 4?)
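For reference, the alignment question in point 2 can be checked in code. This is only a minimal sketch: `isAligned` is a hypothetical helper name, not something from the CRT or Direct3D, and the alignment of 4 is simply the value asked about above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true when the address stored in `ptr`
// is a multiple of `alignment` bytes (alignment must be non-zero).
bool isAligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

Calling `isAligned(m_FinalImage, 4)` right after the allocation would answer the question directly.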

1. It was in debug mode. I tried in release mode and I had the same FPS.

2. I don't understand exactly what you mean. I have used this for allocation:

m_FinalImage = (char*)malloc(imageBigSize*imageBigSize*4*sizeof(char));

I use X4 because of R,G,B,A


Looking at the VS CRT source for memcpy, it is a byte-by-byte copy driven by a while-loop counter, which is why I believe it is slow.


void * __cdecl memcpy (
    void * dst,
    const void * src,
    size_t count
    )
{
    void * ret = dst;

#if defined (_M_IA64)

    {
    // ...
    }

#else /* defined (_M_IA64) */
    /*
     * copy from lower addresses to higher addresses
     */

    while (count--) {
        *(char *)dst = *(char *)src;
        dst = (char *)dst + 1;
        src = (char *)src + 1;
    }
#endif /* defined (_M_IA64) */

    return(ret);
}

Quote:
Original post by texel3d
1. It was in debug mode. I tried in release mode and I had the same FPS.

2. I don't understand exactly what you mean. I have used this for allocation:

m_FinalImage = (char*)malloc(imageBigSize*imageBigSize*4*sizeof(char));

I use X4 because of R,G,B,A

How are you measuring performance? FPS is next to useless for this. Are you creating the texture every frame or something? You really shouldn't be doing that anyway.

EDIT: Off topic, but you really shouldn't copy data into the texture like that. You need to pay attention to the pitch, and copy a scanline at a time. You can put in some code to copy the texture data in one chunk if (and only if) you detect that the pitch is exactly equal to the texture width * bytes per pixel. Remember that it might be different for different cards too.
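A minimal sketch of the pitch-aware copy described above. `copyRows` and its parameter names are hypothetical; with Direct3D 9 the destination pointer and pitch would come from `D3DLOCKED_RECT::pBits` and `D3DLOCKED_RECT::Pitch` (the latter is an INT and would need a cast), and `rowBytes` would be the texture width times bytes per pixel.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `rows` scanlines of `rowBytes` bytes each.
// Consecutive rows are `dstPitch` bytes apart in the destination and
// `srcPitch` bytes apart in the source; both pitches must be >= rowBytes.
void copyRows(void* dst, std::size_t dstPitch,
              const void* src, std::size_t srcPitch,
              std::size_t rowBytes, std::size_t rows) {
    if (dstPitch == rowBytes && srcPitch == rowBytes) {
        // No padding on either side, so one contiguous copy is safe.
        std::memcpy(dst, src, rowBytes * rows);
        return;
    }
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t y = 0; y < rows; ++y) {
        // Copy only the pixel data of each row and skip the padding.
        std::memcpy(d + y * dstPitch, s + y * srcPitch, rowBytes);
    }
}
```

The single-memcpy path is taken only when neither buffer has row padding, which is the "if (and only if)" condition from the post.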


Quote:
Original post by yadango
looking at VS CRT source for memcpy, it is a byte-by-byte copy with a while loop counter, which is why i believe it is slow.

*** Source Snippet Removed ***

Only in debug builds. Release builds use a file called memcpy.asm, which copies aligned blocks of memory.

Ah, cool. I never noticed that until now. SSE2 optimization... yeah, that should be fast :-).

I don't care *how* slow his compiler's implementation of memcpy is, it should still beat reading from the hard drive. In the absolute best-case scenario, once the file has been cached by the OS, reading from it will be slightly (perhaps unnoticeably) slower than the memcpy.

Odds are, your method of measuring performance is horribly broken.

I use Fraps to display the FPS.

I create the texture just once (at the beginning of the application). After that, I only lock and unlock it.

The file with the texture data is also opened at the beginning of the application and closed at the end.

I know it's not a good idea to update the texture from the hard drive every frame. I do it to measure performance and to see the worst-case drop in FPS.

Are there any tutorials that explain how to load very big textures from the hard drive in Direct3D to render a large terrain in real time with a good FPS (and not just the theory)?

Quote:
I use fraps to display fps.


FPS can't be used to measure efficiency of memcpy. Or anything else for that matter. FRAPS even less so.

You'll need a high-resolution timer at very least. And in order to measure memcpy, you'll need to run it in a loop and then average the results.

If you were getting 500 FPS, one frame lasts 2 ms, so that is the resolution of your timer. At 60 FPS, the resolution is about 16 ms.

memcpy, even when doing unoptimized copy, will likely take a few microseconds to complete.

So you might as well stop right now, or change the timing methods (see QueryPerformanceCounter on Windows).
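A rough sketch of the loop-and-average approach described above. It uses `std::chrono::steady_clock` as a portable stand-in for QueryPerformanceCounter, which is an assumption on my part; `averageMemcpyMicros` is a hypothetical name, and the buffer size and iteration count are up to the caller.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: average duration of one memcpy of `bytes` bytes,
// in microseconds, measured over `iterations` runs.
double averageMemcpyMicros(std::size_t bytes, int iterations) {
    if (bytes == 0 || iterations <= 0) {
        return 0.0;
    }
    std::vector<unsigned char> src(bytes, 0x7F);
    std::vector<unsigned char> dst(bytes);
    std::memcpy(dst.data(), src.data(), bytes);  // warm-up: touch the pages once

    using Clock = std::chrono::steady_clock;
    volatile unsigned sink = 0;  // reading results discourages dead-copy removal
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        src[0] = static_cast<unsigned char>(i);  // vary the input each run
        std::memcpy(dst.data(), src.data(), bytes);
        sink = sink + dst[0] + dst[bytes - 1];
    }
    const auto stop = Clock::now();

    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

For the case in this thread, `bytes` would be `imageSize*imageSize*4`. The same loop can be wrapped around the `fstream::read` call to compare the two on equal terms, keeping in mind that an optimizing compiler may still transform a micro-benchmark like this.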
