Fast Approximation to memcpy()

Coding Horrors Community

Started by L. Spiro October 11, 2016 12:51 PM

29 comments, last by MikeWillHugYou 7 years, 4 months ago

Oxyd

1,162

October 12, 2016 12:41 PM

Did somebody say templates?


#include <cstddef>
#include <iostream>

template <std::size_t I, std::size_t N>
struct memcpy_helper {
    static void
    do_(char* dst, char* src) {
        dst[I] = src[I];
        memcpy_helper<I + 1, N>::do_(dst, src);
    }
};

template <std::size_t N>
struct memcpy_helper<N, N> {
    static void
    do_(char*, char*) { }
};

template <std::size_t N, typename T>
void
memcpy(T* dst, T* src) {
    memcpy_helper<0, N * sizeof(T)>::do_(reinterpret_cast<char*>(dst), reinterpret_cast<char*>(src));
}

int
main() {
    int src[]{1, 2, 3};
    int dst[3];
    
    memcpy<3>(dst, src);
    
    for (int i : dst)
        std::cout << i << '\n';
}

It's fast because there is no run-time loop!
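For comparison, here is a hedged sketch of the same trick without the recursive helper struct, assuming a C++14 compiler. std::index_sequence expands into one byte assignment per index. The names copy_bytes_impl and unrolled_copy are made up for this example and are not from the post above.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical helper: the pack expansion produces one assignment per index in Is.
template <std::size_t... Is>
void copy_bytes_impl(unsigned char* dst, const unsigned char* src,
                     std::index_sequence<Is...>) {
    // Braced-init-list elements are evaluated left to right.
    const int expand[] = {0, (dst[Is] = src[Is], 0)...};
    static_cast<void>(expand);
}

// Copies N objects of type T byte by byte, with no run-time loop in the source.
template <std::size_t N, typename T>
void unrolled_copy(T* dst, const T* src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copy only makes sense for trivially copyable types");
    assert(dst != nullptr && src != nullptr);
    copy_bytes_impl(reinterpret_cast<unsigned char*>(dst),
                    reinterpret_cast<const unsigned char*>(src),
                    std::make_index_sequence<N * sizeof(T)>{});
}
```

Usage is the same as above: unrolled_copy<3>(dst, src) for two int[3] arrays. Whether this beats a plain loop is up to the optimizer, so treat it as a curiosity rather than a speedup.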

_WeirdCat_

1,512

October 12, 2016 02:57 PM

A coworker provided this extremely fast approximation.


void memcpy(void* dst, void* src, size_t size)
{
    if (size > 1 && dst != src)
    {
        // void* cannot be dereferenced, so copy through char pointers
        *(char*)dst = *(const char*)src;
    }

    // The rest can’t be that important
}

L. Spiro

That's a lol, unless the operator does:

TMemoryStream::Write(src, byte_amount); and that is:

System.Move + C++

Wait, that's not pure C++. Seems like writing to a stream should do.

Either fancy way, you have to keep it simple, or check everything.

Master & Mentor
https://sites.google.com/site/customprog/

DekuTree64

1,170

October 13, 2016 10:01 PM

On ARM CPUs, stmia and ldmia are usually better than single load/store instructions. Especially for memset, you can fill 8 registers with the value and write 32 bytes per instruction :) Or unroll it to 32 stmia instructions and get a whole KB per iteration. But dealing with unaligned addresses/non-multiple-of-32 size is a pain.

Mostly I'd be looking at why you're copying so much data around every frame in the first place.
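A rough sketch of the "fill several registers, then store them all at once" idea, written in portable C++ rather than ARM assembly. The name fill32 is hypothetical, and it is only an assumption that a given compiler lowers the 32-byte block store to stmia or a vector store; that depends on target and flags. It also shows the unaligned head and leftover tail that make this a pain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical memset-like fill: writes 32 bytes (8 x 32-bit words) per iteration.
void fill32(void* dst, unsigned char value, std::size_t size) {
    assert(dst != nullptr);
    unsigned char* p = static_cast<unsigned char*>(dst);

    // Head: go byte by byte until p is 4-byte aligned.
    while (size > 0 && reinterpret_cast<std::uintptr_t>(p) % 4 != 0) {
        *p++ = value;
        --size;
    }

    // Body: replicate the byte into 8 words, then store the whole block at once.
    const std::uint32_t word = 0x01010101u * value;
    const std::uint32_t block[8] = {word, word, word, word, word, word, word, word};
    while (size >= sizeof block) {
        std::memcpy(p, block, sizeof block);  // fixed-size copy, avoids aliasing problems
        p += sizeof block;
        size -= sizeof block;
    }

    // Tail: whatever is left over (fewer than 32 bytes).
    while (size > 0) {
        *p++ = value;
        --size;
    }
}
```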

Sik_the_hedgehog

3,003

October 14, 2016 06:16 AM

Everyone knows that null values lead to access violations. So why not just prevent those pesky nulls from being copied in the first place?
void memcpy(void *dest, void *src, size_t size)
{
  strncpy((char*)dest, (const char*)src, size);
}

Reminds me of what Nintendo did with the Wii (for context: they compared keys using strncmp instead of memcmp, so all you needed was to use a key that's all zeroes and the console would take it as a valid disc)

http://thedailywtf.com/articles/Anatomii-of-a-Hack
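A minimal sketch of why that matters, using made-up function names and made-up byte values (nothing here is taken from the actual console code): strncmp stops at the first zero byte, so two binary buffers that both start with 0x00 compare equal no matter what follows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical comparison done the wrong way: treats binary data as a C string.
bool keys_equal_strncmp(const unsigned char* a, const unsigned char* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    return std::strncmp(reinterpret_cast<const char*>(a),
                        reinterpret_cast<const char*>(b), n) == 0;
}

// memcmp compares all n bytes regardless of their values.
bool keys_equal_memcmp(const unsigned char* a, const unsigned char* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    return std::memcmp(a, b, n) == 0;
}
```

With expected = {0x00, 0x12, 0x34, 0x56} and forged = {0x00, 0xAA, 0xBB, 0xCC}, the strncmp version reports a match and the memcmp version does not.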

Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

Bacterius

13,181

October 14, 2016 12:24 PM

This one actually works... within a given assumption :wink:
void memcpy(void *dest, void *src, size_t size)
{
  assert(dest == src);
}

Technically the memory pointed to by the src and dest pointers can't overlap, so a compiler could technically optimize "dest == src" to "size == 0" making your function very fast by virtue of accepting only zero-length inputs :lol:

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

_WeirdCat_

1,512

October 22, 2016 04:50 PM

unsigned char * source;

unsigned char * dest;

after copy you just cast

[spoiler]



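inline void CopyToPCHAR(const std::string& str, int * len, char *& p)
{
    DeleteIfPersist(len, p); // helper not shown in this post

    (*len) = static_cast<int>(str.length()); // std::string has length(), not Length()

    if ((*len) > 0)
    {
        p = new char[ (*len) + 1 ];
        strcpy(p, str.c_str());
        p[(*len)] = '\0'; // the terminator is a char, not the NULL pointer constant
    }
}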

[/spoiler]


void copy(unsigned char * s, unsigned char *& d, int len)
{
    //skip len check since we only copy when length > 0
    // d is taken by reference so the caller receives the new buffer (and must delete[] it)
    d = new unsigned char[len];
    for (int i=0; i < len; i++)
        d[i] = s[i];
}

and you use it like that


my_data * tmp = (my_data*)d;

Master & Mentor
https://sites.google.com/site/customprog/

_WeirdCat_

1,512

October 23, 2016 10:59 AM

so how do you transfer it faster?


d = new double[len];
for (int i=0; i < len; i++)
    d[i] = s[i];

Master & Mentor
https://sites.google.com/site/customprog/

L. Spiro

25,818

Author

October 26, 2016 01:30 PM

Copy 16 bytes at a time via vmovntdqa and vmovntdq (don’t pollute the cache).
Prefetch well via __builtin_prefetch(). Unroll loops to make prefetch less costly and more useful.
Handle trailing bytes with rep movsq and rep movsb.

There are plenty of things you can do.
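A hedged sketch of a few of the tips above, using SSE2 intrinsics instead of hand-written assembly. The name stream_copy and the prefetch distance of 256 bytes are assumptions made for this example. It uses non-temporal stores (_mm_stream_si128, which maps to movntdq) but ordinary unaligned loads rather than vmovntdqa, since streaming loads need SSE4.1 and an aligned source. The tail goes through std::memcpy instead of rep movsb. On targets without SSE2 the whole thing falls back to std::memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_loadu_si128, _mm_stream_si128, _mm_sfence
#endif

// Hypothetical name. Copies size bytes; the two regions must not overlap.
void stream_copy(void* dst, const void* src, std::size_t size) {
    assert(dst != nullptr && src != nullptr);
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);

#if defined(__SSE2__)
    // Head: byte copies until the destination is 16-byte aligned,
    // because _mm_stream_si128 requires an aligned address.
    while (size > 0 && reinterpret_cast<std::uintptr_t>(d) % 16 != 0) {
        *d++ = *s++;
        --size;
    }

    // Body: unrolled to 64 bytes (four 16-byte vectors) per iteration.
    while (size >= 64) {
#if defined(__GNUC__)
        if (size > 256) {
            __builtin_prefetch(s + 256);  // distance is a guess; tune by measuring
        }
#endif
        const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s));
        const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + 16));
        const __m128i c = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + 32));
        const __m128i e = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + 48));
        _mm_stream_si128(reinterpret_cast<__m128i*>(d), a);       // bypasses the cache
        _mm_stream_si128(reinterpret_cast<__m128i*>(d + 16), b);
        _mm_stream_si128(reinterpret_cast<__m128i*>(d + 32), c);
        _mm_stream_si128(reinterpret_cast<__m128i*>(d + 48), e);
        s += 64;
        d += 64;
        size -= 64;
    }
    _mm_sfence();  // order the non-temporal stores before anything that follows
#endif

    // Tail (or the entire copy when SSE2 is unavailable).
    std::memcpy(d, s, size);
}
```

Non-temporal stores tend to help only for large buffers that will not be read again soon, so this is a sketch to measure against std::memcpy rather than a guaranteed win.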

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

toss-dev

118

October 26, 2016 03:44 PM

Hey man, I don't think you can optimize 'memcpy()' or 'memset()' ...

See the GNU implementation and the optimizations it actually implements: https://fossies.org/dox/glibc-2.24/string_2memcpy_8c_source.html

You may be able to approximate by copying only 1 byte out of every 2, but the memory left unset will hold arbitrary data... is that really what you want?

However, if you know the exact size of the memory, you can implement something like this

(fewer comparisons are done => faster program)

(copying 8 bytes at a time => up to 8 times faster than byte by byte on a 64-bit system)


#include <stddef.h>
#include <stdint.h>

/** if we know n is a multiple of 64 bytes (8 uint64_t) and both pointers are 8-byte aligned */
/* note: the argument order here is (src, dst), unlike the standard memcpy */
void * memcpy(void * src, void * dst, size_t n) {
	uint64_t * s = (uint64_t *)src;
	size_t len = n / sizeof(uint64_t); /* number of 8-byte words */
	uint64_t * ptr = (uint64_t *)dst;
	uint64_t * end = ptr + len;
	while (ptr < end) {
		*ptr++ = *s++; //1
		*ptr++ = *s++; //2
		*ptr++ = *s++; //...
		*ptr++ = *s++;
		*ptr++ = *s++;
		*ptr++ = *s++;
		*ptr++ = *s++;
		*ptr++ = *s++; //8
	}
	return dst;
}

L. Spiro

25,818

Author

October 27, 2016 09:05 AM

Your link is broken.
Of course, approximating std::memcpy() was a joke, which is why it is in Coding Horrors.

In any case, I optimized it and wrote a faster-in-all-cases version for PlayStation 4.
Already have one 2 or 3 times faster on my Windows® machine, and will make one for Xbox One soon.

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

This topic is closed to new replies.
