Faster memory copy and allocation


I have a case where I need to replicate an object many times in memory, and memory copying is currently proving to be the biggest bottleneck. The object itself is geometric in nature and can have any size; its constituents are stored as std::vectors. What I have right now is dynamic allocation per object (a per-instance new()), with all constituent vectors simply copied using assignment. Calling reserve() on the vectors speeds things up considerably, but I'd like to push it further if possible. Would globally allocating a large chunk of memory and pooling all of the object instances in a pre-allocated space help? The data will be aligned to either 16 or 32 bytes anyway (I haven't decided which yet). A better question, though, is whether std::vector is SSE-enabled, or would I be better off using memcpy (which I understand is SSE-enabled if the data is aligned and SSE is enabled), or even writing my own copy routine?

Besides these, are there any other tricks I could exploit?

The prevailing theme here is that ultimately I need the copies to be real (not just referential, even though the geometric properties remain the same). While I can spread the copy process out with some black magic voodoo adjustments, the ugly truth remains the same: the faster I can create the copies, the faster I can move on to using them. Also, at the end of the day I'm talking about thousands to tens of thousands of copy operations, with a target time of at most 1-2 seconds for real-time applications. Right now I'm looking at roughly 0.5-1.0 ms per copy, which is not too impressive.

I am not sure I understand all the details of your situation. Can you post what your class data looks like?

What is the total amount of data written?

Divide it by memory bandwidth to determine how far off you are.
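
As a rough sanity check (the numbers here are illustrative, not from the thread): copying 100 MB at a sustained 5 GB/s is on the order of 20 ms, so if the measured times are far above that, the cost is in allocation and small-copy overhead rather than raw bandwidth. A minimal sketch for getting a ballpark bandwidth figure on your own machine, assuming a C++11 compiler (the buffer size is an arbitrary choice):

[source lang=cpp]#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

int main()
{
    const std::size_t size = 100 * 1024 * 1024;       // 100 MB test buffer
    std::vector<char> src(size, 1), dst(size, 0);

    auto t0 = std::chrono::high_resolution_clock::now();
    std::memcpy(&dst[0], &src[0], size);              // one big copy
    auto t1 = std::chrono::high_resolution_clock::now();

    double seconds = std::chrono::duration<double>(t1 - t0).count();
    std::printf("%.0f MB in %.3f ms -> %.2f GB/s (checksum %d)\n",
                size / (1024.0 * 1024.0), seconds * 1000.0,
                size / (1024.0 * 1024.0 * 1024.0) / seconds,
                (int)dst[size / 2]);                  // read dst so the copy isn't optimized away
}[/source]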

Also, std::vectors aren't SIMD-friendly without using a custom allocator. See this thread I'd started not too long ago. I decided to use btAlignedObjectArray rather than learn to write a custom allocator, although I intend to go back and learn about [s]alligators[/s] allocators when (if?) I finish my current project.

Edit: Damn iPhone auto-correct.
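
For anyone reading this later, the custom-allocator route is less work under C++11, where only a handful of members are required. A minimal 16-byte-aligned allocator sketch, using _mm_malloc/_mm_free; this is purely illustrative and not the allocator from the linked thread:

[source lang=cpp]#include <xmmintrin.h>   // _mm_malloc / _mm_free
#include <cstddef>
#include <new>
#include <vector>

template <typename T, std::size_t Alignment = 16>
struct aligned_allocator
{
    typedef T value_type;

    aligned_allocator() {}
    template <typename U>
    aligned_allocator(const aligned_allocator<U, Alignment>&) {}

    // Explicit rebind keeps allocator_traits happy despite the non-type parameter.
    template <typename U>
    struct rebind { typedef aligned_allocator<U, Alignment> other; };

    T* allocate(std::size_t n)
    {
        void* p = _mm_malloc(n * sizeof(T), Alignment);
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) { _mm_free(p); }
};

template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return false; }

// Usage: the element storage now starts on a 16-byte boundary.
// std::vector<float, aligned_allocator<float> > verts;[/source]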


[quote]
I am not sure I understand all the details of your situation. Can you post what your class data looks like?
[/quote]

As simple as I can make it:

[source lang=cpp]struct Mesh {
    std::vector<int> indexes;
};

struct Object {
    std::vector<vec3> verts;
    std::vector<vec3> normals;
    std::vector<vec3> texcoords;
    std::vector<Mesh*> meshes;
    // + some attribs
};[/source]
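
For comparison, here is what a straightforward deep copy of that layout ends up doing. CloneObject is a hypothetical helper, not code from the post, and it assumes the unlisted attribs are handled by the default copy constructor; the point is the allocation count per copy:

[source lang=cpp]// One Object allocation, four vector buffer allocations, plus one Mesh
// allocation and one index-vector allocation per mesh.
Object* CloneObject(const Object& in)
{
    Object* out = new Object(in);                   // copy ctor duplicates the four vectors (and attribs)
    for (std::size_t i = 0; i < out->meshes.size(); ++i)
        out->meshes[i] = new Mesh(*in.meshes[i]);   // turn shared Mesh pointers into real copies
    return out;
}[/source]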


[quote]
What is the total amount of data written?
[/quote]

Comparatively small, actually - eventually it can scale, but probably not to more than 50-100 MB. I'm not working with full datasets yet, so this is all off of estimates.

Do those vectors actually require their dynamic resizing abilities, or are they just fixed arrays?

You could go back to basics and make the whole thing POD and contiguously allocated:[source lang=cpp]struct vec3 { float data[3]; };

struct Mesh {
    int numIndexes;
    int data;                                   // first index; the rest follow in-place
    int* GetIndexes() { return &data; }
};

struct Object {
    int size;                                   // total size of this allocation, in bytes
    int someAttribs;
    int numVerts;
    int numNormals;
    int numTexCoords;
    int numMeshes;
    int data;                                   // start of the variable-length payload
    vec3* GetVerts()     { return (vec3*)&data; }
    vec3* GetNormals()   { return GetVerts() + numVerts; }
    vec3* GetTexCoords() { return GetNormals() + numNormals; }
    Mesh* GetMesh(int idx)
    {
        // After the tex coords: a table of byte offsets, one per mesh,
        // followed by the mesh blobs themselves.
        int* offsetTable = (int*)(GetTexCoords() + numTexCoords);
        unsigned char* meshArea = (unsigned char*)(offsetTable + numMeshes);
        return (Mesh*)(meshArea + offsetTable[idx]);
    }
};

Object* Clone( Object* in )
{
    void* out = malloc( in->size );
    memcpy( out, in, in->size );                // bam! deep copy of the entire object in one go.
    return (Object*)out;
}[/source]
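
As a side note (not from the original post): to build such an object in the first place you need its total size up front. A hypothetical sizing helper for the layout above, assuming the POD definitions shown there; ComputeObjectSize and its parameter names are mine:

[source lang=cpp]#include <cstddef>   // offsetof, size_t
#include <vector>

// Adds up the fixed header, the three vertex streams, the per-mesh offset
// table, and each mesh blob (header int plus its indexes).
size_t ComputeObjectSize(int numVerts, int numNormals, int numTexCoords,
                         const std::vector<int>& indexCounts)
{
    size_t size = offsetof(Object, data);                                   // fixed header
    size += (size_t)(numVerts + numNormals + numTexCoords) * sizeof(vec3);  // vertex streams
    size += indexCounts.size() * sizeof(int);                               // mesh offset table
    for (size_t i = 0; i < indexCounts.size(); ++i)
        size += offsetof(Mesh, data) + indexCounts[i] * sizeof(int);        // each mesh blob
    return size;
}[/source]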

Why do you need true copies? The fastest copy is one that you never have to perform.


If vec3 is POD and has no non-trivial constructors, the compiler should already be generating what amounts to a memcpy() on the contents of each vector. I suppose you could peek at the generated assembly to confirm that, though.

Have a look at yasli_vector from the Loki library. It has various optimisations that can make a difference.


[quote]
Why do you need true copies? The fastest copy is one that you never have to perform.

If vec3 is POD and has no non-trivial constructors, the compiler should already be generating what amounts to a memcpy() on the contents of each vector. I suppose you could peek at the generated assembly to confirm that, though.
[/quote]


I need copies because I need to potentially transform the objects down the line, plus I need the geometric data to be available per-instance for some operations that I cannot cheat. This is also where potential deferred copying for some of the objects comes in, though: I can postpone creating actual copies until I have the time to create them or I actually need them. I am, however, working on the assumption that I will need to change all of the data immediately (which can actually be true in a worst-case scenario). I have given referencing serious thought, though, and it's a valid optimization in some cases.
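
As an aside, the deferred-copy idea can be expressed as a simple copy-on-write wrapper. This is only a minimal sketch, assuming C++11 smart pointers and a type whose copy constructor performs a proper deep copy; LazyCopy is an illustrative name, not something from the thread:

[source lang=cpp]#include <memory>

// Holds a cheap shared reference until mutable access is first requested,
// at which point it pays for one real deep copy.
template <typename T>
struct LazyCopy
{
    std::shared_ptr<const T> shared;   // referential "copy"
    std::unique_ptr<T> owned;          // real copy, made on demand

    explicit LazyCopy(std::shared_ptr<const T> src) : shared(std::move(src)) {}

    const T& Read() const { return owned ? *owned : *shared; }

    T& Mutable()
    {
        if (!owned)
            owned.reset(new T(*shared));   // the one deep copy, deferred until needed
        return *owned;
    }
};[/source]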

[quote]
Do those vectors actually require their dynamic resizing abilities, or are they just fixed arrays?
[/quote]

This is the beauty of it - they're static by nature and yes, I've considered switching to POD myself. I suppose one can't get much more basic and ergo faster than that.

However, the change would entail a fair bit of rewriting and I figured it wouldn't hurt to throw the question at GD for ideas first. The problem behind this might indeed be semi-moot, though, since I really haven't benchmarked my code against memory bandwidth like Antheus suggested.

I'll definitely have a look at both btAlignedObjectArray and yasli_vector, although right now it seems I'll most likely implement a heuristic pooling scheme and use PODs to assign chunks out of the pool using my own SSE copy scheme.
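
For what it's worth, a minimal sketch of that kind of SSE copy routine, assuming both pointers are 16-byte aligned and the byte count is a multiple of 16; CopyAligned16 is an illustrative name, and it is worth benchmarking against plain memcpy first, since a tuned memcpy is often at least as fast:

[source lang=cpp]#include <emmintrin.h>   // SSE2 intrinsics
#include <cstddef>

void CopyAligned16(void* dst, const void* src, std::size_t bytes)
{
    __m128i* d = static_cast<__m128i*>(dst);
    const __m128i* s = static_cast<const __m128i*>(src);
    for (std::size_t i = 0; i < bytes / 16; ++i)
        _mm_store_si128(d + i, _mm_load_si128(s + i));   // aligned 16-byte load/store
    // For copies much larger than the cache, _mm_stream_si128 (non-temporal
    // store) followed by _mm_sfence() can avoid polluting the cache.
}[/source]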
