Jump to content
  • Advertisement
mychii

C++ How to provide custom STL allocator through indirect memory access?

Recommended Posts

Posted (edited)

Hi,

I have an object handle to each actual object block in the memory, because the object blocks in the memory moves position in the heap for reasons. This leads to indirect access to the object block through the object handle as it is the only pointer to it. Now this becomes a problem when I want to integrate to STL containers, because creating custom STL allocator for it seem to only require a direct access to the object block, and I'm not sure how to do it using the object handle as the only mean to access the object block reliably. The current best solutions are either modifying the containers, or make my own.

Can anyone help me?

Untitled-1.thumb.jpg.701f9fc7e51fae6dbf978f332c0e0323.jpg

Edited by mychii
Simplifying question... added picture

Share this post


Link to post
Share on other sites
Advertisement
Posted (edited)
2 hours ago, mychii said:

OR, find a way to simply replace the STL containers' default allocator

Allocator allocate a memory for whole buffer, not for each separate item. And sence of rellocation is to get new raw buffer by other size than existing to move existing elements to it new buffer, so addresses of elements anycase will be changed on reallocation. Never store addresses of vector elements in pointers. Use indexes for it. It still correct after reallocations. But indexes not survive a deletions and insertions of elements before it like as pointers.

2 hours ago, mychii said:

If the shrinking/defragmentation happens periodically per game loop and,

Never shrink a long-life buffers without extrimally needs, or ever better reserve required maximal size on start to avoid reallocations of it at all. Also never use a short-life (that live in scope of method/function) dynamic buffers.  It help to minimize or ever complitely avoid memory reallocations during simulation process . Also only way to warranty presence of required memory is to have it allocated on simulation start. Games is kind of realtime software, so have to live by other rules than office software , even in case it is "soft" realtime. "Soft" only mean that it no disaster will happend in case of malfunction, but it don't change realtime requirments to have it working properly and reliably.

2 hours ago, mychii said:

I can further reinvent the wheel by making my own required containers...

At least it wheel may have a chance to be round shaped, unlike square shaped wheels of STL.

2 hours ago, mychii said:

s raw pointer smart enough to change when I move the objects in the memory level?

It just a unsigned __in64 but under other sauce. 

2 hours ago, mychii said:

Any idea on how to make this possible in my case?

Any idea what same you defragmenting and why you need to do it? Also what size of instances you want to proccess? On instances with size over 32 bytes (half of chache line) and especially over 64 bytes data locality don't make any advantages. So in it case just allocate your objects by one (can be done by per-type pools with non-reallocable buffers to speed-up allocation and warranty presence of memory for required number of objects, that adjustable on per-type basis  ) and store pointers to it into vectors (or other containers) of pointers to use as processing lists. It addresses will be unchanged on vector reallocations and also reallocations and insertion/deletions of  processing lists (vectors) will be much faster becouse it require to move a 8-byte pointers, but not a fat instances to new buffer,and no reallocations of instances on realocations of processing lists at all. You can use vectors of STL crazzypointers, raw pointers or any kind of self-made smartpointers for its purposes . Also it no other ways to have a dynamic polimorphysm that often required to have better complexity that much more importent than hardware optimisations. 

Edited by Fulcrum.013

Share this post


Link to post
Share on other sites
Posted (edited)
3 hours ago, Fulcrum.013 said:

Allocator allocate a memory for whole buffer, not for each separate item.

I know that the Allocator allocate a memory for whole buffer. But regardless of how it wants to do with its request, it points to an object in the buffer may change due to defragmentation.

Quote

And sence of rellocation is to get new raw buffer by other size then existing to move existing elements to it new buffer, so addresses of elements anycase will be changed on reallocation. 

I'm not sure I understand what you're trying to say here, but if you're talking about vector growth that it allocates a new whole buffer when it needs to, that isn't the case. It will still have a chance to access an invalid object in the buffer, cause my defragmenter probably moved it already.

Quote

Never store addresses of dynamic arrays elements in pointers. Use indexes for it. It still correct after reallocations. But indexes not survive a deletions and insertions of elements before it like as pointers.

I don't seem to understand what these means.

Quote

Never shrink a long-life buffers without extrimally needs, or ever better reserve required maximal size on start to avoid reallocations of it at all. Also never use a short-life (that live in scope of method/function) dynamic buffers.  It help to minimize or ever complitely avoid memory reallocations.

If you're talking about the vector itself, again, this does not matter. Whether I it will reallocate due to growth or whatever, the objects in the buffer will move, because of my defragmenter.

Quote

Also only way to warranty presence of required memory is have it allocated on system start.

My memory is allocated on system start, and is allocated for each system, including the one for what I asked here. That's why I made my allocators.

Quote

At least it wheel may have a chance to be round shaped, unlike square-shaped wheels of STL.

Might probably end up doing another round for it after all. 😂

 

Quote

Any idea what same you defragmenting and why you need to do it?

I think it's pretty clear; to free space.

Quote

Also what size of instances you want to proccess? On instances over 32 (half of chache line) and especially over 64 bytes data locality don't make any advantages. So in it case just allocate your objects by one (can be done by per-type pools to speed-up allocation and warranty presence of memory for required number of objects adjustable on per-type basis  ) and store pointers to it into vectors of pointers to use as processing lists. It addresses will unchanged on vector reallocations and reallocations works much faster becouse moving a 8-byte pointers, but not a fat instances to new buffer. You can use vectors of smartpointers for its purposes. 

These are for game objects and its related data, such as components. High level codes. So size of instances may vary of course. Can be as low as it can be, can be over 9000 (hopefully not!), depends on what is attached to it. Specifically the components, they are attached to game object by reference because they are internally (planned to be) allocated in a contiguous array of the same type, and so are the game objects themselves. So I already have those happy pools in the end, in theory. But I think such discussion is way off topic.

Response to Edit:

Quote

You can use vectors of STL crazzypointers, raw pointers or any kind of self-made smartpointers for its purposes . Also it no other ways to have a dynamic polimorphysm that often required to have better complexity that much more importent than hardware optimisations. 

I'd love that, but game objects and its friends come and go pretty fast, leaving scars in the memory quite quickly. I need that memory, whether it's because I'm out of it, or I want to make it as small as I can. But again, It's kinda off topic, but I appreciate for possible alternative solution of what I'm doing, disregarding the actual question.

Edited by mychii
Adding response to edited text

Share this post


Link to post
Share on other sites
7 minutes ago, mychii said:

Specifically the components, they are attached to game object by reference because they are internally (planned to be) allocated in a contiguous array of the same type, and so are the game objects themselves.

Ok. Looks like components will be allocated from pool by one at time, becouse binded to objects by ref. So why not to make a pools by segmented arrays that not need to be reallocated on enlarging and also can drop segments without move a content in case its segment complitelly free? Also it can have a queque of deleted slots to O(1) search a slot for reuse in fragmentated pool.

13 minutes ago, mychii said:

I think it's pretty clear; to free space.

To reuse a slots from same pool or to srink a pool? First can be done without defragmentation, second have no any sence on realtime systems.

16 minutes ago, mychii said:

If you're talking about the vector itself, again, this does not matter. Whether I it will reallocate due to growth or whatever, the objects in the buffer will move, because of my defragmenter.

By best practicles of realtime software it have to avoid dynamic memory reallocations and have all required memory on hands anytime. If you have enought buffers/pools size allocated on start it no any needs to shrink it. Also in case it not enought space in any buffer system is already broken , becouse it mean that system can not process data at time and is not a realtime system anymore, regardless is it able or unable to enlarge buffers (for "soft" realtime it applicable only partially, so enlarging buffers by adding new segments to pool may help reanimate system on some cases). Also realtime systems have to avoid operations with unpredicable time like defragmentations, GC and so on to be a realtime.

28 minutes ago, mychii said:

It will still have a chance to access an invalid object in the buffer, cause my defragmenter probably moved it already.

It is same problem - object moved while have a refs/indexes that point to it old location that became invalid. It is only two ways to renew a pointers. First way is to build pointers map walking from root using a full RTTI data before defrgmentation just like Java or C# "stop-the -world" GC defragmentator do. It have no any sence especially for realtime system, becouse after it you may start to defrag again heap that has been used to build map.. Second way is to keep in objects addresses of all refs to it anytime (2-way pointers), and update it on reallocations of object. It very powerfull method that also required for many other purposes, like automatic nulling/removing from lists refs to object that destroyed/selfdestroyed while it have active refs, O(1) check is object allready exists in specific predefined role list and so on. But usually full implementation of 2-way is complete overkill, becouse for most classes list of refs and roles of its known and also number of refs can be limited by its known role refs.

42 minutes ago, mychii said:

Might probably end up doing another round for it after all. 

Really it thing that usually better to do on start and do it universally, becouse it save many many screens of code, that use containers and especially 2-way ptrs into it.

Share this post


Link to post
Share on other sites
Posted (edited)
2 hours ago, Fulcrum.013 said:

Ok. Looks like components will be allocated from pool by one at time, becouse binded to objects by ref. So why not to make a pools by segmented arrays that not need to be reallocated on enlarging and also can drop segments without move a content in case its segment complitelly free? Also it can have a queque of deleted slots to O(1) search a slot for reuse in fragmentated pool.

I think we're talking a different layer. This is like talking about std::vector that I hope can work on top of my memory allocation right now, asking how to use custom STL allocator using indirect memory access. Don't get me wrong though I like your idea, I am probably even just gonna use stack-based data structure for it, but that component thing I explained is just an example of how I will utilize the memory allocation under-the-hood for it, but not how these components will be handled/structured like this.

Quote

To reuse a slots from same pool or to srink a pool? First can be done without defragmentation, second have no any sence on realtime systems.

Shrink the pool. It does from what I tested. It's done gradually by a separate worker, defragmenting parts of the memory per loop within allocated budget that is acceptable by me so far. Unless your data is too large and your pool is also too large, which is not my case. They're just managing game object information so I don't think the memory usage (so far!) isn't really that big. Not more than 50 mb pool in total I think, could be way less depending on the environment (or the designer goes crazy). The pool can be even smaller if they're segmented per thread worker, still in theory though, cause I don't need it yet, but possible to make.

Quote

By best practicles of realtime software it have to avoid dynamic memory reallocations and have all required memory on hands anytime. If you have enought buffers/pools size allocated on start it no any needs to shrink it.

The whole memory for the game has been reserved first hand. But that also means I got to manually manage them virtually, and that involves all techniques including shrink, grow, best fit, close to fit, reallocating, list, whatever works to make sure I have enough space on these game objects that come (even worst, burst) and go pretty fast, CPU cache friendly, and better locality of reference. Please don't assume I'm using default malloc/free for every single small allocation for this.

Quote

Also in case it not enought space in any buffer system is already broken , becouse it mean that system can not process data at time and is not a realtime system anymore, regardless is it able or unable to enlarge buffers (for "soft" realtime it applicable only partially, so enlarging buffers by adding new segments to pool may help reanimate system on some cases). Also realtime systems have to avoid operations with unpredicable time like defragmentations, GC and so on to be a realtime.

Not in this case. Gameplay programmer needs rapid iterations of features over memory management. I give freedom of that but also flexibility if they wanna go deeper, whether if it's still inside the garbage-collected pool or outside. If you're talking about low level codes like rendering, that's a different stuff, and I'm not even talking about that, as if I'm using this GC for everything. I have mentioned it very clearly in my very first sentence, this is ONE of my memory allocator of many. It is the only one that is managed by the defragmenter. The rest returns normal pointers to objects in buffer.

Quote

It is same problem - object moved while have a refs/indexes that point to it old location that became invalid. It is only two ways to renew a pointers. First way is to build pointers map walking from root using a full RTTI data before defrgmentation just like Java or C# "stop-the -world" GC defragmentator do. It have no any sence especially for realtime system, becouse after it you may start to defrag again heap that has been used to build map.. Second way is to keep in objects addresses of all refs to it anytime (2-way pointers), and update it on reallocations of object. It very powerfull method that also required for many other purposes, like automatic nulling/removing from lists refs to object that destroyed/selfdestroyed while it have active refs, O(1) check is object allready exists in specific predefined role list and so on. But usually full implementation of 2-way is complete overkill, becouse for most classes list of refs and roles of its known and also number of refs can be limited by its known role refs.

Really it thing that usually better to do on start and do it universally, becouse it save many many screens of code, that use containers and especially 2-way ptrs into it.

I thought about that too already but I don't think renewing the pointers is a good idea. I'd rather keep track a good chunk of data that has a chance that they're deallocated in batch than updating the pointer table that may not be there anymore, for each pointer, which means allocation and deallocation gets more overhead to take care of the pointer table.

As for number two, yes exactly. As promising as it could be of two way pointers, it's basically the same as number one to me, but worse cause of lack of locality of reference I think (accessing random addresses to update addresses that are also somewhere random in the heap). And if there are more than one pointers to update, then it's more problem. It's not a option I guess.

 

Anyway, I assume you understand better by now what I'm trying to do so.... to the actual question... I assume I can't provide custom STL allocator through indirect memory access, such as that pointer to the pointer of object in the buffer and is expected to reinvent the wheel in this one?

Edited by mychii

Share this post


Link to post
Share on other sites
4 hours ago, mychii said:

Gameplay programmer needs rapid iterations of features over memory management

So make a lists of pointers for alive objects and iterate over it. On instances that size exids a line of chahce size data locality dont make any advantages. In it  case even accesing data of same instance is a random access, not a continuous.

4 hours ago, mychii said:

CPU cache friendly, and better locality of referenc

Instances over 64 bytes size is not a chache friendly at all.  Looks like you just try to invent another Goldberg machine.

Share this post


Link to post
Share on other sites
Posted (edited)
5 hours ago, mychii said:

I assume I can't provide custom STL allocator through indirect memory access, such as that pointer to the pointer of object in the buffer and is expected to reinvent the wheel in this one?

You just can make a kind of handles with fixet addr that point to actual objects in pool, and for any other objects store a pointers to it handless. as result on defragmentation stage you have to update only pointer per instance. Backface is one more pointer dereferencing in accesing component tru the owner object, but continuos access when you iterating a pool. Also you can perform update a handle on move constructor of instance. But again it have sence to perfom a defragmentation on tiny instances that also used by create-and forgot scheme and processing by pool without involwing other components, like as bullets, particles and so on where order of items not importent so defragmentation can be done just by swap with latest deletion technique witout needs to update external refs. 

 

5 hours ago, mychii said:

Not more than 50 mb pool in total I think, could be way less depending on the environment (or the designer goes crazy).

5 hours ago, mychii said:

As promising as it could be of two way pointers, it's basically the same as number one to me, but worse cause of lack of locality of reference I think (accessing random addresses to update addresses that are also somewhere random in the heap).

You moving to new location up to 50 mb on reallocation that accessed randomly to perform a shrink/enlarge together with defragmentation, that have no any difference is buffer complitely new or you moving data into existing buffer. Also it require to perform per-field copy  in case elements is not a pod-typed. Accesing of external refs usualy have to update much less memory size, so it usualy a acceptable overhead especially on such heavy and chache foe operation like defragmentation.

5 hours ago, mychii said:

The pool can be even smaller if they're segmented per thread worker, still in theory though, cause I don't need it yet, but possible to make.

Did you take in accounts a 10k CPU cycles of direct cost  and up to 10M+ CPU cycles of indirect costs caused by chache invalidation, that required per EACH thread switch? Also what same you developt, in case you cores have no other bussineses that stay in queque to  memory controller on operations that is kind of array sorting?Bottleneck of it operation is access to memory and you can not speed up it just by dropping more water to bottleneck (involve more cores to process).T

Edited by Fulcrum.013

Share this post


Link to post
Share on other sites
1 minute ago, Fulcrum.013 said:

You just can make a kind of handles with fixet addr that point to actual objects in pool, and for any other objects store a pointers to it handless. as result on defragmentation stage you have to update only pointer per instance. Backface is one more pointer dereferencing in accesing component tru the owner object, but continuos access when you iterating a pool. Also you can perform update a handle on move constructor of instance. But again it have sence to perfom a defragmentation on tiny instances that also used by create-and forgot scheme and processing by pool without involwing other components, like as bullets, particles and so on where order of items not importent so defragmentation can be done just by swap with latest deletion technique witout needs to update external refs. 

Thank you for answering.

Share this post


Link to post
Share on other sites
21 minutes ago, Fulcrum.013 said:

You moving to new location up to 50 mb on reallocation that accessed randomly to perform a shrink/enlarge together with defragmentation, that have no any difference is buffer complitely new or you moving data into existing buffer. Also it require to perform per-field copy  in case elements is not a pod-typed. Accesing of external refs usualy have to update much less memory size, so it usualy a acceptable overhead especially on such heavy and chache foe operation like defragmentation.

You're assuming I'm doing a full defrag on allocation when I didn't say so, wide and clear.

1 hour ago, Fulcrum.013 said:

So make a lists of pointers for alive objects and iterate over it. On instances that size exids a line of chahce size data locality dont make any advantages. In it  case even accesing data of same instance is a random access, not a continuous.

Instances over 64 bytes size is not a chache friendly at all.  Looks like you just try to invent another Goldberg machine.

21 minutes ago, Fulcrum.013 said:

Did you take in accounts a 10k CPU cycles of direct cost  and up to 10M+ CPU cycles of indirect costs caused by chache invalidation, that required per EACH thread switch? Also what same you developt, in case you cores have no other bussineses that stay in queque to  memory controller on operations that is kind of array sorting?Bottleneck of it operation is access to memory and you can not speed up it just by dropping more water to bottleneck (involve more cores to process).T

You are still assuming things. I said "can be as low as it can be", then you choose you assume my instances will be all over 64 bytes. By your own data conclusion, you believe I shouldn't make them contiguous as it invalidates the cache anyway, because, again, you assume my instances will be all over 64 bytes, when I clearly didn't say so. I have no further comment over this, but I take it as an advice.

I also can't comment anything regarding my plan over the thread worker as I haven't tested it myself (cause... it's a plan?), so again I take it as an early advice when I test it. I will only stay true based on the actual result of my test, which means I will attempt it with other people's advice around me, with also your advice in mind.

 

I create this as different reply because I'm replying only the off topic part you like to discuss. Since it's clearly off topic, I also want to say I won't reply these further here. You can wait until I make a new thread related to this. If you have any further reply regarding to the actual question of this thread, I'd love to hear about it. Thanks again. ;)

Share this post


Link to post
Share on other sites
Posted (edited)
33 minutes ago, mychii said:

You're assuming I'm doing a full defrag on allocation when I didn't say so, wide and clear.

Even in case you performing a partial defrag, on fat instances (especially non pods) updating of external refs much less complexive task than copying of instance to new location. So on per-istance basis it usually acceptable overhead too, especyally in case with habdles where you need to update only ref per instance. Also to keep external refs updated instances have anycase be non-pod so flatcopy accelerations is not applicable. And looks like it no ways to have emulated pod with STL allocators for classes for wich only non-pod part is a refs updating mechanism. Looks like it require other architrcture of allocators (or better call it stores) where flatcopy accelerated routines (memmove/memcpy for heap buffers) provided by store. By other ways allocator just not take a control on accelerated copying so can not track a refs itself to emulate pods.

Edited by Fulcrum.013

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!