
Memory pools, size and inheritance. How small is small?


Hi there. I'm thinking about implementing a memory management facility within my game engine. The literature I've read, both the Boost documentation and the book C++ for Game Programmers, says that memory pools should only be used for "small" objects. But just how small is small? For example, I have a base sprite class and some inherited classes below that. Base sprite:
class SiSEBaseSprite
{
    public:
        SiSEBaseSprite(SDL_Surface *aSurface, float x = 0.0f, float y = 0.0f, float speed = 0.0f);  // Constructor
        virtual ~SiSEBaseSprite();         // Destructor
        
        void mAddAnimation(SiSEAnimation* SAnim);
        
        // Drawing functions
        void mDraw();
        void mClearBG();
        void mUpdateBG();
        void mDrawSpriteIMG(SDL_Surface *img, float x, float y);
        void mDrawSprite(void);
        
        // Accessor functions
        
        SCALAR mGetCurrX(void){return mCurrPos->mGetX();}
        SCALAR mGetCurrY(void){return mCurrPos->mGetY();}
        SCALAR mGetLastX(void){return mLastPos->mGetX();}
        SCALAR mGetLasyY(void){return mLastPos->mGetY();}
        int mGetHeight(void){return mGetAnimation()->mGetHeight();}
        int mGetWidth(void){return mGetAnimation()->mGetWidth();}
        SDL_Surface* mGetFrame(int frameNum){return mGetAnimation()->mAnim[frameNum]->image;}
        int mGetNumFrames(void){return mGetAnimation()->mGetFramesNum();}
        
        // inline
        inline Vector2D* mGetCurrPos(void){return mCurrPos;}
        inline Vector2D* mGetLastPos(void){return mLastPos;}
        inline void mSetFrame(int nr){mFrame = nr;}
        inline int mGetFrame(void){return mFrame;}
        inline void mSetSpeed(float nr){mSpeed = nr;}
        inline float mGetSpeed(void){return mSpeed;}
        inline void mSetScreen(SDL_Surface *aScreen){mScreen = aScreen;}
        inline SDL_Surface* mGetScreen(void){return mScreen;}
        
        // Animations control functions
        inline void mToggleAnim(void){mAnimating = !mAnimating;}
        inline void mStartAnim(void){mAnimating = true;}
        inline void mStopAnim(void){mAnimating = false;}
        inline void mRewind(void){mFrame = 0;}
        bool mIsAnimating(void);
        void mSetAnimKey(const std::string& animKey);
        
        // COORD control functions
        void mXAdd(float nr);
        void mYAdd(float nr);
        void mXYAdd(float x, float y);
        void mXSet(float nr);
        void mYSet(float nr);
        void mXYSet(float x, float y);
        
    protected:
        int mFrame;                     // Animation frame tracker
        
        // Coordinate vectors
        Vector2D* mCurrPos;
        Vector2D* mLastPos;
        std::string mAnimKey;           // Animation key
        bool mDrawn;                    // Drawn flag
        bool mAnimating;
        float mSpeed;                   // Speed indicator (pause multiplier)
        long mLastUpdate;               // Time sprite was last animated
        SiSEAnimation* mSpriteAnim;     // Pointer to animations (sprite frames)
        typedef std::map<std::string, SiSEAnimation*> mAnimMap;
        typedef std::pair<std::string, SiSEAnimation*> mAnimPair;
        mAnimMap mAnimations;
        SDL_Surface* mScreen;           // Pointer to the surface to animate on
        SiSEAnimation* mGetAnimation();
        bool mAnimOnScreen(int offSet);
};



Is that "small" enough? How does inheritance affect the size of derived classes? Can anyone shed some light on this?

And a little "extra" question. I have a function that returns a reference to a string. In some circumstances I want to return a null value in the string. Is the statement

return static_cast<std::string>(NULL);

okay, or is there a better way? I ask because I get the following compiler warning:

warning: returning reference to temporary

[Edited by - garyfletcher on June 1, 2005 3:27:53 AM]

Regarding the references: you can NEVER return a NULL reference. That undermines the whole purpose of references, which is that they ALWAYS refer to a valid object. If you ever need to return NULL, you'll need to return a pointer, not a reference. As for the warning you're getting: static_cast<std::string>(NULL) creates a temporary std::string, and just like a local variable it is destroyed when you leave the function, so the caller ends up with a reference to an object that no longer exists. (Constructing a std::string from a null pointer is undefined behaviour on its own, too.)

As for the memory manager, in my engine I consider any allocation of less than 1024 bytes small and allocate it from a pool. As far as I know, with inheritance, the only size overhead is for virtual functions (a hidden vtable pointer) and any data members you add yourself in the derived classes.

You can basically consider that the size of a class is the size of its fields. The number of methods is irrelevant; they are not stored in the instance.

Some other factors influence the "real" size: most implementations use a vtable pointer for virtual calls (4 additional bytes on a 32-bit build, 8 on a 64-bit one), and may store additional meta-information. Padding can make your class a little bigger, too.

Inheritance makes your class bigger because it adds all the fields of the parent class to yours. With ordinary (non-virtual) inheritance it doesn't have any other extra cost.

Remember that if you want to know the exact size of a class, you can use the sizeof operator.

Jods, just out of curiosity, does the sizeof operator take the v-table into account?

Quote:
Original post by SirLuthor
Jods, just out of curiosity, does the sizeof operator take the v-table into account?


No it does not.

EDIT: Yes it does :) Each object of a class with virtual functions holds a pointer to its class's vtable, typically as the first data item in the object. Thanks DigitalDelusion and CoffeeMug for reminding me.



[Edited by - pragma Fury on June 1, 2005 12:06:50 PM]

Further, you can reduce the size of your class by ordering your member variables largest -> smallest. This all depends on what byte alignment you use, but by default I think it's 4 bytes on most compilers (someone correct me if I'm wrong).

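Anyway, let's take a look at your class's members, shall we? The letters are what I'll use to refer to them later. Numbers in brackets are the size in bytes on a 32-bit build (the sizes of std::string and std::map vary between standard library implementations, so treat those two as rough figures).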
int mFrame;                    // A (4)
Vector2D* mCurrPos;            // B (4)
Vector2D* mLastPos;            // C (4)
std::string mAnimKey;          // D (16)
bool mDrawn;                   // E (1)
bool mAnimating;               // F (1)
float mSpeed;                  // G (4)
long mLastUpdate;              // H (4)
SiSEAnimation* mSpriteAnim;    // I (4)
mAnimMap mAnimations;          // J (16)
SDL_Surface* mScreen;          // K (4)


Using 4-byte alignment, this is how the compiler will lay out your class:

                   (byte)
           0      1      2      3
(addy)   +---------------------------+
0x0000   |             A             |
0x0004   |             B             |
0x0008   |             C             |
0x000C   |             D             |
0x0010   |             D             |
0x0014   |             D             |
0x0018   |             D             |
0x001C   |   E  |   F  |    EMPTY    |
0x0020   |             G             |
0x0024   |             H             |
0x0028   |             I             |
0x002C   |             J             |
0x0030   |             J             |
0x0034   |             J             |
0x0038   |             J             |
0x003C   |             K             |
         +---------------------------+


Now, most of the members in your class are exactly 4 bytes, or a multiple of 4 bytes, so it's not bad. It'll be 64 bytes in size. Look what happens at 0x001C though: because you've got two byte-sized (pun definitely intended) variables there, all you have left at that memory location is room for 2 more bytes. Naturally, "G" will not fit in there without crossing the boundary between 0x001C and 0x0020, so the compiler sticks it in the next slot it will fit in. By arranging members largest to smallest, you can push that gap to the end of the class:
                   (byte)
           0      1      2      3
(addy)   +---------------------------+
0x0000   |             D             |
0x0004   |             D             |
0x0008   |             D             |
0x000C   |             D             |
0x0010   |             J             |
0x0014   |             J             |
0x0018   |             J             |
0x001C   |             J             |
0x0020   |             A             |
0x0024   |             B             |
0x0028   |             C             |
0x002C   |             G             |
0x0030   |             H             |
0x0034   |             I             |
0x0038   |             K             |
0x003C   |   E  |   F  |    EMPTY    |
         +---------------------------+


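Ta-da! The members now fill only 62 bytes instead of 64. However, sizeof will still report 64: the compiler pads every class out to a multiple of its alignment (4 bytes here) so that each element of an array stays aligned, and the two spare bytes simply move to the end. So in your case the reordering doesn't actually save anything. (These diagrams also leave out the hidden vtable pointer that your virtual destructor adds, which is another 4 bytes on a 32-bit build.) But consider the following class: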
class Foo {
public:
    bool A;
    int B;
    short C;
    int D;
    short E;
    double F;
};


With 4-byte alignment, the size of this class would be 28 bytes, with the following memory structure:
                   (byte)
           0      1      2      3
(addy)   +---------------------------+
0x0000   |   A  |                    |
0x0004   |             B             |
0x0008   |      C      |             |
0x000C   |             D             |
0x0010   |      E      |             |
0x0014   |             F             |
0x0018   |             F             |
         +---------------------------+


Rearranging the class thusly:
class Foo {
public:
    double F;
    int B;
    int D;
    short C;
    short E;
    bool A;
};


We see our in-memory structure now looks like this:
                   (byte)
           0      1      2      3
(addy)   +---------------------------+
0x0000   |             F             |
0x0004   |             F             |
0x0008   |             B             |
0x000C   |             D             |
0x0010   |      C      |      E      |
0x0014   |   A  |                    |
         +---------------------------+


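Hoorah! The members now take up 21 bytes, and with 3 bytes of tail padding the class is 24 bytes: 4 bytes smaller than before (or 8 bytes smaller on compilers that align double to 8 bytes).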

Generally, however, it's recommended that you order your members in a way that makes them readable, unless memory usage is absolutely crucial.

You can change the alignment by using the #pragma pack directive, but keep in mind that if a variable has to cross a memory boundary, you may take a significant performance hit, as the CPU then has to fetch two memory locations and do some shifting and OR-ing to get the actual value of that variable, instead of a single read.

[Edited by - pragma Fury on June 1, 2005 12:50:50 PM]

Quote:
Original post by pragma Fury
Quote:
Original post by SirLuthor
Jods, just out of curiosity, does the sizeof operator take the v-table into account?


No it does not.


Yes it does, and if you don't believe me just try this:


#include <iostream>

class NonVirtual {};
class Virtual { public: virtual ~Virtual() {} };

int main()
{
    std::cout << "sizeof:\nNonVirtual = " << sizeof(NonVirtual)
              << "\nVirtual = " << sizeof(Virtual) << std::endl;
}

Quote:
Original post by pragma Fury
No it does not.

Yes it does.
Edit: ahh, beaten to it by digital.

Ack.. I stand corrected.

Yeah.. I uh.. I was thinking of something totally different. Or I wasn't thinking :P

Quote:
Original post by garyfletcher
But just how small is small?

What is your purpose in creating a pool? Unless you answer this question for yourself, there is no need to create one (note: educational value is a perfectly valid answer). Generally, you should create a pool for small objects that are allocated and deallocated very often (often enough to potentially hurt performance). If you're dealing with objects that you allocate once in a blue moon, don't worry about pools. For objects that do come and go very often, anything between 1 and 300 bytes is generally a small object. These values will change, of course, depending on your needs, but it's a good rule of thumb.
Quote:
Original post by garyfletcher
And a little "extra" question. I have a function that returns reference to a string. In some circumstances I want to return a null value in the string.

You can't do that. Smart pointers are generally a good solution to your problem (boost::shared_ptr).
Quote:
Original post by garyfletcher
Okay, or is there a better way becuase I get the following compiler warning:

warning: returning reference to temporary

You get a warning for a different reason. Your code won't work. If you have something like this:

int& f()
{
    int n;
    return n;   // BAD: n is destroyed when f() returns
}

it will fail because n lives on the stack and is destroyed when the function returns, leaving the caller with a dangling reference (undefined behaviour). Once again, a smart pointer is probably a good solution here, or simply return by value.

Thanks for all the help and advice guys (and girls).

The reasons for using a custom memory management module are twofold really.

Educational & Performance. I want to get a good grounding in developing an engine which provides a small SDK for SDL...adding sprite classes, scrolling classes etc. I want to do things right from the start so when I eventually graduate to 3-D I have a good grounding in the best techniques.

Speaking about performance: I know that "lacklustre" class design can lead to cache misses at runtime. Does the #pragma pack directive automatically pack the class into enough bytes to minimise cache misses, i.e. does it auto-pad if required? Also, is it portable?

Quote:
Original post by garyfletcher
Speaking about performance: I know that "lacklustre" class design can lead to cache misses at runtime. Does the #pragma pack directive automatically pack the class into enough bytes to minimise cache misses, i.e. does it auto-pad if required? Also, is it portable?


Nothing wrong with doing stuff for education, or just for the heck of it, IMHO :)
So long as you don't get used to bad practices..

Regarding #pragma pack:

If I were to pack my unorganized Foo class using 2-byte alignment:
#pragma pack(push, 2)   // save the current packing, then use 2-byte alignment
class Foo {
public:
    bool A;
    int B;
    short C;
    int D;
    short E;
    double F;
};
#pragma pack(pop)       // return to original alignment


Then the compiler would try to make it look like this in memory ("#" indicates padding):
             (byte)
           0      1
(addy)   +-------------+
0x0000   |   A  |######|
0x0002   |      B      |
0x0004   |      B      |
0x0006   |      C      |
0x0008   |      D      |
0x000A   |      D      |
0x000C   |      E      |
0x000E   |      F      |
0x0010   |      F      |
0x0012   |      F      |
0x0014   |      F      |
         +-------------+

This isn't too bad; our class occupies 22 bytes. However, a 32-bit system reads memory in 4-byte words, and viewed that way the same layout looks like this:
                   (byte)
           0      1      2      3
(addy)   +---------------------------+
0x0000   |   A  |######|      B      |
0x0004   |      B      |      C      |
0x0008   |             D             |
0x000C   |      E      |      F      |
0x0010   |             F             |
0x0014   |      F      |#############|
         +---------------------------+


Because B and F now cross memory boundaries, it's going to be a lot more processor intensive to retrieve those values. I looked it up, and Visual Studio uses 8-byte alignment by default, so without optimizing the structure, our messy class would require 32 bytes of memory, and the optimized structure 24.

I believe the #pragma pack directive is portable, though I haven't done any non-Windows development to know. I think gcc supports it.

Okay. I'm a little confused now. Think I must be missing something.

Using the pragma directive doesn't actually seem to make much difference to storage in actual memory. It appears that designing from highest to lowest and padding to achieve a full complement of 32 bytes would be best to avoid cache misses.

Like I say, think I must have missed something.

Quote:
Original post by garyfletcher
Okay. I'm a little confused now. Think I must be missing something.

Using the pragma directive doesn't actually seem to make much difference to storage in actual memory. It appears that designing from highest to lowest and padding to achieve a full complement of 32 bits would be best to avoid cache misses.

Like I say, think I must have missed something.


I might be the one who's confused. By "cache misses" are you referring to the padding placed between the data members in memory?

The pack directive did make a big difference. Moving from 4-byte to 2-byte alignment saved us 6 bytes (28 down to 22) without having to change our structure. Reordering the members on top of that saves nothing more, since their 21 bytes still round up to 22 under 2-byte packing.
Of course, there is the performance hit incurred by using it. Moving to 1-byte alignment would pack your class into as small a memory footprint as possible, but I don't recommend it.

Yes, that's what I mean by cache misses.

Think I may need to read up about it again just to make sure I'm not talking at cross purposes....my head hurts..;)

From what I've read before, it is best practice to align member functions/data so that the total bytes they take up is a multiple of the target byte system (32 in the case of most PCs).

I think!!!!

To clear up some of the misconceptions:

The compiler inserts padding to help performance; you shouldn't fiddle with it if you don't have a really good reason to do so.

#pragma directives are inherently unportable, since they are implementation-defined.

A "cache miss" is when the CPU has to fetch data from main memory (RAM) instead of from the extremely fast but small on-die cache memory. Therefore you want to promote good locality, i.e. objects often used together should be close to each other in memory.

Small can be said to be somewhere around 64 bytes, and I would advise building a multilayer chunked allocator with 16, 32 and 64 byte chunks.

There should be no such thing as a null reference in a well-formed program; creating one is a cardinal sin and will never lead to any good.

Quote:
Original post by garyfletcher
Yes, that's what I mean by cache misses.

Think I may need to read up about it again just to make sure I'm not talking at cross purposes....my head hurts..;)


:) np. I had to write a class that allocated memory buffers to contain arrays of structures that were defined at run-time, because the 3rd-party API I was working with at the time was seriously messed up. So I got a good crash course in alignment.

Probably the best thing to do, if you wish to avoid padding, is to simply design your class to fit into 4-byte alignment as closely as possible, rather than mess with the pack directive.

A cache miss doesn't have anything to do with reading unaligned memory.

The CPU has a small chunk of memory where it keeps a copy of the data it reads from RAM. When the data it needs isn't in the cache, a cache miss happens, and the CPU has to stop what it's doing until it can read the data from RAM. Similarly, a page fault happens if the data isn't in RAM. Again, your program stops while the OS tries to copy the data into memory from some other source, like your hard disk. If the data still can't be found after that, you get an invalid page fault, and your computer explodes in an impressive display of sparks and smoke. Or maybe it simply crashes, I don't remember which.

Surely, if objects aren't aligned on a 32-byte memory boundary then there will be more cache misses, i.e. they will take more reads from the external storage (hard disk) to process than if they were all nicely aligned?

Maybe I should just throw this book away;)

Reading from a hard-drive is another matter.. If you want super-efficient drive access you have to take into account things like sector sizes and drive read caches and stuff.

It's not a subject I know a lot about, but I do know it's very different from reading out of RAM.

You need to separate the two concepts of alignment and caching.

Alignment issues can cause stalls in address generation and when reading values. Depending on the type being read, you need slightly different alignment; no type on the x86 requires a "harder" alignment constraint than 16 bytes (128 bits) for optimal read performance.

Cache issues differ slightly between different CPU generations and variants, depending on (cache) line size, total amount of available cache memory, associativity, etc.

So for optimal performance you want proper alignment for each data type, but you also want to squeeze as many objects as possible into each cache line, and in that regard tightly packed objects are better.

So alignment holes can lead to bad performance because you miss the cache; on the other hand, misaligned reads always cost you, even when the data is in the cache.

But these issues are most critical when dealing with heavy number-crunching code like vertex transformations; for a sprite class they will hardly matter.

Quote:
Original post by pragma Fury
Reading from a hard-drive is another matter.. If you want super-efficient drive access you have to take into account things like sector sizes and drive read caches and stuff.

It's not a subject I know a lot about, but I do know it's very different from reading out of RAM.


Looking into memory-mapped files would be the way to go in that case. But I think he's referring to page faults, which happen transparently (but often quite noisily, as when Windows halts and thrashes the disk for a good minute before getting itself together and resuming work).
