Sign in to follow this  
mychii

My Custom Memory Pool Seems Too Slow. Need Advice.

Recommended Posts

mychii    898

Hi guys,

 

So I'm warming up myself with C++ again, but my C++ isn't so much before anyway. As I am back, I try to get deeper in memory management.

 

I am currently trying to create my own generic memory pool for any object that inherits a parent class (I call it Object, which has a static memory pool attached to it). I need this memory pool so I could manually manage any child class that inherits Object.h at any time I, or the pool desire (I guess it's a garbage collector in the end; it is simply a pool).

So every time a class that inherits Object, it will have its new and delete overridden. The "new" will have a normal manual memory allocation, and "delete" simply stores the memory back to the pool in case it is needed again/not going back to free store, unless/until I flush them all either manually/automatically depends on the application needs.

Since children classes are always different, I make the memory pool to be generic per child class, therefore I use STL map. I do this to avoid memory corruption as every child class should have different size of memory required. This causes every memory allocation to check which class to find.

 

The memory pool works, but the performance is severe. I don't know whether it is the architecture, or STL overuse, or the code is too blunt, but I really need someone to give feedback on it. The performance is like 9x slower compare to default "new" when I test it with 100K iteration (1.8 sec compare to 0.2 sec) when allocating the memory. I even tried another test by filling the pool first before the test but it gets even slower! And I don't even include tracker for the pointers yet.

 

The thing is, I don't know if this is normal or not. But having it 9 times slower is somehow unacceptable for me. So again, please have a check if I miss something, maybe it's unsafe, the design sucks, or anything you find odd (it contains leaks, corruption, etc.). I need guidance. rolleyes.gif

 

Soo here's the code:

 

MemoryPool.h -> The memory pool!

/**
 * Generic stack-based memory pool.
 */

#ifndef MEMORY_POOL_H
#define MEMORY_POOL_H

#include <typeinfo>
#include <vector>
#include <map>
#include <string>

#include "Common.h"

namespace Engine
{
	using std::vector;
	using std::map;
	using std::string;

	template<typename T>
	class MemoryPool
	{
	public:
		MemoryPool();
		~MemoryPool();
		void* alloc(size_t size);
		void dealloc(void* ptr);

	private:
		map<string, vector<char*>> _pool = map<string, vector<char*>>();
	};

	template<typename T>
	MemoryPool<T>::MemoryPool()
	{
	}

	template<typename T>
	MemoryPool<T>::~MemoryPool()
	{
	}

	template<typename T>
	void* MemoryPool<T>::alloc(size_t size)
	{
		char* ptr;
		vector<char*> specificPool;
		std::map<string, vector<char*>>::const_iterator searcher = _pool.find(typeid(T).name());

		if (searcher != _pool.end())
		{
			specificPool = searcher->second;
		}
		else
		{
			_pool.insert(std::pair<string, vector<char*>>(typeid(T).name(), vector<char*>()));
			specificPool = _pool[typeid(T).name()];
		}

		// If there are idle pointers, then reuse one.
		if (specificPool.size() > 0)
		{
			ptr = specificPool.back();

			specificPool.pop_back();

			return ptr;
		}

		// If there is nothing to reuse, then allocate new one.
		ptr = new char[size];

		if (ptr != NULL)
		{
			return ptr;
		}

		// Return nullptr when no memory left.
		return nullptr;
	}

	template<typename T>
	void MemoryPool<T>::dealloc(void* ptr)
	{
		std::map<string, vector<char*>>::const_iterator searcher = _pool.find(typeid(T).name());

		if (searcher != _pool.end())
		{
			_pool[typeid(T).name()].push_back((char*)ptr);
		}
		else
		{
			cout << "ERROR: No pointer pointing at such object to deallocate!" << endl;
		}
	}
}

#endif

Object.h -> The parent class!

#ifndef OBJECT_H
#define OBJECT_H

#include <string>
#include <vector>

#include "MemoryPool.h"

namespace Engine
{
	using std::string;
	using std::vector;

	template<typename T>
	class Object
	{
	public:
		static MemoryPool<T> memoryPool;

	public:
		Object(string name);
		~Object();
		string& getName();
		void* operator new(size_t size);
		void operator delete(void* ptr);

		/**
		* Reset the object. Always put all cleaners here instead of destructors!
		*/
		virtual void reset() = 0;

	private:
		string _name;
	};

	template <typename T>
	MemoryPool<T> Object<T>::memoryPool = MemoryPool<T>();

	template <typename T>
	Object<T>::Object(string name) : _name(name)
	{
	}

	template <typename T>
	Object<T>::~Object()
	{
		_name.clear();
	}

	template <typename T>
	void* Object<T>::operator new(size_t size)
	{
		return memoryPool.alloc(size);
	}

	template <typename T>
	void Object<T>::operator delete(void* ptr)
	{
		memoryPool.dealloc(ptr);
	}

	template <typename T>
	string& Object<T>::getName()
	{
		return _name;
	}
}

#endif

Entity.h -> Sample usage of a child class that inherits Object.

#ifndef ENTITY_H
#define ENTITY_H

#include <vector>

#include "Common.h"
#include "Object.h"
#include "Component.h"

namespace Engine
{
	using std::vector;

	class Entity : public Object<Entity>
	{
	public:
		vector<Component*> components;

		Entity(string name);
		~Entity();
		void clearAllComponents();
		void reset();
	};
}

#endif

And here's how I benchmark them (without filling the pool first):

#include <iostream>
#include <ctime>

#include "Engine\Core\Entity.h"

using std::cout;
using std::endl;

using Engine::Entity;

/* Normal class */
class Car
{
public:
	Car() {}
	~Car() {}
	void getName() {}
};

int main()
{
	float startTime2 = (float)clock() / CLOCKS_PER_SEC;
	for (int iii = 0; iii < 100000; ++iii)
	{
		Entity* o = new Entity("Entity");
	}
	cout << (float)clock() / CLOCKS_PER_SEC - startTime2 << endl;
	

	float startTime = (float)clock() / CLOCKS_PER_SEC;
	for (int i = 0; i < 100000; ++i)
	{
		Car* car = new Car();
	}
	cout << (float)clock() / CLOCKS_PER_SEC - startTime << endl;

	return 0;
}

Note that there's Common.h header file. It contains only standard includes like cout and endl for debugging purposes, nothing else.

 

Thank you guys! wub.png

Edited by mychii

Share this post


Link to post
Share on other sites
ApochPiQ    22999

In your memory pool's alloc function, specificPool is copied by value from the map on each allocation. I suspect this is a bug and what you really want is to have specificPool be a pointer to a vector in the map. Is that correct?

Share this post


Link to post
Share on other sites
Waterlimon    4398

Do I understand this correctly:

-Entity is Object<Entity>

-Instances of Object<Entity> have MemoryPool<Entity>

-Only objects of type Entity can be allocated into that specific pool

 

I dont see how the map works here, wont every instantiation of MemoryPool only allow a single type to be allocated anyways, which would mean that there will only every be a single key in the map? Im not very used to template code and static members so i might have misundestood something.

Anyways, I do have suggestions, assuming there is nothing fundamentally wrong in how the pool works:

-Would it be possible to use the hash_code of the type_info struct instead of the name for the map key? This way you wouldnt need to use std::string but could have an int instead which might be faster.

-Reserve some capacity into the vectors when you create them so they dont need to reallocate all the time

-You could try using an unorderer_set instead of a map, access is about O(1) I think. If the map access is a bottleneck.

-In your test code, make sure that the objects you compare are actually the same. In the code you posted, car is probably a single byte, while your Entity might even allocate memory because of the vector contained.

 

And of course, use a profiler to figure out what it is you are doing that is slowing down your pool.

 

 

EDIT:

You might want to consider a less intrusive option. Your classes probably should not know where and how they are allocated.

Edited by Waterlimon

Share this post


Link to post
Share on other sites
mychii    898

In your memory pool's alloc function, specificPool is copied by value from the map on each allocation. I suspect this is a bug and what you really want is to have specificPool be a pointer to a vector in the map. Is that correct?

 

I guess you're right, I can optimize that part. I'll let you know. EDIT: Tried it! it reduces by 0.3 sec! Thanks. (happy.png )

 

 

Do I understand this correctly:

-Entity is Object<Entity>

-Instances of Object<Entity> have MemoryPool<Entity>

-Only objects of type Entity can be allocated into that specific pool

 

I dont see how the map works here, wont every instantiation of MemoryPool only allow a single type to be allocated anyways, which would mean that there will only every be a single key in the map? Im not very used to template code and static members so i might have misundestood something.

Anyways, I do have suggestions, assuming there is nothing fundamentally wrong in how the pool works:

-Would it be possible to use the hash_code of the type_info struct instead of the name for the map key? This way you wouldnt need to use std::string but could have an int instead which might be faster.

-Reserve some capacity into the vectors when you create them so they dont need to reallocate all the time

-You could try using an unorderer_set instead of a map, access is about O(1) I think. If the map access is a bottleneck.

-In your test code, make sure that the objects you compare are actually the same. In the code you posted, car is probably a single byte, while your Entity might even allocate memory because of the vector contained.

 

And of course, use a profiler to figure out what it is you are doing that is slowing down your pool.

 

The idea of the map is to avoid that specific situation. Say I have A, B, C class that inherits Object, I assume that Object would be generic on each of them, like Object<A>, Object<B>, Object<C>. So the map contains vectors of pointers for each A, B, and C when required. Say we're allocating memory for A, if the map is not yet have vectors of pointers for A objects, then create such vector. Then A would have its own pool. Whenever Pool for A is filled, when allocating new As, it will automatically reuse the pointers from the vector inside map, which has they key for A.

 

1. Hmm I haven't check on hash_code yet, maybe it would. Yes! probably string is the case.

2. Will try!

3. I'll try that, yeah probably the map. Thanks for pointing it out!

4. You're right. I was probably got terrified already with the number. I'll post out with comparable items. Still,1.8 sec for such test is somehow unexpected, but that's why I open this thread to make sure if it's okay or not okay.

 

EDIT: Using all the first 3 suggestions (including ApochPiQ suggestion), it drops to 0.6 seconds for 100K iterations of allocating the Entity! But I used unordered_map here instead of unordered_set, because I am unable to track which object to pool without a key. I'll find a way for that. And also I hardcoded the capacity reserve for vector according to the test. I have to find out on how this should work in real application. Anyway, without reserving the capacity, I will get 0.95x seconds.

 

I am so not knowing how to use this IDE (Visual Studio 2013), is it related to that? do you mind telling me about this profiler stuff?

Edited by mychii

Share this post


Link to post
Share on other sites
Waterlimon    4398

I have used the Very Sleepy profiler (not part of visual studio) to do my profiling. I dont think visual studio express edition has a profiler.

 

I understand what you intend to do with the map, but to me it seems as if you would have 3 maps: 1 for A, 1 for B, 1 for C, because the MemoryPool class itself is templated. (so you have MemoryPool<A> with its map, MemoryPool<B> with another map etc.)

 

Looking at the code it seems as if the MemoryPool class itself does not need to be a template. Only some of the functions (the allocation ones) need to be. This would probably fix it.

Edited by Waterlimon

Share this post


Link to post
Share on other sites
mychii    898
 

I have used the Very Sleepy profiler (not part of visual studio) to do my profiling. I dont think visual studio express edition has a profiler.

 

I understand what you intend to do with the map, but to me it seems as if you would have 3 maps: 1 for A, 1 for B, 1 for C, because the MemoryPool class itself is templated. (so you have MemoryPool<A> with its map, MemoryPool<B> with another map etc.)

 

Looking at the code it seems as if the MemoryPool class itself does not need to be a template. Only some of the functions (the allocation ones) need to be. This would probably fix it.

 

I'll check on that profiler, thanks!

 

That's the funny thing though, the memoryPool in Object is static, so I am assuming it is only one. What I thought is (here's the dangerous part), the memoryPool attached to Object would adapt to whichever who wants to use it (A, B, or C). So again, since memoryPool is only one as static, I assume that the map in memoryPool is shared and dynamically adapt whoever is using it whether it's A, B, or C. But I can't really tell if this is correct or not (anyone please clarify).

 

Hmm I thought about that too, I'll give in another shot.

Edited by mychii

Share this post


Link to post
Share on other sites
mychii    898
 

EDIT:

You might want to consider a less intrusive option. Your classes probably should not know where and how they are allocated.

 

Any idea of the usage? I don't seem to understand the less intrusive option should be. Is it because it has to inherit Object with template?

 

 

The key issue is that MemoryPool<Foo> is a different type than MemoryPool<Bar>. This means that static members are not shared between the two different pools.

 

Oh my I didn't know that at all. Well that clears up what Waterlimon suggested then, that memoryPool probably won't need to be a template. laugh.png

 

I guess I have to learn more on how this template works. Thanks ApochPiQ! biggrin.png

Edited by mychii

Share this post


Link to post
Share on other sites
mychii    898

After the clarification about the template issue, I tried to remove the map and decided to use the template instead. It removes the needs to find which pool I have to allocate (due to map), which increases the performance slightly (up to 0.1 sec) compare to the one that uses map and without template. If I reserve the capacity (by the total number of the test), it is reduced to 0.23 sec! The number of classes that would need this would probably at hundreds or less, unless doing such template that way is very expensive (I don't know).

 

EDIT: As Waterlimon suggested, I've changed the benchmark against the Car class to be quite a bit the same as Entity. The Car class now inherits Vehicle, where it has to pass a string through its constructor and save it to a name variable. Apparently this is the culprit. Now the Car increases up to 0.5x seconds!

Now everything looks quite normal, as a matter of fact, better!

 

Thanks a bunch guys!! you guys are very helpful. laugh.png

Edited by mychii

Share this post


Link to post
Share on other sites
swiftcoder    18426


The Car class now inherits Vehicle, where it has to pass a string through its constructor and save it to a name variable. Apparently this is the culprit.

Are you passing those strings by const reference, or copying them at every step?

 

Failing to pass complex objects (like string, vector, etc) by const reference will result in a lot of unnecessary copies, which tends to put a serious damper on performance.

Share this post


Link to post
Share on other sites
Waterlimon    4398

It would make more sense if you just used the same class for comparing the performance of the allocation method (make a version of the entity class without the Object base class). You only want to compare the allocation method, not the particular properties of the two different classes.

 

Additionally, I would argue that your test is not very fair to the pool allocator, because you only test allocation, not simultaneous allocation and deallocation, which might be a more realistic situation.

Share this post


Link to post
Share on other sites
rip-off    10976
It isn't clear what problem you're trying to solve. Is allocation currently a bottleneck? Have you investigated trying to avoid heap allocation, rather than trying to speed it up?

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this