Crashes and problems with multi-threaded Lua host (C++)

2 comments, last by AverageJoeSSU 14 years ago
Hi,

Some friends and I are programming a server for the game Minecraft with a Lua plugin system styled after GMod Lua. We're coding it in C++, and it even has hooks. Over the past 4 days, though, we've hit some problems that we can't seem to fix.

Before I show you any code, allow me to briefly explain how it works. There is the main thread, plus one thread for each connected client. The Lua implementation loads several independent "plugins" into the same Lua instance, but isolates them in different environments. When a client thread receives a packet (e.g. a chat message), it calls the PluginManager member function CallPlayerHook(const char *hook, Player *playerObj). Here's what that function does:
bool PluginManager::CallPlayerHook(const char *hook, Player *playerObj) {
	WaitForSingleObject(this->pluginsMutex, INFINITE);

	if(this->isDoingHook)
		return false;
	else
		this->isDoingHook = true;

	if(lua_gettop(Lua) > 0) {
		Console::PrintText("Stack is > 0 for hook '%s'", hook);
		return false;
	}

	if(lua_gettop(Lua)) {
		Console::PrintText("Waiting for Lua: %s", hook);
		while(lua_gettop(Lua));
		Console::PrintText("Done waiting for Lua: %s", hook);
	}


	std::list<Plugin*>::iterator iter;

	for(iter = this->plugins.begin(); iter != this->plugins.end(); ++iter) {
		if((*iter)->CallPlayerHook(hook, playerObj) == false) {
			//lua_settop(Lua, 0);
			this->isDoingHook = false;
			ReleaseMutex(this->pluginsMutex);
			return false;
		}
	}

	this->isDoingHook = false;

	ReleaseMutex(this->pluginsMutex);

	return true;
}
The code is quite cluttered with several attempts to fix the numerous problems, but none of them work. The function returns a bool: true if everything was OK, false otherwise. The client thread that calls it keeps retrying until it returns true:

while(!pluginManager->CallPlayerHook("OnPlayerChat", this, NULL, NULL));

So what PluginManager::CallPlayerHook() does is loop through a std::list of Plugin objects and call a similar function on each one. So let's look at Plugin::CallPlayerHook(const char *hook, Player *playerObj). It loops through a local std::list of structs, each containing char*'s for a hooked event name and the name of the function to call in that plugin's Lua script, just like GMod's hook system. If it finds a match, it checks that the Lua stack is empty (no functions are running), then pushes the function and the Player object onto the stack and pcalls it. The code is very cluttered with try{}catch(){} blocks so that I could pinpoint the error. Here's the code, and then I will describe exactly what the problem is:
bool Plugin::CallPlayerHook(const char *hook, Player *playerObj) {
	std::list<Hook*>::iterator iter;
	try {
		for(iter = Hooks.begin(); iter != Hooks.end(); ++iter) {
			if(strcmp((*iter)->eventName, hook))
				continue;
			if(lua_gettop(Lua))
				return false;
			try {
				lua_getfield(Lua, LUA_GLOBALSINDEX, "PLUGINS");
				lua_pushnumber(Lua, this->id);
				lua_gettable(Lua, -2);
				lua_getfield(Lua, -1, (*iter)->functionName);
			} catch(...) {
				Console::PrintText("Exception with getting function for hook '%s' and plugin '%s'", hook, this->Name);
				lua_settop(Lua, 0);
				return false;
			}
			try {
				this->SetEnv();
			} catch(...) {
				Console::PrintText("Exception with SetEnv() for hook '%s' and plugin '%s'", hook, this->Name);
				lua_settop(Lua, 0);
				return false;
			}
			try {
				if (playerObj != NULL) {
					tolua_pushusertype(Lua, playerObj, "Player");
				} else {
					Console::PrintText("Player is NULL for hook '%s' and plugin '%s'", hook, this->Name);
					lua_settop(Lua, 0);	
					return false;
				}
			} catch(...) {
				Console::PrintText("Exception with pushing params for hook '%s' and plugin '%s'", hook, this->Name);
				lua_settop(Lua, 0);
				return false;
			}
			try {
				if(lua_pcall(Lua, 1, 0, 0)) {
					Console::PrintText("LUA ERROR: %s", lua_tostring(Lua, -1));
					lua_pop(Lua, 1);
				}
			} catch(...) {
				Console::PrintText("Exception with hook pcall for hook '%s' and plugin '%s'", hook, this->Name);
				lua_settop(Lua, 0);
				return false;
			}
			lua_settop(Lua, 0);
		}
	} catch(...) {
		Console::PrintText("CallHook failed at '%s'. Retrying.", hook);
		lua_settop(Lua, 0);
		return false;
	}
	return true;
}
So finally, here's what happens. When one person joins the server, the hooks work fine for the most part. It calls the hooks for OnPlayerJoin, OnPlayerChat, OnPlayerMove, etc. The Lua code is executed, and the plugins function as expected. However, a few times a minute, at random, one of the messages in the catch(){} blocks above is displayed. It's a different one each time: usually the one around pcall, but sometimes the one for pushing params.

Now the really bad part is when a second (or even third) client joins. Sometimes the clients freeze up and no packets go through. Other times it works for a while (while displaying lots of random caught exceptions) and then crashes due to an "uncaught exception: longjump executed", even though the point of the exception was IN a try{} statement! Other times it starts out the same way, but instead of the longjump exception, the server just freezes up, spamming a Lua-generated error: "C STACK OVERFLOW", or "Tried to call table object!", or even that an object is nil (which means the right things weren't pushed onto the stack from C++ when they should have been).

It's really confusing us, and it all seems so random, as it's different almost every time. Could it be caused by thread conflicts? Maybe two client threads are trying to call a Lua hook at the same time, or maybe one is trying to call it while another hook is currently processing? Could it be a mutex problem?

I would like to thank you for just reading this far and thinking about it. This problem is very troubling, and the project is near completion once we fix this. We assume that all of these problems are probably caused by one or two simple programming mistakes. This is a major roadblock and a potential show-stopper for our project, even though we are so close to completion.

Thank you for your time,
Drew
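
A note on the "longjump executed" crash, assuming the server links against a Lua library built as C (the stock build): in that configuration Lua raises errors with longjmp rather than with C++ exceptions, so a try{}catch(...){} around a Lua API call is not the mechanism that stops them. Only a protected call such as lua_pcall does. The sketch below uses no Lua at all; RaiseLikeLuaC and WhereDidControlGo are made-up names that only show where control ends up after a longjmp out of a try block.

```cpp
#include <cassert>
#include <csetjmp>

// Recovery point, playing the role of the one a protected call sets up.
static std::jmp_buf g_recover;

// Hypothetical stand-in for a Lua API call that raises an error
// in a Lua library built as C: it jumps instead of throwing.
static void RaiseLikeLuaC() { std::longjmp(g_recover, 1); }

// Returns 1 if catch(...) handled the error,
// 2 if control came back through setjmp instead,
// 0 if nothing was raised at all.
int WhereDidControlGo() {
    if (setjmp(g_recover) != 0)
        return 2;              // reached via longjmp
    try {
        RaiseLikeLuaC();       // never returns normally
    } catch (...) {
        return 1;              // a longjmp is not a C++ exception
    }
    return 0;
}
```

If that assumption holds, the calls made before lua_pcall in Plugin::CallPlayerHook (lua_getfield, lua_gettable) are not protected by the surrounding try blocks, which would fit the crash being reported from inside a try{} statement.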
This probably has nothing to do with it, but in Plugin::CallPlayerHook:

1) The check for playerObj being NULL should be at the top of the function (or if it should never be NULL, pass it by reference). You should also check if hook is NULL, just in case.

2) If it can't find a Hook whose eventName is the same as hook, the function returns true. Is that how it's supposed to work?
Because that particular plugin may not have hooked a function to a certain event. For instance, if the plugin just tells the player a random number if they type !rand in-game, it probably won't have any code it needs to run every time the player moves, only when they chat. Hooks are optional.
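
Both points above can be folded into one sketch: validate the arguments once at the top, and report "nothing hooked" separately from a real failure so the caller knows whether a retry makes sense. HookResult and FindAndCallHook are hypothetical names, the Player and Hook structs are reduced stand-ins for the ones in the post, and the Lua calls are left out.

```cpp
#include <cassert>
#include <cstring>
#include <list>

// Reduced stand-ins for the thread's types; only what this sketch needs.
struct Player { int id; };
struct Hook { const char *eventName; const char *functionName; };

// Assumed result type (not in the original code).
enum class HookResult { Called, NotHooked, InvalidArgs };

// Lookup logic only; the Lua stack work from the post is omitted.
HookResult FindAndCallHook(const std::list<Hook> &hooks,
                           const char *hook, const Player *playerObj) {
    // Check both pointers once, before touching any Lua state.
    if (hook == nullptr || playerObj == nullptr)
        return HookResult::InvalidArgs;

    bool called = false;
    for (const Hook &h : hooks) {
        if (std::strcmp(h.eventName, hook) != 0)
            continue;
        // ... push the function and the player, then lua_pcall ...
        called = true;
    }
    // Hooks are optional, so "not hooked" is a normal outcome, not an error.
    return called ? HookResult::Called : HookResult::NotHooked;
}
```

With a result like this, a caller could retry only on a genuine failure instead of looping on every false.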
Smells like a race condition to me.

Did you try doing this without threading first? Just as a sanity check, so you are sure the problem is in the threading.

Also, I'm guessing you are sure (or as sure as you can be) that it is not a networking issue.

A good rule of thumb is to identify when something is locked and when something is trying to be accessed... this may be non-trivial if you are accessing something from within Lua.
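
Applying that rule to the posted PluginManager::CallPlayerHook: the early returns on isDoingHook and on lua_gettop(Lua) > 0 come after WaitForSingleObject but have no matching ReleaseMutex, so the lock can stay held on those paths. A scoped lock makes "when is it locked" easy to answer. The sketch below is only an illustration: HookGate is a made-up class, std::mutex is swapped in for the Win32 mutex handle, and an int stands in for the Lua stack depth.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, reduced manager; not the PluginManager from the post.
class HookGate {
public:
    // Returns false when the (simulated) Lua stack is not empty.
    bool CallHook(int stackDepth) {
        // Locked from here to the end of the function, on every return path.
        std::lock_guard<std::mutex> lock(mutex_);
        if (stackDepth > 0)
            return false;   // early return: the guard's destructor unlocks
        ++calls_;           // stands in for running the plugins' hooks
        return true;
    }

    int calls() const { return calls_; }

private:
    std::mutex mutex_;
    int calls_ = 0;
};
```

Calling CallHook(1) and then CallHook(0) on the same object works, because the first call's early return still released the lock.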

I am not entirely familiar with Lua and multithreading besides using coroutines, although if you have a state that is trying to access data from multiple threads, this will most likely break.
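
One way to keep a single Lua state away from multiple threads, sketched under assumptions: client threads only push events into a locked queue, and one thread that owns the state pops them and runs the hooks. HookEvent, HookQueue and RunDemo are hypothetical names, and the Lua call itself is left as a comment.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical event record; a real one would carry the Player and packet data.
struct HookEvent {
    std::string hook;
    int playerId;
};

// Client threads call Push(); only the thread owning the Lua state calls Pop().
class HookQueue {
public:
    void Push(HookEvent ev) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            events_.push(std::move(ev));
        }
        ready_.notify_one();
    }

    // Blocks until an event is available, then removes and returns it.
    HookEvent Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !events_.empty(); });
        HookEvent ev = std::move(events_.front());
        events_.pop();
        return ev;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<HookEvent> events_;
};

// Demo helper: several producer threads, one consumer on the calling thread.
// Returns the number of events the consumer handled.
int RunDemo(int clients, int eventsPerClient) {
    HookQueue queue;
    std::vector<std::thread> producers;
    for (int c = 0; c < clients; ++c) {
        producers.emplace_back([&queue, c, eventsPerClient] {
            for (int i = 0; i < eventsPerClient; ++i)
                queue.Push(HookEvent{"OnPlayerChat", c});
        });
    }
    int handled = 0;
    for (int i = 0; i < clients * eventsPerClient; ++i) {
        HookEvent ev = queue.Pop();
        (void)ev;   // the only place that would touch the lua_State
        ++handled;
    }
    for (std::thread &t : producers)
        t.join();
    return handled;
}
```

The trade-off is that a client thread no longer gets the hook's result synchronously; whether that is acceptable depends on how the server uses the return value.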

------------------------------

redwoodpixel.com
