Server problem.

Started by
8 comments, last by Raduprv 20 years, 9 months ago
I have a question... Sometimes my MMORPG server machine (which runs RH 8.1) becomes VERY, VERY slow, doesn't accept any connections, and, if it responds to ping at all, the round-trip time is huge, around 2 seconds. During those periods it also seems that all the bandwidth (up and down) is taken. After a reboot it works fine. Even without a reboot it recovers, but slowly (10 minutes or so), and it may happen again.

The machine is an Athlon 900 with 192 MB of RAM. The connection is 256 kbps up, 1 Mbps down...

The only thing I can think of is that the server is being SYN flooded. But can a SYN flood eat all the CPU? BTW, when things are OK, the machine load is generally under 2%.

P.S. The machine is not mine, but I have root access to it. Do you have any suggestions on how I can investigate WTF is happening?

Height Map Editor | Eternal Lands | Fast User Directory
Aren't there just a lot of players connected to your MMORPG? And are you sure your engine doesn't have any memory leaks or the like?
Newbie programmers think programming is hard. Amateur programmers think programming is easy. Professional programmers know programming is hard.
Run "top" and see what's eating the CPU during a slow period.

How appropriate. You fight like a cow.
I used to see a bug like this from time to time with RH 7.3. I never could isolate it, though admittedly, once I found it, I switched distros rather than isolate it. I haven't seen it recently with RH 9, but to be honest, I rarely use RH anymore beyond some client testing at work.

Int
Do you have an IDE hard disk? And findutils (locate, etc.) installed? It could be the updatedb cron job... On my SuSE it runs every 12h... Do a ps -aux, and if you see something like find, recode and updatedb, that's it... You can uninstall findutils...
My MMORPG server is not the cause: right now it is in pre-alpha and there are few users (the max was 20). The server is very friendly, it eats only about 20 CPU seconds/day, and I am reasonably sure that's not the problem.

Anyway, I can't do a top, or anything when that happens, because, whenever it happens, I can't ssh to that machine (or do anything else with it). Besides, I am not always there when it happens.
Is there a way to see who/what is responsible for that even when I am not aware of this problem? I was thinking of something that would write some things in the logs. But that program should somehow detect when this problem happens, and start logging data only at that time, not 24/7.

Height Map Editor | Eternal Lands | Fast User Directory
quote:Original post by roka-tarat
Do you have an IDE hard disk? And findutils (locate, etc.) installed? It could be the updatedb cron job... On my SuSE it runs every 12h... Do a ps -aux, and if you see something like find, recode and updatedb, that's it... You can uninstall findutils...


Yes, I have an IDE HDD, but even so, that updatedb thing on my machine (my test machine, not the server) usually kicks in at random times, and it doesn't bother me too much. That is, it really starts to work the HDDs hard, but it won't stop other applications, block the ports, etc.

Height Map Editor | Eternal Lands | Fast User Directory
quote:Original post by Raduprv
Anyway, I can't do a top, or anything when that happens, because, whenever it happens, I can't ssh to that machine (or do anything else with it).
Have an SSH session open running top at all times.

quote:Is there a way to see who/what is responsible for that even when I am not aware of this problem? I was thinking of something that would write some things in the logs.
Various options to top will make it non-interactive and automatic. Check the manpage for more details.

quote:But that program should somehow detect when this problem happens, and start logging data only at that time, not 24/7

That'd be tricky. HD space is hardly at a premium these days, though; you could easily afford a log entry every 15 seconds or so.
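
For what it's worth, logging only during a slow period could be sketched as a small watcher that polls the load average and writes a snapshot only when it crosses a threshold. This is a rough sketch and not something anyone in the thread posted: the threshold of 4.0, the log path, and all the function names are assumptions, and it relies on a Linux /proc/loadavg being available.

```python
import subprocess
import time

LOAD_THRESHOLD = 4.0                 # assumed value; tune it for the machine
POLL_SECONDS = 15                    # the interval suggested above
LOG_PATH = "/var/log/slowdown.log"   # hypothetical log file


def parse_loadavg(text):
    """Return the 1-minute load average from /proc/loadavg contents."""
    return float(text.split()[0])


def should_log(load, threshold=LOAD_THRESHOLD):
    """True when the load is high enough to be worth a snapshot."""
    return load >= threshold


def snapshot():
    """Capture a full process listing as text."""
    result = subprocess.run(["ps", "auwx"], capture_output=True, text=True)
    return result.stdout


def watch():
    """Poll forever, appending a snapshot whenever the load is high."""
    while True:
        with open("/proc/loadavg") as f:
            load = parse_loadavg(f.read())
        if should_log(load):
            with open(LOG_PATH, "a") as log:
                log.write(f"--- {time.ctime()} load={load}\n")
                log.write(snapshot())
        time.sleep(POLL_SECONDS)
```

Calling watch() from a script started at boot would keep it running in the background. If the box is so starved that even this process cannot get scheduled, the snapshots will have gaps, so the last entry before a gap is the one to look at.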

How appropriate. You fight like a cow.
Usually my ssh connection (I use PuTTY) dies after 30-60 minutes of inactivity. I don't know why. So this can't be used to check what's wrong.

So, do you think I should put tcpdump in crontab, to log every 15 seconds or so?

Height Map Editor | Eternal Lands | Fast User Directory
Offhand, it sounds like the machine is swapping out, i.e. some process is using more memory than is available...
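
One way to check the swapping theory is to record how much swap is in use and compare the numbers from a good period against a bad one. The snippet below is only a sketch under the assumption that /proc/meminfo exposes SwapTotal and SwapFree lines in kB; the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its first numeric value (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip header lines and anything without a leading number.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap currently in use, in kB."""
    return fields["SwapTotal"] - fields["SwapFree"]


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the live value from the running system."""
    with open(path) as f:
        return swap_used_kb(parse_meminfo(f.read()))
```

If the swap-in-use figure jumps right before the machine stops responding, that points at memory pressure rather than the network.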

Do you get any messages in your syslog? If the machine recovers and it's one of the system processes you should get a log message about what happened.

If KDE or any other window manager is running, I would make sure to disable it as well...

It's possible that it's an external attack, and the easiest way to determine if that's the case would be to see if your bandwidth usage goes up significantly during these periods. You should also see something in the logs about what type of connection was attempted in this case. Don't forget to check mail to root, and apache logs if you are running a web server.
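
To see whether bandwidth usage really goes up during these periods, the interface byte counters can be sampled twice and turned into a rate. This is a hedged sketch: it assumes the usual Linux /proc/net/dev layout (two header lines, received bytes in the first column and transmitted bytes in the ninth), and the function names are hypothetical.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:      # the first two lines are headers
        name, sep, rest = line.partition(":")
        values = rest.split()
        if sep and len(values) >= 9:
            counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def rate_kbps(before, after, seconds):
    """Convert a byte-counter difference into kilobits per second."""
    return (after - before) * 8 / 1000 / seconds
```

Sampling the counters 15 seconds apart and comparing the result with the 256 kbps uplink would show whether the link is actually saturated.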

I'm not terribly familiar with RH, so these are just generalizations; it could well be something specific to Red Hat. If that's the case, a search on groups.google.com should turn something up...

As for tcpdump, I think I would probably first try "free" and "top" at the intervals you suggested, and maybe "ps auwx" for posterity...

The PuTTY issue sounds like an idle timeout. I think PuTTY has a setting to send keep-alives at regular intervals, and it should also be possible to disable the timeout in your sshd_config.

Hope some of that helps...

-- Aaron

| HollowWorks.com |

[edited by - mrhollow on July 10, 2003 6:05:52 PM]

This topic is closed to new replies.
