[Linux C++] How can a process simply vanish without core dump?

Started by
8 comments, last by Vectorian 12 years, 8 months ago
Hello.

My application is written in C++ and runs as a daemon (its parent process is PID 1). On rare occasions (twice in three months) the process simply vanishes.
I have logging messages all over, but the last log message does not give any clue. This means that I probably need to insert more log messages.
ulimit -c is unlimited and I am able to successfully get cores with kill -ABRT. I also got cores during development, when my application had segmentation fault bugs.
My guess is that some signal that does not generate a core is being raised. I am already blocking SIGPIPE.
What would you do to find out why the application is being terminated? I am going to insert more log messages, but I hope to find the problem before I have a log message for every 2 lines of code.
Could it be calling exit() or a similar function somewhere unexpected, say when a specific error is encountered? You should be able to use the --wrap linker switch to intercept calls to it in any code that's linked statically to your program.
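A minimal sketch of what that interception could look like, assuming GNU ld (or a compatible linker) and a build line along the lines of `g++ daemon.cpp wrap_exit.cpp -Wl,--wrap=exit`. The file names and the helper `format_exit_record` are made up for illustration. The weak declaration of `__real_exit` is not part of the technique itself; it is only there so the file still links when the flag is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <unistd.h>

// Resolved by the linker to the real exit() when --wrap=exit is in effect.
// Marked weak only so this file still links if the flag is left out.
extern "C" void __real_exit(int status) __attribute__((weak));

// Hypothetical helper: writes a one-line record into buf and returns its length.
int format_exit_record(char* buf, std::size_t size, int status) {
    assert(buf != nullptr && size > 0);
    int len = std::snprintf(buf, size, "exit(%d) called by pid %ld\n",
                            status, static_cast<long>(getpid()));
    if (len < 0) return 0;
    // snprintf reports the untruncated length, so clamp it to what fits.
    return len < static_cast<int>(size) ? len : static_cast<int>(size) - 1;
}

// With --wrap=exit, calls to exit() from the linked object files land here.
extern "C" void __wrap_exit(int status) {
    char buf[64];
    int len = format_exit_record(buf, sizeof buf, status);
    ssize_t ignored = write(STDERR_FILENO, buf, static_cast<std::size_t>(len));
    (void)ignored;
    if (__real_exit) __real_exit(status);  // normal path: run the real exit()
    _exit(status);                         // fallback when built without the flag
}
```

This only catches calls made from object files and static libraries that go through the link step. A call to exit() inside a shared library is not redirected. If the record ever shows up, swapping the write for abort() should produce a core with the caller on the stack.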

It's also possible the OOM Killer may have done it.
Does the process also vanish from the process list, or does it turn into a zombie-process?
Does it have write permission in the dir it's trying to core dump into?
Check syslog
Run the daemon in screen
Use gdb or strace

"It's also possible the OOM Killer may have done it."

My money is on this.

You will need to find out where your distro logs system events (if it does), and review those logs. They are commonly found in /var/log but that really is distro-dependent. You could also try using dmesg on the command line.

If you get desperate, have your daemon run as a supervisor parent process that does nothing but wait for a forked child (the real daemon) and captures and logs the child's exit conditions. It's only about 20 lines of code and can be left in your daemon to be activated by command-line switch as necessary.
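A rough sketch of such a supervisor, assuming Linux/POSIX. The names `supervise` and `describe_status` are invented for this example, and the real daemon is assumed to be callable as a plain function returning its exit code.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstdio>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Turn a waitpid() status into something readable for the log.
std::string describe_status(int status) {
    if (WIFEXITED(status))
        return "exited normally with code " + std::to_string(WEXITSTATUS(status));
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        std::string msg = "killed by signal " + std::to_string(sig);
        if (const char* name = strsignal(sig)) msg += std::string(" (") + name + ")";
#ifdef WCOREDUMP
        if (WCOREDUMP(status)) msg += ", core dumped";
#endif
        return msg;
    }
    return "ended with unrecognised wait status " + std::to_string(status);
}

// Fork, run the real daemon in the child, and wait for it in the parent.
// Returns the raw wait status, or -1 if fork() or waitpid() failed.
int supervise(int (*real_daemon)()) {
    assert(real_daemon != nullptr);
    std::fflush(nullptr);               // avoid duplicating buffered output in the child
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                     // child: this is the actual daemon
        int rc = real_daemon();
        std::fflush(nullptr);
        _exit(rc);
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;  // retry only when interrupted by a signal
    }
    std::fprintf(stderr, "child %ld %s\n", static_cast<long>(pid),
                 describe_status(status).c_str());
    return status;
}
```

In a real daemon the fprintf would go to whatever log the program already uses. A "killed by signal 9" line with no core would be consistent with the OOM killer, although anything else sending SIGKILL looks the same from here.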

Stephen M. Webb
Professional Free Software Developer

To answer the questions first:
- There is no exit() in my code. The only way for the application to terminate on its own is by returning from main. Apart from the system libraries, the only library I am using is the MySQL client library, so it could be MySQL doing something nasty.
- It does not become a zombie. It vanishes from "ps -eF".
- I believe it has permission to write the core in the dir where the executable is located, because it successfully writes it there when I use kill -ABRT.

To XTF:
It may not be possible to keep a PuTTY session running for 2 months until the program crashes. It's on a remote server.

To Bregma:
That supervisor trick is exactly the kind of solution I was looking for. If it's indeed a signal, waitpid() from the parent will tell me which. I am going to implement that.

I should probably mention that the way I launch my "daemon" is not very orthodox. I open a PuTTY shell, launch the process in the background, then exit the shell.
The process has run for 2 months at a time when launched like this, so I don't think that is the problem.

I am using Ubuntu 10.04.3 LTS.
In /var/log I have checked syslog, syslog.1 and syslog.*.gz. Also checked kern.log* and messages*. No mention of OOM.
If the OOM killer kills my process, in which log does it show up?

"I probably should mention that the way I launch my "daemon" is not very orthodox. I start a putty shell, launch the process in the background, then exit the shell.
The process runs for 2 months launched like this. So I don't think that is the problem."

You could at least try using nohup (man nohup for info) and redirecting output to a log file to eliminate the way you start your program as a cause.
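If changing the launch command is awkward, roughly the same protection can be built into the program. This is a different route from running nohup itself: a sketch, assuming POSIX, that ignores SIGHUP and points a descriptor at an append-only log file. The function names and the log path in the usage note are placeholders.

```cpp
#include <cassert>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

// Ignore SIGHUP so the process survives the controlling terminal going away.
bool ignore_sighup() {
    struct sigaction sa {};
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, nullptr) == 0;
}

// Make target_fd (e.g. STDOUT_FILENO or STDERR_FILENO) write to log_path,
// creating the file if needed and appending to it otherwise.
bool redirect_to_log(int target_fd, const char* log_path) {
    assert(log_path != nullptr);
    int fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return false;
    bool ok = dup2(fd, target_fd) >= 0;
    if (fd != target_fd) close(fd);  // the duplicate keeps the file open
    return ok;
}
```

Early in main this might be used as `ignore_sighup(); redirect_to_log(STDOUT_FILENO, "daemon.out"); redirect_to_log(STDERR_FILENO, "daemon.out");`. Anything a library prints to stderr just before the process dies then ends up in a file instead of a terminal that no longer exists.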

There could also be multiple points of failure, so multiple approaches to tracking down the cause are warranted, especially when it occurs infrequently.

If the problem is the OOM killer, logging in remotely and running 'dmesg' from the command line should tell you something. Unless the system has restarted since the process was killed, which is quite possible under OOM conditions. Verify the system restart time, too.
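To make restarts visible after the fact, the daemon could record the machine's boot time in its own log each time it starts. A small sketch, assuming Linux and its sysinfo() call; the function names are invented for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <sys/sysinfo.h>

// Approximate boot time in seconds since the epoch, or (time_t)-1 on failure.
std::time_t approximate_boot_time() {
    struct sysinfo info;
    if (sysinfo(&info) != 0) return static_cast<std::time_t>(-1);
    // uptime is the number of seconds since boot
    return std::time(nullptr) - static_cast<std::time_t>(info.uptime);
}

// Write one line with the boot time to the given stream.
void log_boot_time(std::FILE* out) {
    assert(out != nullptr);
    std::time_t boot = approximate_boot_time();
    std::tm parts;
    char text[32];
    if (boot == static_cast<std::time_t>(-1) ||
        localtime_r(&boot, &parts) == nullptr ||
        std::strftime(text, sizeof text, "%Y-%m-%d %H:%M:%S", &parts) == 0) {
        std::fprintf(out, "boot time unavailable\n");
        return;
    }
    std::fprintf(out, "system booted around %s\n", text);
}
```

If the boot time logged at one start is later than the last message of the previous run, the machine restarted in between and the dmesg output from before the restart is gone.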

Stephen M. Webb
Professional Free Software Developer

"It may not be possible to have a putty session running for 2 months until the program crashes. It's on a remote server."

You don't need to if you run it in a screen session. You can just connect to it when you need to.

"It may not be possible to have a putty session running for 2 months until the program crashes. It's on a remote server."

You need to discover the wonder that is screen. http://www.gnu.org/s/screen/
Run 'screen' to start a session, and then use screen -x to attach to it. It will stay running forever in the background. Use ctrl-a c to create a new window, ctrl-a n/p to go to next/previous window, ctrl-a k to kill the current window. Then use ctrl-a ? to find all other goodies. :)

With this you don't even need to run the process in the background: create a window and run it there. It'll be just like any command line window, but persistent.

