[Linux C++] How can a process simply vanish without core dump?

General and Gameplay Programming

Started by hiigara August 16, 2011 05:49 PM

8 comments, last by Vectorian 12 years, 8 months ago

hiigara

108

Author

August 16, 2011 05:49 PM

Hello.

My application is written in C++ and runs as a daemon (its parent is process 1). On rare occasions (twice in three months) the process simply vanishes.
I have log messages all over, but the last one logged gives no clue, so I probably need to add more.
ulimit -c is unlimited, and I can successfully get cores with kill -ABRT. I also got cores during development, when the application had segmentation faults.
My guess is that some signal that does not generate a core is being raised. I am blocking SIGPIPE.
What would you do to find out why the application is being terminated? I am going to add more log messages, but I hope to find the problem before I have one for every two lines of code.

Adam_42

3,664

August 17, 2011 01:59 PM

Could it be calling exit() or a similar function somewhere unexpected, say when a specific error is encountered? You should be able to use the --wrap linker switch to intercept calls to it in any code that's linked statically to your program.

It's also possible the OOM Killer may have done it.

Dragonion

131

August 17, 2011 02:40 PM

Does the process also vanish from the process list, or does it turn into a zombie process?

Katie

2,255

August 17, 2011 03:21 PM

Does it have write permission in the dir it's trying to core dump into?

XTF

100

August 17, 2011 04:19 PM

Check syslog
Run the daemon in screen
Use gdb or strace

Bregma

9,461

August 17, 2011 05:14 PM

"It's also possible the OOM Killer may have done it."

My money is on this.

You will need to find out where your distro logs system events (if it does), and review those logs. They are commonly found in /var/log but that really is distro-dependent. You could also try using dmesg on the command line.

If you get desperate, have your daemon run as a supervisor parent process that does nothing but wait for a forked child (the real daemon) and captures and logs the child's exit conditions. It's only about 20 lines of code and can be left in your daemon to be activated by command-line switch as necessary.
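
The supervisor idea could look roughly like this. It is only a sketch: describe_exit, run_supervised and the real_daemon callback are names invented here, and logging goes to stderr just to keep the example short.

```cpp
#include <cerrno>
#include <cstdio>
#include <string>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Turn a waitpid() status into a readable line for the log.
std::string describe_exit(int status) {
    if (WIFEXITED(status)) {
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        const int sig = WTERMSIG(status);
        std::string s = "killed by signal " + std::to_string(sig);
        s += " (" + std::string(strsignal(sig)) + ")";
#ifdef WCOREDUMP
        if (WCOREDUMP(status)) s += ", core dumped";
#endif
        return s;
    }
    return "unknown status " + std::to_string(status);
}

// Fork the real daemon (hypothetical callback) and block until it ends.
// Returns the raw wait status, or -1 if fork() or waitpid() failed.
int run_supervised(int (*real_daemon)()) {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        _exit(real_daemon());  // child: do the real work
    }
    int status = 0;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR) return -1;  // retry only if interrupted
    }
    std::fprintf(stderr, "child %d %s\n", static_cast<int>(child),
                 describe_exit(status).c_str());
    return status;
}
```

A kill by the OOM killer would show up here as signal 9 (SIGKILL), which the child itself has no chance to log. In main, a command-line switch could decide between calling the daemon function directly and passing it to run_supervised().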

Stephen M. Webb
Professional Free Software Developer

hiigara

108

Author

August 19, 2011 02:37 PM

To answer the questions first:
- There is no exit() in my code. The only way for the application to terminate on its own is by returning from main. Apart from the system libraries, the only library I am using is mysql. So it could be mysql doing something nasty.
- It does not become a zombie. It vanishes from "ps -eF".
- I believe it has permission to write the core in the dir where the executable is located, because it successfully writes it there when I use kill -ABRT.

to XTF:
It may not be possible to have a putty session running for 2 months until the program crashes. It's on a remote server.

to Bregma:
That supervisor trick is exactly the kind of solution I was looking for. If it's indeed a signal, waitpid() from the parent will tell me which. I am going to implement that.

I probably should mention that the way I launch my "daemon" is not very orthodox. I start a putty shell, launch the process in the background, then exit the shell.
The process runs for 2 months launched like this. So I don't think that is the problem.

I am using Ubuntu 10.04.3 LTS.
In /var/log I have checked syslog, syslog.1 and syslog.*.gz. I also checked kern.log* and messages*. No mention of OOM.
If the OOM killer kills my process, which log does it show up in?

Bregma

9,461

August 19, 2011 04:03 PM

"I probably should mention that the way I launch my "daemon" is not very orthodox. I start a putty shell, launch the process in the background, then exit the shell.
The process runs for 2 months launched like this. So I don't think that is the problem."

You could at least try using nohup (man nohup for info) and redirecting output to a log file to eliminate the way you start your program as a cause.

There could also be multiple points of failure, so multiple approaches to tracking down the cause are warranted, especially when it occurs infrequently.

If the problem is the OOM killer, logging in remotely and running 'dmesg' from the command line should tell you something. Unless the system has restarted since the process was killed, which is quite possible under OOM conditions. Verify the system restart time, too.

Stephen M. Webb
Professional Free Software Developer

Katie

2,255

August 19, 2011 07:10 PM

"It may not be possible to have a putty session running for 2 months until the program crashes. It's on a remote server."

You don't need to if you run it in a screen session. You can just connect to it when you need to.

Vectorian

109

August 20, 2011 05:41 PM

"It may not be possible to have a putty session running for 2 months until the program crashes. It's on a remote server."

You need to discover the wonder that is screen. http://www.gnu.org/s/screen/
Run 'screen' to start a session, and then use screen -x to attach to it. It will stay running forever in the background. Use ctrl-a c to create a new window, ctrl-a n/p to go to next/previous window, ctrl-a k to kill the current window. Then use ctrl-a ? to find all other goodies.

Using this you don't even need to run the process in the background: create a window and run it there. It'll be just like any command line window, but persistent.
