hiigara

[Linux C++] How can a process simply vanish without core dump?

hiigara    108
Hello.

My application is written in C++ and runs as a daemon (its parent process is PID 1). On rare occasions (twice in three months) the process simply vanishes.
I have logging messages all over, but the last log message does not give any clue. This means that I probably need to insert more log messages.
ulimit -c is unlimited and I am able to successfully get cores with kill -ABRT. I also got cores during development, when my application had segmentation fault bugs.
My guess is that some signal that does not generate a core dump is being raised. I am blocking SIGPIPE.
What would you do to find out why the application is being terminated? I am going to insert more log messages, but I hope to find the problem before I have a log message for every 2 lines of code.
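
One way to test the signal theory without logging every two lines is to record the catchable signals whose default action is to terminate without a core. The following is only a sketch: `record_signal`, `install_signal_recorder` and the two globals are hypothetical names, and the list of signals is an assumption about what is worth watching. SIGKILL cannot be caught, so a kill by the OOM killer would not show up this way.

```cpp
#include <signal.h>
#include <sys/types.h>

// Hypothetical globals: written only by the handler, read by the main loop.
volatile sig_atomic_t g_last_signal = 0;
volatile sig_atomic_t g_last_sender_pid = 0;

// The handler only stores values, which keeps it async-signal-safe.
// si_pid is meaningful when the signal came from kill() or sigqueue().
extern "C" void record_signal(int signo, siginfo_t* info, void*) {
    g_last_signal = signo;
    g_last_sender_pid = info ? static_cast<sig_atomic_t>(info->si_pid) : 0;
}

// Install the recorder for a few signals that terminate by default
// without producing a core file. Returns false if any sigaction() fails.
bool install_signal_recorder() {
    const int signals[] = {SIGHUP, SIGINT, SIGTERM, SIGUSR1, SIGUSR2, SIGALRM};
    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;          // ask for the siginfo_t argument
    sa.sa_sigaction = record_signal;
    for (int signo : signals) {
        if (sigaction(signo, &sa, nullptr) != 0) return false;
    }
    return true;
}
```

The main loop would then check `g_last_signal` on each iteration, write the signal number and sender PID to the log, and shut down cleanly.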

Adam_42    3629
Could it be calling exit() or a similar function somewhere unexpected, say when a specific error is encountered? You should be able to use the --wrap linker switch to intercept calls to it in any code that's linked statically to your program.

It's also possible the [url="http://linux-mm.org/OOM_Killer"]OOM Killer[/url] may have done it.
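
A rough sketch of the --wrap idea, assuming the GNU linker and glibc's backtrace functions. With `-Wl,--wrap=exit`, calls to `exit` in the objects being linked are redirected to `__wrap_exit`, and `__real_exit` refers to the original. `describe_exit_call` and the file names in the build comment are hypothetical. Calls made from inside shared libraries are not intercepted.

```cpp
// Build (assumed file names):
//   g++ -rdynamic daemon.cpp wrap_exit.cpp -Wl,--wrap=exit -o daemon
#include <execinfo.h>   // backtrace, backtrace_symbols_fd (glibc)
#include <unistd.h>     // _exit, STDERR_FILENO
#include <cstdio>
#include <string>

// Hypothetical helper: builds the log line for an intercepted call.
std::string describe_exit_call(int status) {
    return "exit(" + std::to_string(status) + ") called";
}

extern "C" {

// Resolved to the real exit() by the linker when --wrap=exit is given.
// Declared weak so this file still links when the flag is missing.
void __real_exit(int status) __attribute__((weak));

// Receives every statically linked call to exit().
void __wrap_exit(int status) {
    std::fprintf(stderr, "%s\n", describe_exit_call(status).c_str());
    std::fflush(stderr);

    // Print the call stack so the caller can be identified.
    void* frames[32];
    int count = backtrace(frames, 32);
    backtrace_symbols_fd(frames, count, STDERR_FILENO);

    if (__real_exit) __real_exit(status);  // normal exit processing
    _exit(status);                         // fallback: must not return
}

}  // extern "C"
```

In a daemon the message would go to the existing log file rather than stderr.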

Bregma    9214
[quote name='Adam_42' timestamp='1313589558' post='4850319']
It's also possible the [url="http://linux-mm.org/OOM_Killer"]OOM Killer[/url] may have done it.
[/quote]
My money is on this.

You will need to find out where your distro logs system events (if it does), and review those logs. They are commonly found in /var/log but that really is distro-dependent. You could also try using [b]dmesg[/b] on the command line.

If you get desperate, have your daemon run as a supervisor parent process that does nothing but wait for a forked child (the real daemon) and captures and logs the child's exit conditions. It's only about 20 lines of code and can be left in your daemon to be activated by command-line switch as necessary.
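
The supervisor could look roughly like the sketch below. It assumes the real daemon is reachable through a function pointer; `run_supervised` and `describe_wait_status` are hypothetical names, and `WCOREDUMP` is a common extension rather than part of POSIX, hence the `#ifdef`.

```cpp
#include <signal.h>
#include <string.h>     // strsignal
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <string>

// Hypothetical helper: turn a waitpid() status into a log line.
std::string describe_wait_status(int status) {
    if (WIFEXITED(status)) {
        return "exited normally with status " +
               std::to_string(WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        std::string msg = "killed by signal " + std::to_string(sig) +
                          " (" + strsignal(sig) + ")";
#ifdef WCOREDUMP
        if (WCOREDUMP(status)) msg += ", core dumped";
#endif
        return msg;
    }
    return "unexpected wait status " + std::to_string(status);
}

// Fork, run the real daemon in the child, and block in the parent until
// the child ends. Returns the raw wait status, or -1 if fork() or
// waitpid() failed.
int run_supervised(int (*daemon_main)()) {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) _exit(daemon_main());   // child: the real daemon

    int status = 0;
    pid_t result;
    do {
        result = waitpid(child, &status, 0);
    } while (result < 0 && errno == EINTR);  // retry if interrupted
    return result < 0 ? -1 : status;
}
```

When the command-line switch is present, main() would call `run_supervised` with the daemon's entry point and write `describe_wait_status(status)` to the log. A child killed by the OOM killer should be reported as killed by SIGKILL.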

hiigara    108
To answer the questions first:
- There is no exit() in my code. The only way for the application to terminate on its own is by returning from main. Apart from the system libraries, the only library I am using is mysql, so it could be mysql doing something nasty.
- It does not become a zombie. It vanishes from "ps -eF".
- I believe it has permission to write the core in the directory where the executable is located, because it successfully writes it there when I use kill -ABRT.

To XTF:
It may not be possible to have a putty session running for 2 months until the program crashes. It's on a remote server.

To Bregma:
That supervisor trick is exactly the kind of solution I was looking for. If it's indeed a signal, waitpid() from the parent will tell me which. I am going to implement that.

I should probably mention that the way I launch my "daemon" is not very orthodox. I start a putty shell, launch the process in the background, then exit the shell.
The process has run for two months at a time when launched like this, so I don't think that is the problem.

I am using Ubuntu 10.04.3 LTS.
In /var/log I have checked syslog, syslog.1 and syslog.*.gz. Also checked kern.log* and messages*. No mention of OOM.
If the OOM killer kills my process, in which log does it show up?

Bregma    9214
[quote name='hiigara' timestamp='1313764633' post='4851224']
I probably should mention that the way I launch my "daemon" is not very orthodox. I start a putty shell, launch the process in the background, then exit the shell.
The process runs for 2 months launched like this. So I don't think that is the problem.
[/quote]
You could at least try using nohup (man nohup for info) and redirecting output to a log file to eliminate the way you start your program as a cause.
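
nohup works by starting the command with SIGHUP ignored, and the same can be done from inside the program. A minimal sketch, assuming a terminal hangup is what you want to rule out (`ignore_hangup` is a hypothetical name):

```cpp
#include <signal.h>

// Ignore SIGHUP so that a terminal hangup does not terminate the process.
// Returns true on success.
bool ignore_hangup() {
    struct sigaction sa = {};
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, nullptr) == 0;
}
```

Calling this early in main() has an effect similar to nohup for this one signal, but it does not redirect output, so the log file still has to be set up separately.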

There could also be multiple points of failure, so multiple approaches to tracking down the cause are warranted, especially when it occurs infrequently.

If the problem is the OOM killer, logging in remotely and running 'dmesg' from the command line should tell you something, unless the system has restarted since the process was killed, which is quite possible under OOM conditions. Verify the system restart time, too.

Katie    2244
"It may not be possible to have a putty session running for 2 months until the program crashes. It's on a remote server."

You don't need to if you run it in a screen session. You can just connect to it when you need to.

Vectorian    109
[quote name='hiigara' timestamp='1313764633' post='4851224']
It may not be possible to have a putty session running for 2 months until the program crashes. It's on a remote server.[/quote]
You need to discover the wonder that is screen. [url="http://www.gnu.org/s/screen/"]http://www.gnu.org/s/screen/[/url]
Run 'screen' to start a session, and then use screen -x to attach to it. It will stay running forever in the background. Use ctrl-a c to create a new window, ctrl-a n/p to go to next/previous window, ctrl-a k to kill the current window. Then use ctrl-a ? to find all other goodies. :)

With this you don't even need to run the process in the background: create a window and run it there. It'll be just like any other command-line window, but persistent.

