Thursday, August 6, 2009

How to kill defunct processes

If you have been a Solaris system administrator for some time, you should be familiar with these processes. A defunct process, better known as a zombie process, is a process that has completed execution but still has an entry in the process table. That entry is still needed so that the process that started the zombie can read its exit status. The term derives from the common definition of a zombie: an undead person. In the term's colorful metaphor, the child process has died but has not yet been reaped.
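As a rough illustration of spotting these process-table entries, here is a minimal sketch that filters on the state column instead of searching for the word "defunct". The function name zombie_pids is made up for this example, and it assumes ps is asked for the columns pid, ppid, s and comm, where a state of Z marks a zombie.

```shell
# Print the PID of every process whose state column is Z (zombie).
# Expects "PID PPID S COMMAND" rows on stdin, for example from:
#   ps -e -o pid,ppid,s,comm
# zombie_pids is a hypothetical helper name, not a system command.
zombie_pids() {
    # The header row has "S" in column 3, so it is skipped automatically.
    awk '$3 == "Z" { print $1 }'
}
```

A typical call would be ps -e -o pid,ppid,s,comm | zombie_pids, which prints one zombie PID per line.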

Whenever I encounter these zombies, I normally kill the parent process that spawned them. Worse, if that doesn't work, I might have to restart the whole server. Well, I've found another option to try before you start rebooting the machine.
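Before killing a parent it helps to know which parent is actually responsible. The sketch below counts zombies per parent PID. The name zombies_by_parent is invented for this example, and the input is assumed to be the same pid, ppid, s, comm listing from ps.

```shell
# Count zombies per parent PID, busiest parent first.
# Input: "PID PPID S COMMAND" rows on stdin, for example from:
#   ps -e -o pid,ppid,s,comm
# zombies_by_parent is a hypothetical helper name.
zombies_by_parent() {
    awk '$3 == "Z" { count[$2]++ }
         END { for (ppid in count) print count[ppid], ppid }' |
        sort -rn   # highest zombie count on top
}
```

Each output line is a zombie count followed by the parent PID, so the first line points at the parent that is leaking the most children.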

There's a lesser-known command for getting rid of these defunct/zombie processes: preap.

From the man pages:

NAME
preap - force a defunct process to be reaped by its parent

SYNOPSIS
preap [-F] pid...

DESCRIPTION
A defunct (or zombie) process is one whose exit status has
yet to be reaped by its parent. The exit status is reaped
via the wait(3C), waitid(2), or waitpid(3C) system call. In
the normal course of system operation, zombies may occur,
but are typically short-lived. This may happen if a parent
exits without having reaped the exit status of some or all
of its children. In that case, those children are reparented
to PID 1. See init(1M), which periodically reaps such
processes.

An irresponsible parent process may not exit for a very long
time and thus leave zombies on the system. Since the operating
system destroys nearly all components of a process
before it becomes defunct, such defunct processes do not
normally impact system operation. However, they do consume a
small amount of system memory.

preap forces the parent of the process specified by pid to
waitid(3C) for pid, if pid represents a defunct process.

preap will attempt to prevent the administrator from
unwisely reaping a child process which might soon be reaped
by the parent, if:

o The process is a child of init(1M).

o The parent process is stopped and might wait on the
child when it is again allowed to run.

o The process has been defunct for less than one minute.
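The first two safeguards in that list can be approximated ahead of time from a ps listing. What follows is only a sketch under assumptions: preap_candidates is a made-up name, the input is assumed to be pid, ppid, s, comm rows, and a parent state of T is taken to mean "stopped". The third condition, how long the process has been defunct, cannot be read from this listing, so it is not checked here.

```shell
# List zombies that are neither children of init nor children of a
# stopped parent, i.e. the ones the first two safeguards would not block.
# Input: "PID PPID S COMMAND" rows on stdin, for example from:
#   ps -e -o pid,ppid,s,comm
# preap_candidates is a hypothetical helper name.
preap_candidates() {
    awk '
        { state[$1] = $3; parent[$1] = $2 }      # remember every process
        END {
            for (pid in state) {
                if (state[pid] != "Z") continue  # only zombies
                ppid = parent[pid]
                if (ppid + 0 == 1) continue      # child of init
                if ((ppid in state) && state[ppid] == "T")
                    continue                     # parent is stopped
                print pid
            }
        }' | sort -n
}
```

Zombies filtered out here are the ones most likely to be reaped on their own, by init or by the parent once it resumes.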

So to clear out defunct processes, you can try the following. First list them, then pass their PIDs to preap:

server# ps -ef | grep -i defunct
oracle 23650 22802 0 - ? 0:01 <defunct>
oracle 23657 22802 0 - ? 0:01 <defunct>
oracle 23580 22802 0 - ? 0:00 <defunct>
oracle 23924 16560 0 - ? 0:00 <defunct>
oracle 23750 22802 0 - ? 0:01 <defunct>
oracle 23928 16363 0 - ? 0:00 <defunct>
oracle 23915 17114 0 - ? 0:00 <defunct>
oracle 23940 20910 0 - ? 0:00 <defunct>
oracle 23863 21896 0 - ? 0:00 <defunct>


server# ps -ef | grep -i defunct | awk '{print $2}' | xargs preap
23650: killed by signal KILL
23657: killed by signal KILL
23580: killed by signal KILL
23924: exited with status 141
23750: killed by signal KILL
23928: exited with status 141
23915: exited with status 141
23940: killed by signal KILL
23863: killed by signal KILL
23921: exited with status 141
23912: exited with status 141
23652: killed by signal KILL
23889: exited with status 141
23752: killed by signal KILL
23931: exited with status 141
23925: exited with status 141
23936: exited with status 141
23916: exited with status 141
23784: killed by signal KILL
23744: killed by signal KILL
23656: killed by signal KILL
preap: cannot examine 6343: no such process
23926: exited with status 141
23651: killed by signal KILL
23631: killed by signal KILL
23922: exited with status 141
23654: killed by signal KILL
23781: killed by signal KILL
23933: exited with status 141
23923: exited with status 141
23790: killed by signal KILL
24011: exited with status 141
23938: exited with status 141
23634: killed by signal KILL
23907: exited with status 141
23864: exited with status 141
23908: exited with status 141
23883: exited with status 141
23812: killed by signal KILL
23765: killed by signal KILL
23906: exited with status 141
23910: exited with status 141
23871: exited with status 141
23653: killed by signal KILL
23902: killed by signal KILL
23782: killed by signal KILL
23743: killed by signal KILL
server# ps -ef| grep -i defunct
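
Note that the grep-based pipeline also hands grep's own PID to preap, which is most likely where the "cannot examine 6343: no such process" line comes from. A slightly more careful variant, sketched here under assumptions, selects on the process state instead. The function name reap_zombies and the REAP_CMD variable are invented for this example; REAP_CMD only exists so the loop can be dry-run with echo.

```shell
# Reap every zombie found in a ps listing read from stdin.
# Input: "PID PPID S COMMAND" rows, for example from:
#   ps -e -o pid,ppid,s,comm
# REAP_CMD is a hypothetical override for dry runs; it defaults to preap.
reap_zombies() {
    awk '$3 == "Z" { print $1 }' |
    while read -r pid; do
        # One call per PID, so a single failure does not stop the rest.
        ${REAP_CMD:-preap} "$pid"
    done
}
```

To preview what would be reaped, run ps -e -o pid,ppid,s,comm | REAP_CMD=echo reap_zombies, then drop the REAP_CMD=echo part to call preap for real.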

If this still fails to clear them, you will have to restart the parent process or the server itself.
