
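On Linux, after a child process calls `exit()`, the kernel keeps its PID, exit status, and a few other fields until the parent collects them with `wait()` or `waitpid()`.
If the parent never does, the child stays in state Z (zombie/defunct), commonly called a "zombie process".
A zombie uses almost no memory, but it still occupies a process-table entry; when enough of them pile up, new forks fail with `fork: Resource temporarily unavailable`.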
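The lifecycle described above can be observed directly from Python. The following is a minimal sketch, assuming Linux and Python 3; the helper name `zombie_lifecycle` is made up for illustration. It uses `os.waitid` with `WNOWAIT` to look at the exited child without reaping it, then reaps it with `os.waitpid`.

```python
import os

def zombie_lifecycle(exit_code=7):
    """Fork a child, peek at it while it is a zombie, then reap it (hypothetical helper)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit immediately

    # Wait until the child has exited, but leave it unreaped (WNOWAIT).
    # From this point until the waitpid() below, the child is a zombie.
    peek = os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    # The kernel still holds the exit record, so a normal wait succeeds.
    _, status = os.waitpid(pid, 0)

    # After reaping, the record is gone and a second wait fails.
    try:
        os.waitpid(pid, 0)
        gone = False
    except ChildProcessError:
        gone = True

    return peek.si_status, os.WEXITSTATUS(status), gone
```

Calling `zombie_lifecycle(7)` returns the exit code twice (once from the peek, once from the real wait) followed by a flag showing that the entry has been released.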
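Zombies arise when a child process is created with `fork()`, `multiprocessing.Process.start()`, or `subprocess.Popen()`, and the parent neither calls `wait`/`waitpid` nor registers a `SIGCHLD` handler.

| Approach | Code example | When to use |
|---|---|---|
| 1. Manual join | `p = Process(...); p.start(); p.join()` | A small number of child processes |
| 2. `with ProcessPoolExecutor` | `with ProcessPoolExecutor() as pool: pool.map(...)` | Parallel tasks |
| 3. `SIG_IGN` | `signal.signal(signal.SIGCHLD, signal.SIG_IGN)` | Long-running Unix services |
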
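Approach 3 comes with a trade-off that is easy to miss: once `SIGCHLD` is ignored, the kernel reaps children on its own and the parent can no longer read their exit status. A small sketch of that behavior, assuming Linux and that it runs in the main thread (the function name `exit_status_is_discarded` is hypothetical):

```python
import os
import signal

def exit_status_is_discarded():
    """Return True if waitpid() finds nothing to reap while SIGCHLD is ignored."""
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(3)  # child: exit with a non-zero status
        try:
            # Blocks until the child is gone, then fails with ECHILD
            # because the kernel has already discarded the exit record.
            os.waitpid(pid, 0)
        except ChildProcessError:
            return True
        return False
    finally:
        # Restore the earlier disposition so later code can wait on children.
        signal.signal(
            signal.SIGCHLD,
            previous if previous is not None else signal.SIG_DFL,
        )
```

This is why the table limits `SIG_IGN` to long-running services: it suits fire-and-forget children, not code that needs to know how a child ended.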
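
    # oom_zombie.py
    import multiprocessing as mp

    def leak():
        # Allocate roughly 10 GB in 1 MiB chunks so the cgroup memory limit is exceeded.
        _ = [bytearray(1024 * 1024) for _ in range(10000)]

    if __name__ == "__main__":
        mp.Process(target=leak).start()  # never joined, so the dead child is not reaped
        input("press enter to quit...")
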
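Run it under a 100 MB memory limit:

    systemd-run --scope -p MemoryMax=100M python oom_zombie.py

In a second terminal:

    watch -n1 'ps -eo pid,ppid,state,comm | grep python'

Once the child has been killed for exceeding the limit, a `<defunct>` zombie shows up:

    9530 9529 Z python3 <defunct>
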
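The same check can be done from Python without shelling out to `ps`. The sketch below assumes a Linux `/proc` filesystem; `list_zombies` is an illustrative name, not an existing API. It reads the state and parent PID from each `/proc/<pid>/stat` file.

```python
import os

def list_zombies(proc_root="/proc"):
    """Return (pid, ppid) pairs for every process currently in state Z."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                data = f.read()
        except OSError:
            continue  # process vanished or is not readable
        # Format: "pid (comm) state ppid ..."; comm may contain spaces or ")",
        # so split on the last closing parenthesis.
        fields = data.rsplit(")", 1)[1].split()
        state, ppid = fields[0], int(fields[1])
        if state == "Z":
            zombies.append((int(entry), ppid))
    return zombies
```

Filtering the result on `ppid == os.getpid()` narrows it to zombies that the current process is responsible for reaping.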

    # oom_with_ctx.py
    from concurrent.futures import ProcessPoolExecutor
    import os

    def leak():
        # Same oversized allocation as in oom_zombie.py (roughly 10 GB).
        _ = [bytearray(1024 * 1024) for _ in range(10000)]

    if __name__ == "__main__":
        print("Parent PID:", os.getpid())
        # Leaving the with block calls shutdown(wait=True), which joins the workers.
        with ProcessPoolExecutor(max_workers=2) as pool:
            pool.submit(leak)
            pool.submit(leak)
        print("All reaped, no zombies.")

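The `with` block takes care of reaping, but the script above drops the futures, so it never learns that a worker was killed. One possible way to surface that is to call `result()` and catch `BrokenProcessPool`. This is a sketch under assumptions: `run_and_detect_crash` is a hypothetical helper, and it forces the Unix-only `fork` start method to keep the example self-contained.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

def run_and_detect_crash(fn, *args):
    """Run fn(*args) in a one-worker pool and report whether the worker died."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-only start method
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        future = pool.submit(fn, *args)
        try:
            return "ok", future.result()
        except BrokenProcessPool:
            # Raised when a worker exits abruptly, for example after SIGKILL
            # from the OOM killer.
            return "worker died", None
```

Passing `os._exit` as `fn` stands in for an OOM-killed worker, since the process disappears without returning a result: `run_and_detect_crash(os._exit, 1)` reports the crash, while an ordinary call such as `run_and_detect_crash(abs, -3)` returns its value.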
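✅ Always use `with`: `ProcessPoolExecutor`, `Pool`, `Popen`
✅ Or call `join()` / `wait()` explicitly
✅ For long-running Unix services, set `SIGCHLD` to `SIG_IGN` as a safety net
✅ Monitor: `ps -eo pid,ppid,state,comm | awk '$3 == "Z"'`
✅ Reduce OOM kills: limit concurrency, quantize models, add swap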
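For the `Popen` case in the first item, a minimal sketch could look like this. The helper `run_child` is hypothetical; it launches the current interpreter with `-c` so that the example does not depend on any external program.

```python
import subprocess
import sys

def run_child(code):
    """Run a Python snippet in a child process and return (returncode, stdout)."""
    with subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        out, _ = proc.communicate()
    # Popen.__exit__ closes the pipes and waits for the child,
    # so returncode is set here and no zombie is left behind.
    return proc.returncode, out
```

Even if the body of the `with` block raises, leaving the block still waits for the child, which is what makes it a safer default than a bare `Popen(...)` call.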