In the early days of android i had an app that had to do video transcoding yet often hit oom on startup (reported via telemetry) even when the phone should have enough memory. This was before android had any video transcoding built in (2.3 days).
The solution was to spawn a child process, use memory in a loop, catch the sigkill in the parent, yield to the os as it killed other processes to free memory in the device as a whole and then on return from sleep in the parent process after killing the child start the video transcoding.
Hopefully this hack is not needed but if you want android to proactively run its process killing job so your app starts with maximum free memory the above worked!
Start both children, then call wait(), which blocks until any child exits and returns the pid of the child that exited. If it's the command child, then your command finished. If it's the other child, then the timeout expired.
Now that one child has exited, kill() the other child with SIGTERM and reap it by calling wait() again.
All of this assumes you'll only have these two children going, but if you're writing a small exponential backoff command retry utility, that should be OK.
Lol, author's thought process mirrored mine as I read the article, as I was reading I was thinking, 'doesn't kqueue support that?... and then a section on kqueue. Then I was thinking to myself, so how does the Linux implementation do it then?... was just about to start trawling the source code when 'A parenthesis..'
Great article. Sorry to say though, Windows does manage all this in a more consistent way - but I guess they had the benefit of a clean slate.
signalfd / process descriptiors are the Windows style mechanism... what is missing are a few things like 'spawn' that returns a fd directly (eliminating races...)
An interesting aspect of waitid is that it allows you to access the full exit code of the process (i.e., the entire int instead of just the bottom 8 bits).
Unfortunately, many operating systems implement waitid() on top of one of the older APIs, meaning the top bits get lost regardless…
Edit: by threads I mean creating a new thread to wait for the process, and then kill the process after a certain timeout if the process hasn't terminated. I guess I'm spoiled by Go...
2. That thread starts a child process and signals "started" by storing its PID somewhere globally-visible (and hopefully atomic/lock-protected).
3. The thread then blocks in wait(2), taking advantage of its non-main-thread-ness to avoid some signals and optionally masking/ignoring some more.
4. When the process exits, the thread can write exitstatus/"completed" to the globally-visible state next to PID. The thread then exits.
3. External observers wait for the process with a timeout by attempting to join the thread with a timeout. If the timeout occurs, they can access the globally-visible PID and send a signal to it.
This is missing from the article (EDIT: it has since been added, thanks!). That doesn't mean it's a good solution on many platforms. It's more costly in resources (thread stack), more code than most of the listed options, vulnerable to PID-reuse problems that can cause a killsignal to go to the wrong process, likely plays poorly with spawning methods that request a SIGCHLD be sent to the parent on exit (and plays poorly with signals in general if any customization is needed there), and is probably often slower than most of TFA's alternatives as well, both due to syscall count and pessimal thread/scheduler switching conditions. Additionally, it multiplexes/composes to large numbers of processes poorly and with a high resource cost.
EDIT: Golang's version of this is less bad than described above, but not perfect. Go's spawning infrastructure mitigates resource cost (goroutines/segmented stacks are not as heavy as threads), is vulnerable to PID-reuse (as are most platforms' operations in this area), addresses the SIGCHLD risk through the runtime and signal channels, and mitigates slowness with a very good scheduler. For multiplexing, I would assume (but I have not verified) that the Go runtime is internally using pidfds/kqueue where supported. Where not supported, I would assume Go is internally tracking spawn requests through its stdlib, handling SIGCHLD, and has a single global routine calling wait(2) without a specific PID, waking goroutines waiting on a watched PID when it comes out of the call to wait(2).
Thanks. I believe that Go indeed _could_ use those APIs to wait for the child more efficiently if they chose to, but the current implementation suggests that they're just calling wait4() in a separate thread: https://cs.opensource.google/go/go/+/refs/tags/go1.23.3:src/...
To be fair, in Go process spawning is very inefficient to begin with, since it requires lots of runtime coordination to not mess with the threads/goroutines state during fork, so running wait4() in a separate thread (although the thread can be re-used afterwards) is not the biggest concern here.
> I would prefer extending poll to support things other than file descriptors, instead of converting everything a file descriptor to be able to use poll.
Why? The ability to block on these descriptors as a one off rather than wrapping into a poll makes them extremely useful and avoids the race issues that exist with signal handlers and other non-blocking mechanisms.
signalfd, timerfd, eventfd, userfaultfd, pidfd are all great applications of this strategy.
Thanks for this great article, it is going to be very useful for my project. I am currently developing an open source Android native app that invokes rsync when a file gets closed (ie: you take a picture)
> Because the Linux kernel coalesces SIGCHLD (and other signals), the only way to reliably determine if a monitored process has exited, is to loop through all PIDs registered by any kqueue when we receive a SIGCHLD. This involves many calls to waitid(2) and may have a negative performance impact.
This is somewhat wrong. To speed things up in the happy case (where we are the only part of the program that is spawning children), you can just do a `WNOHANG` wait for any child first, and check if it's one of the children we care about. Only if it's an unknown child do you have to do the full loop (of course, if you only have a couple of children the loop may be better).
That is K&R C syntax, supported up to C18. The solution to tools emitting unwanted diagnostics is not to appease them with pointless cruft but to shut off the diagnostic:
I know this is related but maybe someone smarter than me can explain how closely it relates (or doesn’t) to this issue which seems more general (iirc Cantrill was talking about fs events not child processes generally)
In the early days of android i had an app that had to do video transcoding yet often hit oom on startup (reported via telemetry) even when the phone should have enough memory. This was before android had any video transcoding built in (2.3 days).
The solution was to spawn a child process, use memory in a loop, catch the sigkill in the parent, yield to the os as it killed other processes to free memory in the device as a whole and then on return from sleep in the parent process after killing the child start the video transcoding.
Hopefully this hack is not needed but if you want android to proactively run its process killing job so your app starts with maximum free memory the above worked!
Tenth Approach: fork() two processes.
Child 1 exec()s the command.
Child 2 does this:
Start both children, then call wait(), which blocks until any child exits and returns the pid of the child that exited. If it's the command child, then your command finished. If it's the other child, then the timeout expired.Now that one child has exited, kill() the other child with SIGTERM and reap it by calling wait() again.
All of this assumes you'll only have these two children going, but if you're writing a small exponential backoff command retry utility, that should be OK.
Lol, author's thought process mirrored mine as I read the article, as I was reading I was thinking, 'doesn't kqueue support that?... and then a section on kqueue. Then I was thinking to myself, so how does the Linux implementation do it then?... was just about to start trawling the source code when 'A parenthesis..'
Great article. Sorry to say though, Windows does manage all this in a more consistent way - but I guess they had the benefit of a clean slate.
signalfd / process descriptiors are the Windows style mechanism... what is missing are a few things like 'spawn' that returns a fd directly (eliminating races...)
there is no race from the parent
the pid will not be reused until you either handle sigchld or wait
[dead]
Not so much about timeouts, but related in that it is based around managing children processes:
The lineage of tools descending from daemontools for service management is worth exploring:
daemontools: http://cr.yp.to/daemontools.html
runit: https://smarden.org/runit/
s6: https://skarnet.org/software/s6/
dinit: https://davmac.org/projects/dinit/
FWIW io_uring does have support for waitid.
https://www.man7.org/linux/man-pages/man3/io_uring_prep_wait...
An interesting aspect of waitid is that it allows you to access the full exit code of the process (i.e., the entire int instead of just the bottom 8 bits).
Unfortunately, many operating systems implement waitid() on top of one of the older APIs, meaning the top bits get lost regardless…
Many thanks! I have added it to the article in due form now.
So many ways and no-one mentioned threads..?
Edit: by threads I mean creating a new thread to wait for the process, and then kill the process after a certain timeout if the process hasn't terminated. I guess I'm spoiled by Go...
The threading approach is roughly:
1. Start a thread
2. That thread starts a child process and signals "started" by storing its PID somewhere globally-visible (and hopefully atomic/lock-protected).
3. The thread then blocks in wait(2), taking advantage of its non-main-thread-ness to avoid some signals and optionally masking/ignoring some more.
4. When the process exits, the thread can write exitstatus/"completed" to the globally-visible state next to PID. The thread then exits.
3. External observers wait for the process with a timeout by attempting to join the thread with a timeout. If the timeout occurs, they can access the globally-visible PID and send a signal to it.
This is missing from the article (EDIT: it has since been added, thanks!). That doesn't mean it's a good solution on many platforms. It's more costly in resources (thread stack), more code than most of the listed options, vulnerable to PID-reuse problems that can cause a killsignal to go to the wrong process, likely plays poorly with spawning methods that request a SIGCHLD be sent to the parent on exit (and plays poorly with signals in general if any customization is needed there), and is probably often slower than most of TFA's alternatives as well, both due to syscall count and pessimal thread/scheduler switching conditions. Additionally, it multiplexes/composes to large numbers of processes poorly and with a high resource cost.
EDIT: Golang's version of this is less bad than described above, but not perfect. Go's spawning infrastructure mitigates resource cost (goroutines/segmented stacks are not as heavy as threads), is vulnerable to PID-reuse (as are most platforms' operations in this area), addresses the SIGCHLD risk through the runtime and signal channels, and mitigates slowness with a very good scheduler. For multiplexing, I would assume (but I have not verified) that the Go runtime is internally using pidfds/kqueue where supported. Where not supported, I would assume Go is internally tracking spawn requests through its stdlib, handling SIGCHLD, and has a single global routine calling wait(2) without a specific PID, waking goroutines waiting on a watched PID when it comes out of the call to wait(2).
Thanks. I believe that Go indeed _could_ use those APIs to wait for the child more efficiently if they chose to, but the current implementation suggests that they're just calling wait4() in a separate thread: https://cs.opensource.google/go/go/+/refs/tags/go1.23.3:src/...
To be fair, in Go process spawning is very inefficient to begin with, since it requires lots of runtime coordination to not mess with the threads/goroutines state during fork, so running wait4() in a separate thread (although the thread can be re-used afterwards) is not the biggest concern here.
Thanks for the suggestion, I have added a short section about threads.
> I would prefer extending poll to support things other than file descriptors, instead of converting everything a file descriptor to be able to use poll.
Why? The ability to block on these descriptors as a one off rather than wrapping into a poll makes them extremely useful and avoids the race issues that exist with signal handlers and other non-blocking mechanisms.
signalfd, timerfd, eventfd, userfaultfd, pidfd are all great applications of this strategy.
One issue is that fds are a relatively limited and relatively heavyweight resource.
They used to be.
And `nr_open` has been 1048576 for a long time. The default quota limit is 1024 but that is easy to change.Thanks for this great article, it is going to be very useful for my project. I am currently developing an open source Android native app that invokes rsync when a file gets closed (ie: you take a picture)
https://github.com/aguaviva/Syncy
> Because the Linux kernel coalesces SIGCHLD (and other signals), the only way to reliably determine if a monitored process has exited, is to loop through all PIDs registered by any kqueue when we receive a SIGCHLD. This involves many calls to waitid(2) and may have a negative performance impact.
This is somewhat wrong. To speed things up in the happy case (where we are the only part of the program that is spawning children), you can just do a `WNOHANG` wait for any child first, and check if it's one of the children we care about. Only if it's an unknown child do you have to do the full loop (of course, if you only have a couple of children the loop may be better).
What is the meaning of this code ?
If it's C code, that is the way to suppress a compiler warning about sig being unused. In C++ you can omit (or comment-out) the parameter name, e.g.:
Thank you
So this would be a way which predates' C23's maybe_unused attribute¹
Nice trick
[1] https://en.cppreference.com/w/c/language/attributes/maybe_un...
That is K&R C syntax, supported up to C18. The solution to tools emitting unwanted diagnostics is not to appease them with pointless cruft but to shut off the diagnostic:
Parenting 101
He mentions Bryan Cantrill in there and I can’t resist posting his famous epoll/kqueue rant:
https://youtu.be/l6XQUciI-Sc?t=3643
I know this is related but maybe someone smarter than me can explain how closely it relates (or doesn’t) to this issue which seems more general (iirc Cantrill was talking about fs events not child processes generally)
[dead]