Dustin C. Hatch d4638239b3
dustin/k8s-reboot-coordinator/pipeline/head This commit looks good
drain: Wait for outer loop to complete
There was a race condition while waiting for a node to be drained,
especially if there are pods that cannot be evicted immediately when the
wait starts.  It was possible for the `wait_drained` function to return
before all of the pods had been deleted, if the wait list temporarily
became empty at some point.  This could happen, for example, if multiple
`WatchEvent` messages were processed from the stream before any messages
were processed from the channel; even though there were pod identifiers
waiting in the channel to be added to the wait list, if the wait list
became empty after processing the watch events, the loop would complete.
This is made much more likely if a PodDisruptionBudget temporarily
prevents a pod from being evicted; it could take 5 or more seconds for
that pod's identifier to be pushed to the channel, and in that time, the
rest of the pods could be deleted.

To resolve this, the `wait_drained` function must never return until
the sender side of the channel has been dropped.  Once the sender is
gone, no more pods can be added to the wait list, so when the list
becomes empty, the drain is actually complete.
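
The termination rule described above can be sketched with only the
standard library.  This is not the project's code: the real function is
presumably asynchronous, and `wait_drained_sketch`, its parameters, and
its return value are hypothetical names chosen for illustration.  The
sketch assumes `pods` carries the names of pods whose eviction was
accepted and `watch` yields the names of pods observed as deleted.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{Receiver, TryRecvError};

/// Hypothetical, synchronous stand-in for the drain wait loop.
///
/// Returns `Some(n)` with the number of watch events consumed once the
/// sender is dropped AND the wait list is empty, or `None` if the watch
/// source ends before that happens.
pub fn wait_drained_sketch(
    pods: &Receiver<String>,
    mut watch: impl Iterator<Item = String>,
) -> Option<usize> {
    let mut waitlist: HashSet<String> = HashSet::new();
    let mut sender_dropped = false;
    let mut seen = 0;
    loop {
        // Move everything currently queued in the channel onto the wait list.
        while !sender_dropped {
            match pods.try_recv() {
                Ok(name) => {
                    waitlist.insert(name);
                }
                // Nothing queued right now, but more pods may still arrive.
                Err(TryRecvError::Empty) => break,
                // Queue is empty and every sender is gone: no more pods can arrive.
                Err(TryRecvError::Disconnected) => sender_dropped = true,
            }
        }
        // An empty wait list alone is not enough; the sender must be gone too.
        if sender_dropped && waitlist.is_empty() {
            return Some(seen);
        }
        // Otherwise consume the next deletion event and keep waiting.
        let name = watch.next()?;
        seen += 1;
        waitlist.remove(&name);
    }
}
```

A loop that checked only `waitlist.is_empty()` would return as soon as
the first batch of pods was deleted, even if a pod delayed by a
PodDisruptionBudget had not yet been pushed to the channel.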
2025-09-29 07:08:12 -05:00
163 KiB
Languages
Rust 97%
Shell 2%
Dockerfile 1%