Cleanup for failed jobs?

Summary

If jobs leave pods in the Error state, the toolkit CI decides that the deployment is not healthy and does not let the run exit.

In principle, this kind of hanging pod will (and should) raise warnings/alerts in the prod cluster too, so it's not great to ignore them completely. We could also set a TTL on finished jobs (`ttlSecondsAfterFinished`):

https://kubernetes.io/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically
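
For reference, a minimal sketch of what setting that TTL could look like when a Job manifest is handled as a plain Python dict. The helper name `with_ttl` and the sample manifest are hypothetical and not part of the toolkit; only the `spec.ttlSecondsAfterFinished` field comes from the Kubernetes docs linked above.

```python
import copy


def with_ttl(job_manifest, ttl_seconds):
    """Return a copy of a Job manifest with spec.ttlSecondsAfterFinished set.

    Hypothetical helper: the toolkit may build its manifests differently.
    """
    patched = copy.deepcopy(job_manifest)  # leave the caller's manifest untouched
    patched.setdefault("spec", {})["ttlSecondsAfterFinished"] = ttl_seconds
    return patched


# Illustrative manifest, not taken from any real chart.
example_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "example-migration"},
    "spec": {"template": {"spec": {"restartPolicy": "Never"}}},
}

patched_job = with_ttl(example_job, 3600)
```

Note that the TTL applies to jobs that finished either way (Complete or Failed) and removes their pods along with them, so on its own it would also hide the failure rather than warn about it.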

But since this is part of a self-healing process, we can detect these pods, clean them up, and leave a warning. Similarly, we could leave a warning if pods are restarting before the deployment (which suggests they are missing some waiting logic).
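
A rough sketch of the detect-and-warn idea, assuming the CI has the pod list as parsed JSON (for example the `items` array from `kubectl get pods -o json`). The function names are hypothetical, and the actual deletion step is left out on purpose; this only decides which pods to clean up and which warnings to emit.

```python
def owned_by_job(pod):
    """True if any owner reference of the pod is a Job."""
    refs = pod.get("metadata", {}).get("ownerReferences") or []
    return any(ref.get("kind") == "Job" for ref in refs)


def restart_count(pod):
    """Total restarts across all containers of the pod."""
    statuses = pod.get("status", {}).get("containerStatuses") or []
    return sum(s.get("restartCount", 0) for s in statuses)


def triage_pods(pods):
    """Split pods into names to delete and warnings to report.

    Assumption: a job pod shown as "Error" by kubectl has phase "Failed".
    """
    to_delete = []
    warnings = []
    for pod in pods:
        name = pod.get("metadata", {}).get("name", "<unknown>")
        phase = pod.get("status", {}).get("phase")
        if phase == "Failed" and owned_by_job(pod):
            to_delete.append(name)
            warnings.append(f"cleaning up failed job pod {name}")
        elif restart_count(pod) > 0:
            warnings.append(f"pod {name} restarted {restart_count(pod)} time(s)")
    return to_delete, warnings
```

The CI health check could then treat the returned warnings as non-fatal output and only fail on pods that are neither cleaned up nor healthy.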

/cc @mlinhoff