DO Ideas 2

allow droplet fencing when the hypervisor is nonresponsive

A hypervisor can get stuck for various reasons; I just had a droplet unreachable for 40 minutes. In such a case, it's impossible to stop the droplet or detach its volumes.

If I have a droplet with block storage, and the data I care about is on that storage, I would like to be able to give up on the droplet without having to wait for DO to fix the hypervisor. Basically, I'd like a fence operation which would:

- detach the droplet volumes immediately
- detach floating IPs
- block outgoing traffic

These operations would have to be done at the storage/networking layers, and not at the hypervisor itself.
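To make the request concrete, here is a rough sketch of the fence semantics I have in mind, modelled on in-memory state only. Everything in it (the `Droplet` record, the `fence` function, the field names, the example volume and IP) is hypothetical and made up for illustration; none of it is an existing DigitalOcean API.

```python
from dataclasses import dataclass, field


@dataclass
class Droplet:
    """Hypothetical control-plane record for a droplet (not a real API object)."""
    name: str
    volumes: set = field(default_factory=set)
    floating_ips: set = field(default_factory=set)
    egress_blocked: bool = False
    fenced: bool = False


def fence(droplet: Droplet) -> dict:
    """Fence a droplet without contacting its hypervisor.

    Each step stands in for an action taken at the storage or
    networking layer, so it works even if the hypervisor is stuck.
    """
    released = {
        "volumes": sorted(droplet.volumes),
        "floating_ips": sorted(droplet.floating_ips),
    }
    droplet.volumes.clear()        # storage layer: revoke volume attachments
    droplet.floating_ips.clear()   # network layer: unassign floating IPs
    droplet.egress_blocked = True  # network layer: drop outgoing traffic
    droplet.fenced = True
    return released


stuck = Droplet("mail-01", volumes={"vol-data"}, floating_ips={"203.0.113.10"})
released = fence(stuck)
print(released)
```

The important property is that `fence` never has to talk to the stuck droplet or its hypervisor; it only changes what the storage and network layers will allow.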

Once this is done, I can spin up a new droplet, attach (a clone of) the block storage, and be sure the 'stuck' droplet doesn't suddenly come back to life and e.g. try to do the same work my new droplet is doing (such as sending mail or sending data to external services).
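The ordering matters here: recovery should only start once the old droplet is confirmed fenced. A minimal sketch of that guard, with a hypothetical `recover` function and `NotFencedError` exception invented purely for illustration:

```python
class NotFencedError(RuntimeError):
    """Raised when recovery is attempted before the old droplet is fenced."""


def recover(old_droplet_fenced: bool, volume: str) -> dict:
    """Plan a replacement droplet, but only once the old one is fenced."""
    if not old_droplet_fenced:
        raise NotFencedError("old droplet may still come back; fence it first")
    # Attach a clone of the volume, as in the workflow described above.
    return {"new_droplet": "replacement", "attached_volume": f"{volume}-clone"}


try:
    recover(old_droplet_fenced=False, volume="vol-data")
except NotFencedError as err:
    print("refused:", err)

print(recover(old_droplet_fenced=True, volume="vol-data"))
```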

The need for this kind of fencing comes from basically the same reasons you want 'STONITH' or fencing in HA clusters: I can't safely rebuild a service and resume from where the dead droplet left off if I can't be sure that droplet won't suddenly come back.

It would be nice if the 'fenced' droplet was still accessible through the console at some future point for post-mortem analysis. But rendering it harmless is more important than recovery in this use case.

  • Arnold Hendriks
  • Sep 11 2018