DO Ideas 2

Take a snapshot without shutting down

Find a way to enable snapshots to be taken without shutdown. It causes a lot of issues, especially with production projects.

  • Reese Jenner
  • Sep 11 2018
  • Shipped
  • Sep 11, 2018

    Admin Response

    Hi everyone, I’m happy to share that the ability to create snapshots without having to power down your Droplet is now live via the cloud control panel as well as through the API. There are no changes to the API, and you may simply hit the endpoint to create a snapshot without first powering down the Droplet.

    For workloads with high disk I/O, you are still encouraged to shut down your Droplet before taking a snapshot to maintain data consistency, as some data may still be in memory and not fully flushed to disk. From a technical perspective this is similar to how backups are taken, as described in more depth here: https://www.digitalocean.com/community/tutorials/understanding-digitalocean-droplet-backups#when-digitalocean-backups-may-not-be-the-best-solution

    All feedback, comments, questions and concerns are welcome. Please feel free to email me directly at bschaechter@digitalocean.com.

    Ben Schaechter
    Product Manager, Droplet
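
    For readers who want to try the API route described in the response above, here is a minimal sketch in Python. It assumes the public v2 Droplet actions endpoint with a "snapshot" action type; the Droplet ID, token, and snapshot name are placeholders, and the helper name build_snapshot_request is made up for this example. It only builds the request and does not send anything.

```python
import json
import urllib.request

# Assumption: the public DigitalOcean API v2 base URL.
API_BASE = "https://api.digitalocean.com/v2"


def build_snapshot_request(droplet_id: int, token: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a snapshot action request for a Droplet.

    The helper name and arguments are hypothetical; only the endpoint
    shape (POST /v2/droplets/<id>/actions) follows the public API.
    """
    body = json.dumps({"type": "snapshot", "name": name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/droplets/{droplet_id}/actions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; replace with a real Droplet ID and API token.
    req = build_snapshot_request(12345, "YOUR_API_TOKEN", "pre-upgrade")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

    For busy databases, the consistency caveat in the response still applies: flush or quiesce writes first if you can.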
  • Patrik Karisch commented
    September 11, 2018 18:41

    The best gift today. Thank you all from DO who made this possible!

  • Armen Markossyan commented
    September 11, 2018 18:41

    This is extremely serious because some major competitors of DO already offer this kind of feature and I'm forced to recommend them when people ask me about good VPS providers. Please make this happen because I'm a big fan of DigitalOcean for many reasons.

  • Andrew MacNaughton commented
    September 11, 2018 18:41

    Still shocked you don't have this as an option

  • Marc H commented
    September 11, 2018 18:41

    Great idea, this would also make more frequent backups possible instead of the current weekly strategy.

  • Digital D commented
    September 11, 2018 18:41

    Great idea

  • Anonymous commented
    September 11, 2018 18:41

    Very important

  • Olaf Lederer commented
    September 11, 2018 18:41

    We really need that, at least one snapshot like you get from Linode or Vultr.

  • Kai commented
    September 11, 2018 18:41

    Please, enable online snapshots. I occasionally have to make a major change to my server and in order to take a snapshot I have to take it offline for 10-40 minutes and there seems to be no way for me to estimate this timeframe. An online snapshot option (even if it took a couple hours) would be great.

  • Jonathan H commented
    September 11, 2018 18:41

    Back in July, Moisey wrote:

    "We’ve rolled out a new version of our cloud backend code in the new Singapore region, which is v1.5.

    This version has full support for snapshots that occur while the droplet is running without the need for shutting down.

    We are now beginning a process to migrate the existing regions to the new code base and will be rolling this out over the next several months."

    Any idea what version London is on, and when it might get this hot backup option and be upgraded to v1.5?

  • Rudi commented
    September 11, 2018 18:41

    Or alternatively, have an option by which the droplet is powered off automatically to ensure the least possible downtime.

  • Jeff Reifman commented
    September 11, 2018 18:41

    Thanks Moisey - there are three related issues: 1) allow snapshots to be duplicated - this will make it easier to create copies of snapshots 2) don't power on droplets after a snapshot automatically - because this makes making copies slower and 3) if you would allow people to share an image as a reference (let other users create droplets from one user's image - without removing it from the source user's account) then that would also help with this problem.

  • Dmitry Grosman commented
    September 11, 2018 18:41

    Probably because of an old QEMU version on the hypervisors; no live snapshots for us until they upgrade.

  • rosanablao commented
    September 11, 2018 18:41

    Thankyou sis for the monopod n ipega bluetooth remote,.its super nice to make selfie pic end make easy to do,.for the next time shipment,.thankyou n Godbless

  • Jeremy Price commented
    September 11, 2018 18:41

    Replicas are not backups any more than RAID is, as any data-layer screw-ups/corruptions/compromises/etc.. will be propagated across the cluster.

    Now tell me, is it quicker to do PITR recovery from a snapshot 4 hours ago or from one during these imaginary off-peak hours of which you speak?

    This is not some far-out pipe-dream of a technology I'm asking for. Live snapshots are not a new thing. I should _not_ have to take a server down to snapshot the filesystem. I haven't had to in 4 years of working on AWS and as much as I like DigitalOcean and the people who work there, that feature, or lack thereof, is a showstopper for much of my workload.

    Feel free to armchair-quarterback how I "should" be running my infrastructure, but I prefer not to shut down my servers unless the kernel needs updating or the hardware fails. If I wanted something I had to reboot all the time I'd run Windows.

  • Wayne Hartmann commented
    September 11, 2018 18:41

    It would be a useful option, but as others have said, it will require a lot of changes to implement. If your production servers are so vital that you can't risk scheduled downtime for a snapshot during non-peak hours, then you should be running some sort of load-balancing / replication setup to begin with. I would be more worried about a hardware failure on the node your droplet resides on before I would worry about scheduled downtime to take a snapshot.

  • Jeremy Price commented
    September 11, 2018 18:41

    Or I could put postgres into backup mode, sync the (whichever) FS, snap the FS, take postgres out of backup mode and go about my merry way.

    Point is there are ways to mitigate the danger involved, and for many those calculated risks are preferable to the downtime involved in having to turn off a machine and then wait for it to snap.

    The ability should be there. Perhaps it should come with warnings that data integrity is the responsibility of the user, but it should be there.
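
    The sequence described above (backup mode, sync, snapshot, leave backup mode) could be sketched roughly as follows. This is an illustration, not a tested backup procedure: run_sql, sync_fs and take_snapshot are hypothetical callables you would supply yourself (for example a psql wrapper, os.sync, and a provider API call). It assumes the exclusive pg_start_backup / pg_stop_backup functions available in PostgreSQL 14 and earlier; PostgreSQL 15 replaced them with pg_backup_start / pg_backup_stop and dropped exclusive mode.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def postgres_backup_mode(run_sql: Callable[[str], None], label: str) -> Iterator[None]:
    """Enter backup mode, and always leave it again, even on error.

    run_sql is a hypothetical callable that executes one SQL statement.
    The label is interpolated directly, so it must be a trusted string.
    """
    # Second argument 'true' requests an immediate checkpoint (PostgreSQL <= 14).
    run_sql(f"SELECT pg_start_backup('{label}', true);")
    try:
        yield
    finally:
        run_sql("SELECT pg_stop_backup();")


def consistent_snapshot(
    run_sql: Callable[[str], None],
    sync_fs: Callable[[], None],
    take_snapshot: Callable[[], None],
    label: str = "live-snapshot",
) -> None:
    """Run the four steps in order: start backup, sync, snapshot, stop backup."""
    with postgres_backup_mode(run_sql, label):
        sync_fs()        # flush dirty pages to disk, e.g. os.sync
        take_snapshot()  # e.g. request a snapshot from the provider
```

    Passing the three steps in as callables keeps the ordering logic separate from how each step is actually performed, and the try/finally means the database is taken out of backup mode even if the snapshot step fails.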

  • Richard Yao commented
    September 11, 2018 18:41

    Achieving this would require the use of a crash-safe filesystem in guests and a volume manager between the RAID stack and hypervisor. That is being done in FreeBSD right now with ZFS, Bhyve and UFS SU+J.

    Doing this with KVM is possible, but the guest filesystem will need to change to support this, the backend stuff would need modification to support placement of the guest on a volume manager that supports instantaneous snapshots (my vote for ZFS) and the VMs would need to be migrated to it.

    Digital Ocean would likely need to hire a Linux kernel hacker familiar with storage stacks to implement this. Incidentally, I am such a hacker, although I suspect my knowledge of the subject gives that away. Seeing something like this implemented in Linux would be really cool.

  • Anonymous commented
    September 11, 2018 18:41

    Need this ASAP

  • Heihachi commented
    September 11, 2018 18:41

    +3 waiting for this functionality, because backups pricing is terrible (20% of total droplet cost!). So if you have a 160GB droplet you are going to pay $192 instead of the advertised $160. Not fair.
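
    The arithmetic behind that figure, as a tiny sketch. The 20% surcharge and the $160 base price are taken from the comment above; the function name is made up for illustration.

```python
def cost_with_backups(droplet_usd: int, backup_percent: int = 20) -> float:
    """Droplet price plus a backup surcharge given as a percentage of that price."""
    return droplet_usd * (100 + backup_percent) / 100


print(cost_with_backups(160))  # 160 plus 20% of 160
```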

  • Reese Jenner commented
    September 11, 2018 18:41

    I am pleased you're looking into this - luckily my current project only has a couple of megabyte files, nothing major - still I will run bigger projects with Gigabytes of data... I can't have hours of downtime....

  • Heihachi commented
    September 11, 2018 18:41

    That is a MUST HAVE option.

  • Ibrahim Benzer commented
    September 11, 2018 18:41

    That's a hiccup, please solve this...

  • Jeremy Price commented
    September 11, 2018 18:41

    Because I have clients who pay me to not bring down their websites multiple times a day. Because my DB server hasn't seen 5 minutes of downtime in the last 6 months... never mind 6x/day for 4hr snapshots.

  • Samuel Lewis commented
    September 11, 2018 18:41

    It takes like 5 minutes... why worry?

  • Jeremy Price commented
    September 11, 2018 18:41

    Michael Hicks: You're combining two separate operations. On AWS you can snapshot an EBS volume or create an AMI. Snapshotting is a low-level storage operation that doesn't affect the machine.

    Creating an AMI _can_ reboot the machine in the process, but it isn't required. They recommend it but give you the option to do it w/o reboot if you're prepared to deal with possible data inconsistencies yourself.

  • Michael Hicks commented
    September 11, 2018 18:41

    FWIW, I've noticed that my EC2 instance always reboots after I snapshot to AMI. I don't have to shut down beforehand, but I assume they are making that happen automatically as part of the AMI snapshot process.

  • Mikhail Emelchenkov commented
    September 11, 2018 18:41

    Shutting down, seriously? Wow, I did not expect to find so many reefs in Digital Ocean :)

  • Alex Pole commented
    September 11, 2018 18:41

    Agree that this should be top priority! Especially today where I find myself coming up on 2 hours of downtime waiting for a snapshot to process due to scheduler issues. Refunding downtime is one thing, but there is no way to refund time wasted when your customers are supposed to be working and can't bring their site back up or even cancel the snapshot!

  • Jeremy Price commented
    September 11, 2018 18:41

    I want to give you my business but can't until this happens... shutdown for snapshots != production ready.

  • Moisey Uretsky commented
    September 11, 2018 18:41

    We're going to be working towards that in the future so that all snapshots and backups are processed from running droplets with minimal impact and while still retaining disk consistency on the taken snapshots.

    Thanks

  • Moisey Uretsky commented
    September 11, 2018 18:41

    There is no queuing system for snapshots. The variable amount of time a snapshot takes is due to differences in the amount of content that users have placed on the server.

    The more files, smaller files, etc. that a server has, the longer the snapshot takes to complete.

    Thanks

  • Dirk Postma commented
    September 11, 2018 18:41

    +1 Nobody wants *downtime*, esp. when the reason is making a backup!

    Temporarily running another droplet is no easy option because... you need fresh data (a snapshot?) for that.