Tarsnap is cozy

135 points by hiAndrewQuinn 3 days ago

I've been using tarsnap for years and am in the process of migrating away from it.

Things that are not cozy:

1) There's no way to monitor your monthly spend per host/credit left on the account/etc. apart from logging into your account in a browser and manually keeping a spreadsheet. There's no web API to do it. You get an email warning when you have about 7 days of credit left. That's it.

2) Nothing is "a precious few megabytes" anymore. What seems like a negligible monthly spend at first can quickly creep up on you, and soon you're spending highly non-trivial amounts, which you might not notice due to 1) unless you are diligent in your accounting.

3) tarsnap restores are slow. Really really slow. A full restore can take days if you have non-trivial amounts of data (and make sure you have enough credit in your account to pay for that server-to-client bandwidth!) My understanding is that throughput is directly related to your latency to the AWS datacenter where tarsnap is hosted. Outside of North America you can be looking at nearly dial-up speeds even on a gigabit link.

Again, a problem that can surprise you at the most inconvenient time. Incremental backups in a daily cronjob tend to transfer very small amounts of data, so you won't notice the slowness until you try to do a full restore. And you generally don't test that very often because you pay for server-to-client transfers.

There are some workarounds for 3) and there's a FAQ about it, but look at the mailing list and you'll see that it's something that surprises people again and again.
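
To put rough numbers on 3), here is a back-of-the-envelope sketch in Python. It assumes a simplified model in which a fixed amount of data is in flight per round trip, so throughput scales as 1/latency. That model, the function names, and the example figures (64 KiB in flight, 200 ms round trip, 500 GB of data) are illustrative assumptions, not measurements of tarsnap or a description of its protocol.

```python
def latency_bound_mbit_s(bytes_in_flight: int, rtt_seconds: float) -> float:
    """Throughput in Mbit/s if only `bytes_in_flight` can be outstanding per round trip.

    Simplified model (an assumption): throughput = data in flight / round-trip time.
    """
    return bytes_in_flight * 8 / rtt_seconds / 1_000_000


def restore_days(data_gb: float, throughput_mbit_s: float) -> float:
    """Days needed to download `data_gb` gigabytes at a sustained throughput."""
    megabits = data_gb * 8 * 1000  # 1 GB = 8000 Mbit (decimal units)
    return megabits / throughput_mbit_s / 86_400  # 86,400 seconds per day


# Hypothetical example: 64 KiB in flight over a 200 ms round trip.
example_throughput = latency_bound_mbit_s(64 * 1024, 0.2)
example_days = restore_days(500, example_throughput)
print(f"{example_throughput:.2f} Mbit/s, {example_days:.1f} days for 500 GB")
```

Under this model the link speed never appears: halving the round-trip time halves the restore time, while going from 100 Mbit/s to a gigabit changes nothing. Measuring a small test restore and plugging the observed throughput into `restore_days` is probably more useful than the latency model itself.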

privatelypublic - 3 days ago

Sounds like it's just a worse Glacier setup then?
Amazon has Pre-Pay in a semi-open beta.
CloudFront has 1TB/month free, knocking a large chunk off a restore's cost. (Note: you should have encrypted your stuff yourself, and/or S3 authorization/access control still works over CF.)
At what seems to be <$2/mo per TB ($1/TB Glacier Deep Archive + 9 cents/GB for metadata on S3 frequent access), no other solution comes close. The big issue is the lump cost of a restore, which is quickly worn down by being > $5/TiB/mo cheaper than anybody else.
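
The arithmetic behind that estimate can be sketched as below. The per-unit prices are the ones quoted in this comment, not verified against current AWS pricing. The metadata size (10 GB per TB) and the restore cost used in the example are placeholders chosen only to show the calculation, and the function names are made up for illustration.

```python
def monthly_cost_usd(data_tb: float, metadata_gb: float,
                     archive_per_tb: float = 1.00,
                     metadata_per_gb: float = 0.09) -> float:
    """Monthly storage cost: bulk data in deep archive plus metadata in hot storage.

    Default prices are the figures quoted in the comment above (assumptions).
    """
    return data_tb * archive_per_tb + metadata_gb * metadata_per_gb


def breakeven_months(restore_cost_usd: float, monthly_savings_usd: float) -> float:
    """Months of cheaper storage needed to pay for one lump-sum restore."""
    return restore_cost_usd / monthly_savings_usd


# Placeholder inputs: 1 TB of data with 10 GB of metadata.
cost = monthly_cost_usd(data_tb=1, metadata_gb=10)
print(f"${cost:.2f}/month")

# Placeholder restore cost of $90 against the quoted $5/month saving.
print(f"{breakeven_months(90, 5):.0f} months to break even")
```

With these placeholder inputs the total stays under the quoted $2/month only while metadata is below roughly 11 GB per TB, so the metadata ratio is the number worth checking for a real repository.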
- amluto - 3 days ago
  
  Tarsnap has a nice security model, and it’s quite a challenge to convince any open-source tool to match it.
  - dividuum - 3 days ago
    
    restic is basically identical and you can choose where you store your data.
    
    amluto - 3 days ago
    
    restic can supposedly be set up to prevent a corrupted / compromised client from destroying old data using S3 versioning policy, but this doesn’t appear to be a well-supported feature with clearly-described security properties.
    Tarsnap, in contrast, has an explicit first-class ability to prevent a compromised client from damaging old backups.
    
    placardloop - 3 days ago
    
    That’s because restic is not opinionated about where and how you store your backups. Restic provides a nice interface to create the backups, and then lets you choose where you want to store them (and how access to them is managed), be it locally or via SFTP or S3 or many other backends. Any security properties related to S3 are not in the scope of what restic is meant to do.
    It’s pretty simple to enable versioning and object lock on your S3 bucket, but it is another step if you’re using restic. Sure, if you just want all of that taken care of for you, you can use tarsnap, but you’re paying a 5x+ premium for it.
    The other nice thing about restic is that since it’s just the client-side interface, it allows others to provide managed storage. Borgbase.com is a storage provider supported by restic that offers append-only backups, and is cheaper than tarsnap.
    
    amluto - 3 days ago
    
    I disagree, strongly. Here are the relevant docs:
    https://restic.readthedocs.io/en/stable/030_preparing_a_new_...
    I would like to see an explicit discussion of what permissions are needed for what operation. I would also like to see a clearly specified model in which backups can be created in a bucket with less than full permissions and, even after active attack by an agent with those same permissions, one can enumerate all valid backups in the bucket and be guaranteed to be able to correctly restore any backup as long as one can figure out which backup one wants to restore.
    Instead there are random guides on medium.com describing a configuration that may or may not have the desired effect.
    
    placardloop - 3 days ago
    
    Again, this isn’t at all in the scope of restic’s docs. If you’re using S3 as the storage, it’s on you to understand how S3 works and what permissions are needed, just like it’s on you to understand how your local file system works and file permissions work if you use the local file system as a backend.
    If you don’t understand S3 or don’t want to learn, then that’s fine, and you can pay the premium to tarsnap for simplifying it for you. But that’s your choice, not an issue with restic.
    If you think differently, have you submitted a PR to restic’s docs to add the information you think should be there?
    
    privatelypublic - 3 days ago
    
    Interesting play on the debate, but after the response to restic's original decision to upstream Object Store permissions and features... to the Object Store, along with my attempts to explain S3 to several otherwise reasonably technical people....
    I think people are frequently trapped in some way of thinking (not sure exactly) that doesn't allow them to think of storage as anything other than block-based. They repeatedly try to reduce S3 to LBAs, or POSIX permissions (not even modern ACL-type permissions), or some other comparison that falls apart quickly.
    Best I've come up with is "an object is a burned CD-R." Even that falls apart, though.
    
    amluto - 3 days ago
    
    I still completely disagree. It’s on me to understand IAM. It should not be on me to understand the way that restic uses S3 such that I can determine whether I can credibly restore from an S3 bucket after a compromised client gets permission to create objects that didn’t previously exist. Or to create new corrupt versions of existing objects.
    For that matter, suppose an attacker modifies an object and replaces it with corrupt or malicious contents, and I detect it, and the previous version still exists. Can the restic client, as written, actually manage the process of restoring it? I do not want to need to patch the client as part of my recovery plan.
    (Compare to Tarsnap. By all accounts, if you back up, your data is there. But there are more than enough reports of people who are unable to usefully recover the data because the client is unbelievably slow. The restore tool needs to do what the user needs it to do in order for the backup to be genuinely useful.)
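
The scenario in the last comment (an attacker who can only write new versions, and a restore that has to find the last good one) can be illustrated with a toy in-memory model. This is a sketch of the idea only: `VersionedBucket` and `find_good_version` are hypothetical names, the class is not the S3 API, and nothing here describes what the restic client actually does during a restore.

```python
import hashlib
from typing import Optional


class VersionedBucket:
    """Toy append-only store: every put adds a version, nothing is ever deleted."""

    def __init__(self) -> None:
        self._versions: dict[str, list[bytes]] = {}

    def put(self, key: str, data: bytes) -> int:
        """Store a new version of `key` and return its version index."""
        self._versions.setdefault(key, []).append(data)
        return len(self._versions[key]) - 1

    def get(self, key: str, version: Optional[int] = None) -> bytes:
        """Return the latest version by default, or a specific older one."""
        versions = self._versions[key]
        return versions[-1] if version is None else versions[version]

    def versions(self, key: str) -> list[bytes]:
        """Return all stored versions of `key`, oldest first."""
        return list(self._versions[key])


def find_good_version(bucket: VersionedBucket, key: str,
                      expected_sha256: str) -> Optional[int]:
    """Return the index of the first version matching a known-good digest."""
    for index, data in enumerate(bucket.versions(key)):
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return index
    return None


bucket = VersionedBucket()
bucket.put("data/abc", b"original backup chunk")
good_digest = hashlib.sha256(b"original backup chunk").hexdigest()

# A compromised client with write-only access overwrites the object.
bucket.put("data/abc", b"corrupt contents")

latest = bucket.get("data/abc")                 # the corrupt version
good_index = find_good_version(bucket, "data/abc", good_digest)
recovered = bucket.get("data/abc", good_index)  # the original is still there
```

The old data surviving is the easy part. The recovery step depends on having a trusted digest stored somewhere the attacker could not touch, and on a client that can be told which version to read, which is the gap the comment is asking the documentation to address.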