Combining duplicity and rsync
#############################

:date: 2022-01-15 16:57
:category: Archiving
:author: Louis Holbrook
:tags: backup,rsync,duplicity,bash
:slug: backup-rsync-duplicity
:summary: An exercise in combining plain and encrypted backups on local and remote hosts
:series: Organizing backups
:seriesprefix: organizing-backups
:seriespart: 1
:lang: en
:status: published


There are two awesome, weathered tools out there that are all you really need for your personal backups. [1]_ One is the `rsync cli`_, the other is duplicity_.

The former should need no introduction.

The latter operates more like tar, but it still works over ssh like rsync. In fact, it's based on librsync_, which implements the rsync delta algorithm described in the `rsync protocol`_ tech report. The special sauce, however, is of course *encryption*.


Backup categories
=================

Let's for the sake of argument say that our personal backups can be divided into three categories:


Stuff that can be public
------------------------

Code snippets, git repositories, public data store states (e.g. blockchain ledgers), copies of OS packages and any other assets without redistribution issues.

For this we will use rsync_.


Sensitive stuff
---------------

Passwords, keys, contacts, calendars, contracts, invoices, task lists, databases, system configurations, application data.

For this we will use duplicity_.

Secret stuff
------------

Long-lived keys, password- and volume decryption keys, cryptocurrency keys and meta-information about the backups themselves.

This will not be addressed now.


Why not just one or the other?
==============================

Duplicity_ stores everything in an archive file format. That means that you must first authenticate, decrypt and unpack the archive in order to even browse the files inside.

If there is no reason to keep the files from prying eyes, then it's much more practical to be able to browse the files where they lie, with the regular filesystem tools. In such a case, rsync_ will scratch your itch.

For the **sensitive** and **secret stuff**, there would be no real need to use duplicity_ if you were only operating on your local host. You'd just use an encrypted volume [2]_ and rsync_ everything in there.

But half the point here is to keep remote copies as well as your local ones. You know, in case of fire, hardware-eating locust swarms or some totalitarian minions nabbing all your electronics. Unless "remote" here means some box hidden in some moated leisure castle of yours, you'll want to encrypt everything *before* you ship it off. And that's where duplicity_ comes in.


Vive la difference
==================

Of course, it would be too much to hope that duplicity_ and the `rsync cli`_ had aligned the way they parse their invocation parameters.

Here are some examples [3]_ of how they do *not* match:


local to local
--------------

.. code-block:: bash

        $ rsync -a src/ /path/to/dst/

        $ duplicity src/ file:///path/to/dst/


local to remote, relative path
------------------------------

.. code-block:: bash

        $ rsync -a src/ user@remotehost:path/to/dst/

        $ duplicity src/ scp://user@remotehost/path/to/dst


toggle dotfiles from current path
---------------------------------

.. code-block:: bash

        # include only .foo/foo.txt given the current structure:
        $ tree src/ -a
        src/
        ├── .bar
        ├── baz
        └── .foo
            └── foo.txt

        # -a is needed here: without -r (implied by -a) rsync skips directories
        $ rsync -a --exclude=".b*" --include=".*/***" --exclude="*" ./ ../dst/

        $ duplicity --exclude="./.b*" --include="./.*/***" --exclude="*" ./ file:///home/lash/tmp/dst/

logging
-------

.. code-block:: bash

        # spill the beans
        $ rsync -vv ...

        $ duplicity -v debug



Batchin'
========

Since you will want to select up front which tool to use for which sensitivity category, you'll be writing the includes and excludes specifically for that tool anyway.

So the only real issue with the above is the way the remote host is specified.

Let's say we choose to stick to the `rsync cli`_ host format. That means we need to make the following translations:

.. list-table::
        :widths: 50 50
        :header-rows: 1

        * - rsync
          - duplicity
        * - ``foo/bar``
          - ``file://foo/bar``
        * - ``/foo/bar``
          - ``file:///foo/bar``
        * - ``user@host:foo/bar``
          - ``scp://user@host/foo/bar``
        * - ``user@host:/foo/bar``
          - ``scp://user@host//foo/bar``


Expressed in ``bash`` that could look like this:

.. include:: code/backup-rsync-duplicity/translate.sh
        :code: bash
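
As an illustration of the table above, here is a minimal sketch of the translation using only POSIX shell constructs. It is written independently of the included ``translate.sh`` and may well differ from it. The function name ``rsync_to_duplicity_url`` is hypothetical, and the sketch assumes that a destination counts as remote only when a colon appears before the first slash, which is roughly how rsync itself tells the two apart.

```shell
# Hypothetical helper: print the duplicity URL for an rsync-style destination.
# Note: "host" is a plain global variable, to stay within POSIX shell.
rsync_to_duplicity_url() {
    # Text before the first colon; equals "$1" when there is no colon.
    host=${1%%:*}
    case "$host" in
        "$1"|""|*/*)
            # No colon, empty host, or a slash before the colon: local path.
            # foo -> file://foo, /foo -> file:///foo
            printf 'file://%s\n' "$1"
            ;;
        *)
            # host:foo -> scp://host/foo, host:/foo -> scp://host//foo
            printf 'scp://%s/%s\n' "$host" "${1#*:}"
            ;;
    esac
}

rsync_to_duplicity_url foo/bar             # file://foo/bar
rsync_to_duplicity_url /foo/bar            # file:///foo/bar
rsync_to_duplicity_url user@host:foo/bar   # scp://user@host/foo/bar
rsync_to_duplicity_url user@host:/foo/bar  # scp://user@host//foo/bar
```

Since the function writes to stdout, it composes with command substitution, e.g. ``url=$(rsync_to_duplicity_url "$dst")``.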


Let's behave and test our code:

.. include:: code/backup-rsync-duplicity/translate_test.sh
        :code: bash

.. code-block:: bash

        # 0 == good!
        $ BAK_TEST=1 bash remote.sh && echo $?
        0


Now we can take the `rsync cli`_ path as input, and feed that same input to a batch of single backup steps, each of which may use the `rsync cli`_ or duplicity_.

.. code-block:: bash

        to_duplicity_remote localhost:/foo/bar

        rsync -avzP pub/ $remote_base:src/

        duplicity -v info secret/ $remote_duplicity_base:secret/
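
To make the batching idea a bit more concrete, here is a minimal dry-run sketch that only prints the command for each step instead of running it. The names ``plan_step`` and ``rsync_to_duplicity_url``, the ``backups`` base path and the ``pub`` and ``sensitive`` directory names are all assumptions made for illustration. The duplicity invocation follows the plain ``duplicity <source> <url>`` form used in the examples earlier.

```shell
# Hypothetical helper, defined inline so this block is self-contained:
# print the duplicity URL for an rsync-style destination.
rsync_to_duplicity_url() {
    host=${1%%:*}
    case "$host" in
        "$1"|""|*/*) printf 'file://%s\n' "$1" ;;
        *) printf 'scp://%s/%s\n' "$host" "${1#*:}" ;;
    esac
}

# Print (rather than run) the command for a single backup step.
# $1: tool ("rsync" or "duplicity")
# $2: source directory, relative, without trailing slash
# $3: rsync-style base destination shared by all steps
plan_step() {
    case "$1" in
        rsync)
            printf 'rsync -a %s/ %s/%s/\n' "$2" "$3" "$2"
            ;;
        duplicity)
            printf 'duplicity -v info %s/ %s/%s\n' "$2" "$(rsync_to_duplicity_url "$3")" "$2"
            ;;
        *)
            printf 'unknown tool: %s\n' "$1" >&2
            return 1
            ;;
    esac
}

base=user@remotehost:backups
plan_step rsync pub "$base"            # rsync -a pub/ user@remotehost:backups/pub/
plan_step duplicity sensitive "$base"  # duplicity -v info sensitive/ scp://user@remotehost/backups/sensitive
```

Keeping a single rsync-style base and translating it only at the duplicity step means the destination is written down once per batch.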


See also
========

* https://git.defalsify.org/rsync-duplicity-backups


..

        .. [1] Ok, I know, I'm assuming that you are using ``git`` in daily life, too.

        .. [2] Provided, of course, that it's an encrypted volume that you don't keep unlocked all the time.

        .. [3] Duplicity needs at a minimum a password for symmetric encryption, and will prompt for it unless it's set in the environment. Simply ``export PASSPHRASE=test`` for these examples to relieve you of the annoyance.

..

        .. _duplicity: https://duplicity.gitlab.io/duplicity-web/

        .. _Duplicity: https://duplicity.gitlab.io/duplicity-web/

        .. _`rsync cli`: https://rsync.samba.org/

        .. _rsync: https://rsync.samba.org/

        .. _librsync: http://librsync.sourcefrog.net/

        .. _`rsync protocol`: https://rsync.samba.org/tech_report/