The NFT token id as URI
#######################

:date: 2021-05-08 19:14
:modified: 2021-05-10 09:18
:category: Code
:author: Louis Holbrook
:tags: nft,evm,hash,key-value store,decentralized storage
:slug: nft-tokenid-content
:lang: en
:summary: How to embed asset references for NFTs that are independent of providers in the current standard.
:status: published


Let's consider an NFT that works like a badge for participating in development of a software project.

This token is awarded as a proof that the task was completed.

To make things more fun, each NFT should have some unique, immutable content attached to it.

In other words, the properties of this token, once set, should never change.

Nor should they disappear.

So how do we refer to the artwork asset within the token standard?


It was acceptable at the time
=============================

The ERC721 standard is not explicit about where the assets that belong with the NFT can be discovered and resolved.

At the time when the standard was adopted by the Ethereum community, there were multiple *"[...] Alternatives considered: put all metadata for each asset on the blockchain (too expensive), use URL templates to query metadata parts (URL templates do not work with all URL schemes, especially P2P URLs), multiaddr network address (not mature enough)."* Furthermore, they *"[...] considered an NFT representing ownership of a house, in this case metadata about the house (image, occupants, etc.) can naturally change."* [EIP721]_

A "changing house" doesn't sound quite like what we need. And anyway; if we stick a good old web2 URI in there, then that will end up on the great bonfire of dead links before long.


Image, schmimage
================

To be honest, I find the presumption in the optional EIP721 metadata structure to be surprisingly short-sighted.
It *specifically* defines the asset as an image, and at the same time it presupposes that only a *single* asset file will be used.

We may want to add *multiple* sources, so this is another obstacle for us.

So how to get around this, while still playing nice with existing implementations out there? Two ideas come to mind:

- Embed a *thumbnail* as a preview of the artwork using a :code:`base64` *data URI* [1]_ in the metadata. Stick :code:`name` and :code:`description` on it, and the schema is still fulfilled.
- *Extend* the structure with a list of *attachments* that *our* application layer knows about. Of course, each of these can have the same format as above.

In other words:

.. include:: code/nft-tokenid-content/erc721_metadata_schema_base.json
   :code: json


Mirror, mirror
==============

Since the asset reference shouldn't change, we can refer to it by its fingerprint or `content address <https://en.wikipedia.org/wiki/Content-addressable_storage>`_. If we define that the resource can be looked up over HTTP by that fingerprint as its basename, then we are free to define and modify whatever list of mirrors for that resource is valid at any point in time. The application layer would simply try the endpoints one after another.

We take the :code:`sha2-256` [2]_ of the asset reference (the json file above, free of evil whitespace and newlines):

.. code-block:: bash

   $ cat reference.json | jq -c -j | sha256sum | awk '{ print $1; }'
   3fdfbfe3b510b69f90cd92618e4c1ec76cf8b9c330bc2da1922acda8f84f9551

Imagine we had a mirror list of https://foo.com and https://bar.com/baz/. Then our application would try these URLs in sequence, stopping at the first that returns a valid result:

.. code-block:: text

   https://foo.com/3fdfbfe3b510b69f90cd92618e4c1ec76cf8b9c330bc2da1922acda8f84f9551
   https://bar.com/baz/3fdfbfe3b510b69f90cd92618e4c1ec76cf8b9c330bc2da1922acda8f84f9551

Once we receive the content, all we have to do is hash it ourselves and verify that the sum matches the basename of the URI. If it doesn't, the result is of course not valid and we continue down the list, appropriately banning the mischievous server and then thoroughly harassing its admin.


Cast away
=========

Since our fingerprint is 32 bytes, it fits exactly inside the :code:`tokenId` (:code:`uint256`). Let's decide to use big-endian numbers when converting (I find them easier to make sense of). In that case our hash from the reference turns into this modest number:

.. code-block:: python

   # python3
   >>> hx = bytes.fromhex('3fdfbfe3b510b69f90cd92618e4c1ec76cf8b9c330bc2da1922acda8f84f9551')
   >>> int.from_bytes(hx, byteorder='big')
   28891040728719892888467057134569335350980764617882743994259054993630416573777

As long as we're composing the :code:`evm` inputs ourselves, we don't really have to worry about the integer representation in this particular case. But since the interface is defined as an integer type, and other mortals may be using higher level interfaces, we have to be explicit about our choice.


Welcoming mint
==============

Assume we have a method :code:`mintTo(address _recipient, uint256 _tokenId)` on our NFT contract. The Solidity signature of that method is :code:`edb20b7e` [3]_. If I were to mint to myself then the input to the contract would be:

.. code-block:: text

   edb20b7e000000000000000000000000185cbce7650ff7ad3b587e26b2877d95568805e33fdfbfe3b510b69f90cd92618e4c1ec76cf8b9c330bc2da1922acda8f84f9551

Broken down:

.. code-block:: text

   signature: edb20b7e
   address, zero-padded: 000000000000000000000000185cbce7650ff7ad3b587e26b2877d95568805e3
   token id: 3fdfbfe3b510b69f90cd92618e4c1ec76cf8b9c330bc2da1922acda8f84f9551

The corresponding web3.js code would look like:

.. code-block:: javascript

   const c = new web3.eth.Contract([...], '0x...');
   // Pass the token id as a string; a JavaScript number literal this large would lose precision.
   // send() submits the minting transaction, whereas call() would only simulate it.
   c.methods.mintTo('0x185cbce7650ff7ad3b587e26b2877d95568805e3', '28891040728719892888467057134569335350980764617882743994259054993630416573777').send({from: '0x...'});

To satisfy the :code:`tokenURI` method, we can generate a string that's prefixed with :code:`sha256` as a "scheme" [4]_. A bit of (unoptimized) solidity helps us out here:

.. include:: code/nft-tokenid-content/tohex.sol
   :code: solidity

This will return :code:`sha256:3fdfbfe3b510b69f90cd92618e4c1ec76cf8b9c330bc2da1922acda8f84f9551` for :code:`tokenId` :code:`3fdfbfe3b510b69f90cd92618e4c1ec76cf8b9c330bc2da1922acda8f84f9551` as input, provided that the :code:`tokenId` actually exists. That may seem a bit useless at first, but consider the scenario where we want to interface with other NFTs as well. Or perhaps we are implementing a contract that optionally can support a static web2 URI in storage. By doing it this way, all bases are covered.


Decentralized identifiers
=========================

Even better would be to add redundancy with autonomous decentralized storage. However, networks like `Swarm <https://ethswarm.org>`_ and `IPFS <https://ipfs.io>`_ use their own hashing recipes. That means that for every network referenced, we'd have to define an *alternative* in our reference structure.

Referencing the canonical :code:`sha256` as well as the :code:`Swarmhash` for the same item could then look like this [5]_:


.. include:: code/nft-tokenid-content/erc721_metadata_schema_swarm.json
   :code: json

----

..

.. [1] Yes, they are valid URIs actually: https://www.rfc-archive.org/getrfc.php?rfc=2397

..

.. [2] Likely it would be prudent to start using the official :code:`sha3` instead of :code:`sha2` these days, also because the :code:`sha2` hash is not an opcode in the :code:`evm` (it is only available as a precompiled contract). But neither is :code:`sha3`. The :code:`keccak256` that the EVM uses is a precursor to the :code:`keccak` published as the *official* :code:`sha3`. Still, :code:`keccak256` and :code:`sha3` are used interchangeably in opcode lists (and previously in `Solidity <https://docs.soliditylang.org/en/v0.8.0/050-breaking-changes.html#functions>`_ too). This has caused me quite a fair bit of confusion, I might add. Apart from it being ambiguous, the :code:`keccak256` tooling is also less common in the wild. Therefore :code:`sha2` seems like a safer bet for our experiments. It's not broken yet, after all.

..

.. [3] The first four bytes of the hex result of :code:`keccak256("mintTo(address,uint256)")`


..

.. [4] *Data URI* is of no use here, because the hash itself is just nondescript binary data. Luckily :code:`<scheme>:<path>` is still a valid URI.

..

.. [5] Here the hashes represent the media content itself, not the reference. That's why the :code:`sha256` one is different from before.

..

.. [EIP721] https://eips.ethereum.org/EIPS/eip-721
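
As a closing illustration, here is a minimal Python sketch of the lookup described under *Mirror, mirror* and the integer conversion from *Cast away*. It is a sketch under assumptions, not a reference implementation: the names :code:`resolve`, :code:`default_fetch` and :code:`digest_to_token_id` are hypothetical, and the fetcher is passed in as a parameter so the logic can be tried without touching the network.

.. code-block:: python

   import hashlib
   import urllib.request


   def default_fetch(url):
       """Fetch raw bytes over HTTP (hypothetical helper; the timeout is an arbitrary choice)."""
       with urllib.request.urlopen(url, timeout=10) as r:
           return r.read()


   def resolve(digest_hex, mirrors, fetch=default_fetch):
       """Try each mirror in order; return the first content whose sha2-256 matches the digest."""
       for mirror in mirrors:
           # The fingerprint is used as the basename of the resource.
           url = mirror.rstrip('/') + '/' + digest_hex
           try:
               content = fetch(url)
           except OSError:
               # Mirror unreachable or resource missing (urllib errors derive from OSError).
               continue
           if hashlib.sha256(content).hexdigest() == digest_hex:
               return content
           # Hash mismatch: the mirror served invalid content, so continue down the list.
       return None


   def digest_to_token_id(digest_hex):
       """Interpret the 32-byte digest as a big-endian uint256."""
       return int.from_bytes(bytes.fromhex(digest_hex), byteorder='big')

Calling :code:`resolve(digest, ['https://foo.com', 'https://bar.com/baz/'])` would request the two URLs from the mirror example in that order and return :code:`None` if neither serves content matching the digest.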