manbytesgnu_site

Source files for manbytesgnu.org
git clone git://holbrook.no/manbytesgnu_site.git

20210505_wasm_c_bare.rst (11462B)


wasm and C - The bare necessities
#################################

:date: 2021-05-05 06:15
:author: Louis Holbrook
:category: Code
:tags: wasm,c,clang,llvm
:slug: wasm-c-bare
:summary: The bare minimum needed for two-way communication between wasm and C
:lang: en
:status: published


I am currently resuming my self-improvement task of learning WebAssembly. Or, I should rather say, *using* WebAssembly.

Since I have some small, optimized code chunks that I want to use in an embedded environment, it seemed sensible that the first step would be to establish two-way communication with C.

In my last outing around three years ago, I was using Emscripten_ to bridge the gap. That tool adds quite a few bells and whistles, and doesn't quite yield that warm, fuzzy bare-metal rush. Emscripten_ relies on Clang_ and LLVM_, both of which seem to have gotten :code:`wasm` support built in since then (at least on my Arch Linux system). This integrates nicely with wabt_, the Swiss Army knife of WebAssembly.

So how far do we get with just clang_, LLVM_ and wabt_? Let's see if we can at least set up a code snippet which simply writes *"foobar"* to memory. The host will write *"foo"*, and :code:`wasm` will write *"bar"*.



Without libc
============

This excellent `tutorial by Surma <https://surma.dev/things/c-to-webassembly/>`_ provides a good starting point. Go ahead and **read that first**. This text is not a WebAssembly primer, so the following will make a lot more sense if you do.

That setup still adds some magic. Namely, the *memory* and the *function table* are added by the *wasm linker*. It would be even more fun to pass these in from the host system instead.

And so we start: [1]_

.. include:: code/wasm-c-bare/bare.c
   :code: c
   :number-lines: 0

Compiling this without linking gives us a hint about what needs to be defined.

.. code-block:: bash

        $ clang --target=wasm32 -nostdlib -nostartfiles -o bare.wasm -c bare.c
        $ wasm-objdump -x bare.wasm
        bare.wasm:	file format wasm 0x1

        Section Details:

        Type[2]:
         - type[0] () -> nil
         - type[1] (i32) -> nil
        Import[4]:
         - memory[0] pages: initial=0 <- env.__linear_memory
         - table[0] type=funcref initial=0 <- env.__indirect_function_table
         - global[0] i32 mutable=1 <- env.__stack_pointer
         - func[0] sig=1 <env.call_me_sometime> <- env.call_me_sometime
        Function[1]:
         - func[1] sig=0 <foo>
        Code[1]:
         - func[1] size=138 <foo>
        Custom:
         - name: "linking"
          - symbol table [count=4]
           - 0: F <foo> func=1 binding=global vis=hidden
           - 1: G <env.__stack_pointer> global=0 undefined binding=global vis=default
           - 2: D <__heap_base> undefined binding=global vis=default
           - 3: F <env.call_me_sometime> func=0 undefined binding=global vis=default
        Custom:
         - name: "reloc.CODE"
          - relocations for section: 3 (Code) [5]
           - R_WASM_GLOBAL_INDEX_LEB offset=0x000007(file=0x000099) symbol=1 <env.__stack_pointer>
           - R_WASM_GLOBAL_INDEX_LEB offset=0x00001c(file=0x0000ae) symbol=1 <env.__stack_pointer>
           - R_WASM_MEMORY_ADDR_SLEB offset=0x000031(file=0x0000c3) symbol=2 <__heap_base>
           - R_WASM_FUNCTION_INDEX_LEB offset=0x000073(file=0x000105) symbol=3 <env.call_me_sometime>
           - R_WASM_GLOBAL_INDEX_LEB offset=0x000086(file=0x000118) symbol=1 <env.__stack_pointer>
        Custom:
         - name: "producers"

Using Node.js as the host, we check whether we can instantiate a :code:`WebAssembly` object:

.. include:: code/wasm-c-bare/bare_naive.js
   :code: javascript
   :number-lines: 0

Running this tells us we are apparently missing a property :code:`env` in the imports object.

.. code-block:: bash

        $ node bare_naive.js
        /home/lash/src/tests/wasm/bare/bare_naive.js:8
        const i = new WebAssembly.Instance(m, imports);
                  ^

        TypeError: WebAssembly.Instance(): Import #0 module="env" error: module is not an object or function

That seems to match the :code:`Import` section in the :code:`objdump` output above. Let's stick the *memory* and *table* in there. [2]_

And let's make a bold guess that the callback function :code:`call_me_sometime` needs to go in there as well.

.. include:: code/wasm-c-bare/bare.js
   :code: javascript
   :number-lines: 0

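The error about the missing :code:`env` property can be reproduced in isolation. The following is only a minimal sketch, and not the :code:`bare.js` used in this text: the module bytes are hand-assembled purely for illustration, and contain nothing but a single import of a one-page memory named :code:`env.memory`. It lists what the module wants with :code:`WebAssembly.Module.imports`, then shows that instantiation fails without :code:`env` and succeeds with it.

.. code-block:: javascript

        // Hand-assembled module for illustration only (NOT the bare.wasm from this text).
        // Its only content is an import of a one-page memory named env.memory.
        const bytes = new Uint8Array([
            0x00, 0x61, 0x73, 0x6d,                   // magic: "\0asm"
            0x01, 0x00, 0x00, 0x00,                   // version 1
            0x02, 0x0f,                               // import section, 15 bytes long
            0x01,                                     // one import
            0x03, 0x65, 0x6e, 0x76,                   // module name: "env"
            0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, // field name: "memory"
            0x02, 0x00, 0x01,                         // kind: memory, no maximum, initial=1 page
        ]);
        const m = new WebAssembly.Module(bytes);

        // Ask the module what it expects from the host.
        const wanted = WebAssembly.Module.imports(m).map((i) => `${i.module}.${i.name} (${i.kind})`);
        console.log(wanted.join(', '));

        // Without an "env" property in the imports object, instantiation is rejected ...
        let failure = null;
        try {
            new WebAssembly.Instance(m, {});
        } catch (e) {
            failure = e;
        }
        console.log(failure instanceof TypeError);

        // ... and with it, the import is satisfied.
        const memory = new WebAssembly.Memory({initial: 1});
        const instance = new WebAssembly.Instance(m, {env: {memory: memory}});
        console.log(instance instanceof WebAssembly.Instance);

Run with :code:`node`, this should print :code:`env.memory (memory)` followed by :code:`true` twice. The same :code:`WebAssembly.Module.imports` call on a real module should list every entry that its :code:`env` object has to provide.
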
To link :code:`bare.c` so that it accepts all of this from the host, the linker needs a little help from us:

- Our callback function will not be available at link time, so we have to pass :code:`--allow-undefined` to promise that the host has got this covered.
- :code:`--import-memory` and :code:`--import-table` enable us to get the memory and the function table from the host.
- :code:`--export="foo"` to make sure we only export exactly what we intend to from our :code:`wasm`.


.. code-block:: bash

        $ clang --target=wasm32 -nostdlib -nostartfiles -Wl,--no-entry -Wl,--export="foo" -Wl,--import-memory -Wl,--import-table -Wl,--allow-undefined -o bare.wasm bare.c


And that should give us:

.. code-block:: bash

        $ node bare.js
        heap is at: 66560
        heap contains: foobar

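Seen from the host, this exchange is little more than byte copies in and out of the shared :code:`WebAssembly.Memory`. Here is a minimal sketch of that, under a few assumptions: the helper names :code:`writeString` and :code:`readString` are hypothetical and not taken from :code:`bare.js`, and the write of *"bar"* that :code:`foo` would perform in :code:`wasm` is simulated from the host, so that the snippet runs without the compiled module.

.. code-block:: javascript

        // Hypothetical helpers for illustration, not taken from bare.js.
        function writeString(memory, offset, s) {
            const bytes = new TextEncoder().encode(s);
            new Uint8Array(memory.buffer, offset, bytes.length).set(bytes);
            return bytes.length;
        }

        function readString(memory, offset, length) {
            return new TextDecoder().decode(new Uint8Array(memory.buffer, offset, length));
        }

        // Two pages (2 * 64 KiB), enough to cover the heap offset printed above.
        const memory = new WebAssembly.Memory({initial: 2});
        const heap = 66560;

        // The host writes "foo" ...
        let n = writeString(memory, heap, 'foo');
        // ... and here the host also stands in for the wasm side, which would write "bar".
        n += writeString(memory, heap + n, 'bar');

        const contents = readString(memory, heap, n);
        console.log('heap is at: ' + heap);
        console.log('heap contains: ' + contents);

Nothing in this sketch checks that the offset is free, or that the two writers agree on lengths. Both sides simply have to trust the same raw number.
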
This way of pointing to memory is of course grossly inadequate *and* unsafe *and* ridiculous for any purpose more advanced than this one. So some proper memory management would not be a bad thing.


Adding libc
===========

And what do you know. Another piece of news since I last looked at this is the addition of "a libc for WebAssembly programs built on top of WASI system calls." [wasi-libc]_ Let's see if we can add a slightly less manual way of handling memory with :code:`malloc` and :code:`memcpy`:

.. include:: code/wasm-c-bare/bare_libc.c
   :code: c
   :number-lines: 0

As you will see below, we need a few more parameters for the compiler and linker at this point. The :code:`--target=wasm32-unknown-wasi` and :code:`--sysroot /opt/wasi-libc` options, together with the trailing :code:`/opt/wasi-libc/lib/wasm32-wasi/libc.a`, are needed to hook us up with the headers and symbols of the libc.

My Arch Linux puts that sysroot in :code:`/opt/wasi-libc`. That may of course not be the case elsewhere.

.. code-block:: bash

        $ clang -DHAVE_LIBC=1 --target=wasm32-unknown-wasi --sysroot /opt/wasi-libc -nostdlib -nostartfiles -Wl,--no-entry -Wl,--export="foo" -Wl,--import-memory -Wl,--import-table -Wl,--allow-undefined -o bare.wasm bare.c /opt/wasi-libc/lib/wasm32-wasi/libc.a
        $ wasm-objdump -x bare.wasm

        bare.wasm:	file format wasm 0x1

        Section Details:

        Type[3]:
         - type[0] (i32) -> nil
         - type[1] () -> nil
         - type[2] (i32) -> i32
        Import[3]:
         - memory[0] pages: initial=2 <- env.memory
         - table[0] type=funcref initial=1 <- env.__indirect_function_table
         - func[0] sig=0 <call_me_sometime> <- env.call_me_sometime
        Function[7]:
         - func[1] sig=1 <foo>
         - func[2] sig=2 <malloc>
         - func[3] sig=2 <dlmalloc>
         - func[4] sig=0 <free>
         - func[5] sig=0 <dlfree>
         - func[6] sig=1 <abort>
         - func[7] sig=2 <sbrk>
        Global[2]:
         - global[0] i32 mutable=1 - init i32=67072
         - global[1] i32 mutable=0 <__heap_base> - init i32=67072
        Export[2]:
         - func[1] <foo> -> "foo"
         - global[1] -> "__heap_base"
        Code[7]:
         - func[1] size=171 <foo>
         - func[2] size=10 <malloc>
         - func[3] size=6984 <dlmalloc>
         - func[4] size=10 <free>
         - func[5] size=1908 <dlfree>
         - func[6] size=4 <abort>
         - func[7] size=78 <sbrk>
        Data[2]:
         - segment[0] memory=0 size=7 - init i32=1024
          - 0000400: 6261 7a62 6172 00                        bazbar.
         - segment[1] memory=0 size=500 - init i32=1032
          - 0000408: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000418: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000428: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000438: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000448: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000458: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000468: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000478: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000488: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000498: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00004a8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00004b8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00004c8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00004d8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00004e8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00004f8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000508: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000518: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000528: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000538: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000548: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000558: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000568: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000578: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000588: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 0000598: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00005a8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00005b8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00005c8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00005d8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00005e8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
          - 00005f8: 0000 0000                                ....
        Custom:
         - name: "name"
         - func[0] <call_me_sometime>
         - func[1] <foo>
         - func[2] <malloc>
         - func[3] <dlmalloc>
         - func[4] <free>
         - func[5] <dlfree>
         - func[6] <abort>
         - func[7] <sbrk>
        Custom:
         - name: "producers"

What luxury. And of course, our :code:`bare.wasm` file just grew from 350 bytes to 10k...

We don't have to change our JavaScript code at this point. Simply run again, and get:

.. code-block:: bash

        $ node bare.js
        heap is at: 67088
        heap contains: foobazbar


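The :code:`Export` section of the dump above lists :code:`__heap_base` as an exported global next to :code:`foo`, which means the host can read it directly. The following is a minimal sketch of that. Again the module bytes are hand-assembled purely for illustration and are not the real :code:`bare.wasm`: they only mimic its two exports, with an empty :code:`foo` and a constant :code:`__heap_base` of 67072 as in the dump.

.. code-block:: javascript

        // Hand-assembled stand-in for illustration only (NOT the real bare.wasm):
        // an empty function "foo" and an immutable i32 global "__heap_base" = 67072.
        const bytes = new Uint8Array([
            0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic and version
            0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: () -> nil
            0x03, 0x02, 0x01, 0x00,                         // function section: one function of type 0
            0x06, 0x08, 0x01, 0x7f, 0x00,                   // global section: one immutable i32 ...
            0x41, 0x80, 0x8c, 0x04, 0x0b,                   // ... initialized with i32.const 67072
            0x07, 0x15, 0x02,                               // export section: two exports
            0x03, 0x66, 0x6f, 0x6f, 0x00, 0x00,             // "foo" -> func[0]
            0x0b, 0x5f, 0x5f, 0x68, 0x65, 0x61, 0x70,       // "__heap_base" ...
            0x5f, 0x62, 0x61, 0x73, 0x65, 0x03, 0x00,       // ... -> global[0]
            0x0a, 0x04, 0x01, 0x02, 0x00, 0x0b,             // code section: empty body for foo
        ]);
        const m = new WebAssembly.Module(bytes);

        // List what the module exports, in the order of its export section.
        const names = WebAssembly.Module.exports(m).map((e) => `${e.name} (${e.kind})`);
        console.log(names.join(', '));

        // This stand-in has no imports, so an empty imports object is enough.
        const instance = new WebAssembly.Instance(m, {});

        // An exported global arrives as a WebAssembly.Global; its number is in .value.
        const heapBase = instance.exports.__heap_base.value;
        console.log('heap base is at: ' + heapBase);

        // The run above reported the allocated buffer at 67088.
        console.log('buffer offset from heap base: ' + (67088 - heapBase));

With the numbers from the dump and the run above, the buffer handed out by :code:`malloc` sits 16 bytes past :code:`__heap_base`.
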
.. _Emscripten: https://emscripten.org/

.. _LLVM: https://llvm.org/

.. _clang: https://clang.llvm.org/

.. _wabt: https://github.com/WebAssembly/wabt

.. _libc for wasi: https://github.com/WebAssembly/wasi-libc

..

        .. [1] :code:`__heap_base` is set by the wasm linker by default, and is thus available as an external symbol.

..

        .. [2] After linking, the memory import needs to be called :code:`memory` instead of :code:`__linear_memory` for some reason. Thus we add both here for clarity.


..

        .. [wasi-libc] https://github.com/WebAssembly/wasi-libc
    250 
    251 ..
    252 
    253         .. [2] After linking the memory symbol meeds to be called :code:`memory` instead of :code:`__linear_memory` for some reason. Thus we add both here for clarity.
    254 
    255 
    256 ..
    257 
    258         .. [wasi-libc] https://github.com/WebAssembly/wasi-libc