283 lines
6.0 KiB
Markdown
283 lines
6.0 KiB
Markdown
---
|
|
title: FFI-ing Go from Nim for Fun and Profit
|
|
date: 2015-12-20
|
|
series: howto
|
|
tags:
|
|
- go
|
|
- nim
|
|
---
|
|
|
|
As a side effect of Go 1.5, the compiler and runtime recently gained the
|
|
ability to compile code and run it as FFI code running in a C namespace. This
|
|
means that you can take any Go function that expresses its types and the like
|
|
as something compatible with C and use it from C, Haskell, Nim, Luajit, Python,
|
|
anywhere. There are some unique benefits and disadvantages to this however.
|
|
|
|
A Simple Example
|
|
----------------
|
|
|
|
Consider the following Go file `add.go`:
|
|
|
|
```go
|
|
package main
|
|
|
|
import "C"
|
|
|
|
//export add
|
|
func add(a, b int) int {
|
|
return a + b
|
|
}
|
|
|
|
func main() {}
|
|
```
|
|
|
|
This just exposes a function `add` that takes some pair of C integers and then
|
|
returns their sum.
|
|
|
|
We can build it with:
|
|
|
|
```
|
|
$ go build -buildmode=c-shared -o libsum.so add.go
|
|
```
|
|
|
|
And then test it like this:
|
|
|
|
```
|
|
$ python
|
|
>>> from ctypes import cdll
|
|
>>> a = cdll.LoadLibrary("./libsum.so")
|
|
>>> print a.add(4,5)
|
|
9
|
|
```
|
|
|
|
And there we go, a Go function exposed and usable in Python. However now we
|
|
need to consider the overhead when switching contexts from your app to your Go
|
|
code. To minimize context switches, I am going to write the rest of the code in
|
|
this post in [Nim](http://nim-lang.org) because it natively compiles down to
|
|
C and has some of the best C FFI I have used.
|
|
|
|
We can now define `libsum.nim` as:
|
|
|
|
```
|
|
proc add*(a, b: cint): cint {.importc, dynlib: "./libsum.so", noSideEffect.}
|
|
|
|
when isMainModule:
|
|
echo add(4,5)
|
|
```
|
|
|
|
Which when ran:
|
|
|
|
```
|
|
$ nim c -r libsum
|
|
Hint: system [Processing]
|
|
Hint: libsum [Processing]
|
|
CC: libsum
|
|
CC: system
|
|
Hint: [Link]
|
|
Hint: operation successful (9859 lines compiled; 1.650 sec total; 14.148MB; Debug Build) [SuccessX]
|
|
9
|
|
```
|
|
|
|
Good, we can consistently add `4` and `5` and get `9` back.
|
|
|
|
Now we can benchmark this by using the `times.cpuTime()` proc:
|
|
|
|
```
|
|
# test.nim
|
|
|
|
import
|
|
times,
|
|
libsum
|
|
|
|
let beginning = cpuTime()
|
|
|
|
echo "Starting Go FFI at " & $beginning
|
|
|
|
for i in countup(1, 100_000):
|
|
let myi = i.cint
|
|
discard libsum.add(myi, myi)
|
|
|
|
let endTime = cpuTime()
|
|
|
|
echo "Ended at " & $endTime
|
|
echo "Total: " & $(endTime - beginning)
|
|
```
|
|
|
|
```
|
|
$ nim c -r test
|
|
Hint: system [Processing]
|
|
Hint: test [Processing]
|
|
Hint: times [Processing]
|
|
Hint: strutils [Processing]
|
|
Hint: parseutils [Processing]
|
|
Hint: libsum [Processing]
|
|
CC: test
|
|
CC: system
|
|
CC: times
|
|
CC: strutils
|
|
CC: parseutils
|
|
CC: libsum
|
|
Hint: [Link]
|
|
Hint: operation successful (13455 lines compiled; 1.384 sec total; 21.220MB; Debug Build) [SuccessX]
|
|
Starting Go FFI at 0.000845
|
|
Ended at 0.131602
|
|
Total: 0.130757
|
|
```
|
|
|
|
Yikes. This takes 0.13 seconds to do the actual computation of every number
|
|
i in the range of `0` through `100,000`. I ran this for a few hundred times and
|
|
found out that it was actually consistently scoring between `0.12` and `0.2`
|
|
seconds. Obviously this cannot be a universal hammer and the FFI is very
|
|
expensive.
|
|
|
|
For comparison, consider the following C library code:
|
|
|
|
```
|
|
// libcsum.c
|
|
#include "libcsum.h"
|
|
|
|
int add(int a, int b) {
|
|
return a+b;
|
|
}
|
|
```
|
|
|
|
```
|
|
// libcsum.h
|
|
extern int add(int a, int b);
|
|
```
|
|
|
|
```
|
|
# libcsum.nim
|
|
proc add*(a, b: cint): cint {.importc, dynlib: "./libcsum.so", noSideEffect.}
|
|
|
|
when isMainModule:
|
|
echo add(4, 5)
|
|
```
|
|
|
|
and then have `test.nim` use the C library for comparison:
|
|
|
|
```
|
|
# test.nim
|
|
|
|
import
|
|
times,
|
|
libcsum,
|
|
libsum
|
|
|
|
let beginning = cpuTime()
|
|
|
|
echo "Starting Go FFI at " & $beginning
|
|
|
|
for i in countup(1, 100_000):
|
|
let myi = i.cint
|
|
discard libsum.add(myi, myi)
|
|
|
|
let endTime = cpuTime()
|
|
|
|
echo "Ended at " & $endTime
|
|
echo "Total: " & $(endTime - beginning)
|
|
|
|
let cpre = cpuTime()
|
|
echo "starting C FFI at " & $cpre
|
|
|
|
for i in countup(1, 100_000):
|
|
let myi = i.cint
|
|
discard libcsum.add(myi, myi)
|
|
|
|
let cpost = cpuTime()
|
|
|
|
echo "Ended at " & $cpost
|
|
echo "Total: " & $(cpost - cpre)
|
|
```
|
|
|
|
Then run it:
|
|
|
|
```
|
|
➜ nim c -r test
|
|
Hint: system [Processing]
|
|
Hint: test [Processing]
|
|
Hint: times [Processing]
|
|
Hint: strutils [Processing]
|
|
Hint: parseutils [Processing]
|
|
Hint: libcsum [Processing]
|
|
Hint: libsum [Processing]
|
|
CC: test
|
|
CC: system
|
|
CC: times
|
|
CC: strutils
|
|
CC: parseutils
|
|
CC: libcsum
|
|
CC: libsum
|
|
Hint: [Link]
|
|
Hint: operation successful (13455 lines compiled; 0.972 sec total; 21.220MB; Debug Build) [SuccessX]
|
|
Starting Go FFI at 0.00094
|
|
Ended at 0.119729
|
|
Total: 0.118789
|
|
|
|
starting C FFI at 0.119866
|
|
Ended at 0.12206
|
|
Total: 0.002194000000000002
|
|
```
|
|
|
|
Interesting. The Go library must be doing more per instance than just adding
|
|
the two numbers and continuing about. Since we have two near identical test
|
|
programs for each version of the library, let's `strace` it and see if there is
|
|
anything that can be optimized. [The Go one](https://gist.github.com/Xe/e0cd06d1d93e3299102e)
|
|
and [the C one](https://gist.github.com/Xe/7641cdba5657a4e8435a) are both very simple
|
|
and it looks like the Go runtime is adding the overhead.
|
|
|
|
Let's see what happens if we do that big loop in Go:
|
|
|
|
```
|
|
// add.go
|
|
|
|
//export addmanytimes
|
|
func addmanytimes() {
|
|
for i := 0; i < 100000; i++ {
|
|
add(i, i)
|
|
}
|
|
}
|
|
```
|
|
|
|
Then amend `libsum.nim` for this function:
|
|
|
|
```
|
|
proc addmanytimes*() {.importc, dynlib: "./libsum.so".}
|
|
```
|
|
|
|
And finally test it:
|
|
|
|
```
|
|
# test.nim
|
|
|
|
echo "Doing the entire loop in Go. Starting at " & $beforeGo
|
|
|
|
libsum.addmanytimes()
|
|
|
|
let afterGo = cpuTime()
|
|
|
|
echo "Ended at " & $afterGo
|
|
echo "Total: " & $(afterGo - beforeGo) & " seconds"
|
|
```
|
|
|
|
Which yields:
|
|
|
|
```
|
|
Doing the entire loop in Go. Starting at 0.119757
|
|
Ended at 0.119846
|
|
Total: 8.899999999999186e-05 seconds
|
|
```
|
|
|
|
Porting the C library to have a similar function would likely yield similar
|
|
results, as would putting the entire loop inside Nim. Even though this trick
|
|
was only demonstrated with Nim and Python, it will work with nearly any
|
|
language that can convert to/from C types for FFI. Given the large number of
|
|
languages that do have such an interface though, it seems unlikely that there
|
|
will be any language in common use that you *cannot* write to bind to Go code.
|
|
Just be careful and offload as much of it as you can to Go. The FFI barrier
|
|
**really hurts**.
|
|
|
|
---
|
|
|
|
This post's code is available [here](https://github.com/Xe/code/tree/master/experiments/go-nim).
|