This page is a follow-up to https://nim-lang.org/araq/destructors.html and
further outlines where Nim is heading in the future. (Did I hear anyone say
"Nim v2"?)

Nim's strings and sequences should become "GC-free" implementations and are
exemplary for how Nim's core should work.

Strings and sequences are value-based, which means that ``=`` performs a copy
(conceptually). In practice many copies can be optimized away (see my blog
post). The "optimized" copy is called a "move" and is supported via the type
bound operator ``=sink``.


Rewrite rules (simplified)
==========================

========  ====================  ===========================================================
Rule      Pattern               Transformed into
========  ====================  ===========================================================
1         var x; stmts          var x; try stmts finally: `=destroy`(x)
2         x = f()               `=sink`(x, f())
3         x = lastReadOf z      `=sink`(x, z)
4         x = y                 `=`(x, y) # a copy
5         f(g())                var tmp; `=sink`(tmp, g()); f(tmp); `=destroy`(tmp)
========  ====================  ===========================================================

Rule (5) can be optimized further to
``var tmp = bitwiseCopy(g()); f(tmp); =destroy(tmp)``.


Sink parameters
===============

A ``sink`` parameter conveys a transfer of ownership. The parameter will be
*consumed*. A ``sink`` parameter is internally **not** mapped to ``var``;
instead the usual "pass-by-copy" / "optimize to by-ref if more efficient"
implementation is used. However, similar rules apply -- you cannot pass a
``const`` to a ``sink`` parameter.

A ``sink`` parameter **must** be **consumed** exactly once within the proc's
body. The compiler will use a dataflow analysis to prove this fact.

For a ``sink`` parameter called ``sp`` a **consume** looks like:

.. code-block:: nim

  proc consume(c: var Container; sp: sink T) =
    locationDerivedFrom(c) = sp

This assignment is mapped to the ``=sink`` operator.

A consume can also be forwarded, that is, "pass sp to a different proc as a
sink parameter":

.. code-block:: nim

  proc consume(c: var Container; sp: sink T) =
    c.takeAsSink(sp)


Use after consume
-----------------

Locations passed to a ``sink`` parameter are invalidated after the call, and
the compiler tries to prove that they are not used again afterwards. For local
variables this is quite easy to prove:

.. code-block:: nim

  proc consume(c: var Container; element: sink T) =
    c[i] = element

  proc main() =
    var x = initT()
    for i in 0..3:
      container.consume(x) # Error: attempt to re-use already moved value 'x'

For arbitrary locations involving array accesses etc. it is too hard to prove
that they are not used afterwards. The compiler therefore transforms
``takeAsSink(sp)`` into ``takeAsSink(sp); reset(sp)``. ``reset`` sets the
value back to its default value. For locals the ``reset`` can be optimized
away (stores to a dead object); for function calls there is no location to
reset at all.

For a location that has had its value moved into a sink parameter no
destructor call needs to be injected. This is an important optimization to
keep the produced code small.


Sink for locals
---------------

``sink T`` is also a valid type for locals. For a variable ``v`` of type
``sink T`` no destructor call is injected and it is statically ensured that
every code path leads to its consumption.


Lent type
---------

``proc p(x: sink T)`` means that the proc ``p`` takes ownership of ``x``.
To eliminate even more creation/copy <-> destruction pairs, a proc's return
type can be annotated as ``lent T``. This is useful for "getter" accessors
that seek to allow an immutable view into a container.

Like ``sink T``, ``lent T`` is a valid annotation for local variables too.
For a variable ``v`` of type ``lent T`` it is statically ensured that no code
path leads to its consumption, in other words that it does not escape its
local stack frame (either directly or indirectly via passing to a ``sink``
parameter). For ``v`` no destructor call is injected since it doesn't own the
object.
The ``sink`` and ``lent`` annotations allow us to remove most (if not all)
superfluous copies and destructions.

Like ``var T``, ``lent T`` is a hidden pointer, and the compiler needs to
prove that it doesn't outlive its origin.

.. code-block:: nim

  type
    Tree = object
      kids: seq[Tree]

  proc construct(kids: sink seq[Tree]): Tree =
    result = Tree(kids: kids)
    # converted into:
    `=sink`(result.kids, kids)

  proc `[]`*(x: Tree; i: int): lent Tree =
    result = x.kids[i]
    # borrows from 'x', this is transformed into:
    result = addr x.kids[i]
    # This means 'lent' is like 'var T' a hidden pointer.
    # Unlike 'var' this cannot be used to mutate the object.

  iterator children*(t: Tree): lent Tree =
    for x in t.kids: yield x

  proc main =
    # everything turned into moves:
    let t = construct(@[construct(@[]), construct(@[])])
    echo t[0] # accessor does not copy the element!

``sink T`` and ``lent T`` introduce further rewrite rules but lead to more
efficient code. Even better, these rules optimize away create/copy <-> destroy
pairs and so can also make atomic reference counting more efficient by
eliminating incref <-> decref pairs.

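
One way to observe the effect of ``sink`` and ``lent`` is to count how often a
copy operator actually runs. The following is only a sketch under stated
assumptions: it targets a released Nim 2.x compiler with its default memory
management, where the copy operator is spelled ``=copy`` rather than ``=``, and
the names ``Tracked``, ``Box``, ``put``, ``first``, ``firstId`` and ``copies``
are hypothetical and not part of this proposal.

.. code-block:: nim

  # Sketch only; assumes Nim 2.x. All names here are made up for illustration.
  type
    Tracked = object
      id: int

  var copies = 0  # incremented every time the copy hook below runs

  # The released compiler calls the copy operator `=copy`;
  # this document refers to it as `=`.
  proc `=copy`(dest: var Tracked; src: Tracked) =
    inc copies
    dest.id = src.id

  type
    Box = object
      items: seq[Tracked]

  proc put(b: var Box; t: sink Tracked) =
    # Last use of the sink parameter, so the value can be moved into the seq.
    b.items.add t

  proc first(b: Box): lent Tracked =
    # Borrows from 'b' instead of returning a copy.
    result = b.items[0]

  proc firstId(): int =
    var b = Box()
    b.put Tracked(id: 1)   # a construction expression passed to a sink parameter
    var x = Tracked(id: 2)
    b.put x                # last read of 'x'
    result = b.first.id

  echo firstId()
  echo "copies: ", copies

If the rewrite rules apply as described, the copy hook should not be needed on
any of these paths, so the counter is a quick way to check what a given
compiler version actually does.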

Rewrite rules (extended)
========================

========  ====================  ===========================================================
Rule      Pattern               Transformed into
========  ====================  ===========================================================
1.1       var x: T; stmts       var x: T; try stmts finally: `=destroy`(x)
1.2       var x: sink T; stmts  var x: sink T; stmts; ensureEmpty(x)
2         x = f()               `=sink`(x, f())
3         x = lastReadOf z      `=sink`(x, z)
4.1       sinkParam = y         `=sink`(sinkParam, y)
4.2       x = y                 `=`(x, y) # a copy
5.1       f_sink(g())           f_sink(g())
5.2       f_sink(y)             f_sink(y); reset(y) # 'reset(y)' for locals usually optimized away
5.3       f_noSink(g())         var tmp = bitwiseCopy(g()); f_noSink(tmp); `=destroy`(tmp)
========  ====================  ===========================================================

``sink T`` also affects overload resolution rules; by the time type checking
is performed we have no control flow graph yet, so the property
``lastReadOf z`` is not available. However, passing a call expression ``f()``
to a ``g`` taking a sink parameter is a syntactic property and so is available
for overload resolution. Thus I propose the following rule:

.. code-block:: nim

  proc add(c: var Container; x: T) # version A
  proc add(c: var Container; x: sink T) # version B

  var c: Container
  var x: T
  c.add x # calls version A
  c.add f() # calls version B
  # object construction counts as proc call:
  c.add T() # calls version B


Interactions with the GC
========================

The implementation of ``ref`` is likely to stay as it is today, a GC'ed
pointer. But if the ``seq`` is not backed by the GC, how can
``ref seq[ref T]`` continue to work? The answer is yet another type bound
operator called ``=trace``. With ``=trace`` a container can tell the GC how to
access its contents for a GC's sweeping/tracing step:

.. code-block:: nim

  proc `=trace`[T](s: seq[T]; a: Allocator) =
    for i in 0 ..< s.len: `=trace`(s.data[i], a)

``=trace`` always takes a second parameter, an ``allocator``.
The new ``seq`` and ``string`` implementations are also based on allocators.


Allocators
==========

The current design for an allocator looks like this:

.. code-block:: nim

  type
    Allocator* {.inheritable.} = ptr object
      alloc*: proc (a: Allocator; size: int; alignment = 8): pointer {.nimcall.}
      dealloc*: proc (a: Allocator; p: pointer; size: int) {.nimcall.}
      realloc*: proc (a: Allocator; p: pointer; oldSize, newSize: int): pointer {.nimcall.}

  var
    currentAllocator {.threadvar.}: Allocator

  proc getCurrentAllocator*(): Allocator =
    result = currentAllocator

  proc setCurrentAllocator*(a: Allocator) =
    currentAllocator = a

  proc alloc*(size: int): pointer =
    let a = getCurrentAllocator()
    result = a.alloc(a, size)

  proc dealloc*(p: pointer; size: int) =
    let a = getCurrentAllocator()
    # forward the pointer as well; the field expects (a, p, size)
    a.dealloc(a, p, size)

  proc realloc*(p: pointer; oldSize, newSize: int): pointer =
    let a = getCurrentAllocator()
    # forward the pointer as well; the field expects (a, p, oldSize, newSize)
    result = a.realloc(a, p, oldSize, newSize)


Pluggable GC
============

To be written.