nim-wiki/Destructors.rest

This page is a follow-up of https://nim-lang.org/araq/destructors.html and further outlines of where Nim is heading in the future. (Did I hear anyone say "Nim v2"?)

Nim's strings and sequences should become "GC-free" implementations and are exemplary for how Nim's core should work. Strings and sequences are value-based that means ``=`` performs a copy (conceptually). In practice many copies can be optimized away (see my blog post). The "optimized" copy is called a "move" and is supported via the type bound operator ``=sink``.

Rewrite rules (simplified)
==========================

========     ====================            ===========================================================
Rule         Pattern                         Transformed into
========     ====================            ===========================================================
1            var x; stmts                    var x; try stmts finally: `=destroy`(x)
2            x = f()                         `=sink`(x, f())
3            x = lastReadOf z                `=sink`(x, z)
4            x = y                           `=`(x, y) # a copy
5            f(g())                          var tmp; `=sink`(tmp, g()); f(tmp); `=destroy`(tmp)
========     ====================            ===========================================================

Rule (5) can be optimized further to ``var tmp = bitwiseCopy(g()); f(tmp); =destroy(tmp)``.


Sink parameters
===============

A ``sink`` parameter conveys a transfer of ownership. The parameter will be *consumed*.

A ``sink`` parameter is internally **not** mapped to ``var``, instead the
usual "pass-by-copy" / "optimize to by-ref if more efficient" implementation
is used. However, similar rules apply -- you cannot pass a ``const`` to
a ``sink`` parameter.

A ``sink`` parameter **must** be **consumed** exactly once within the
proc's body. The compiler will use a dataflow analysis to prove this fact.
For a ``sink`` parameter called ``sp`` a **consume** looks like:

.. code-block:: nim

  proc consume(c: var Container; sp: sink T) =
    locationDerivedFrom(c) = sp

This assignment is mapped to the ``=sink`` operator.

A consume can also be forwarded, "pass sp to a different proc as a sink parameter":

.. code-block:: nim

  proc consume(c: var Container; sp: sink T) =
    c.takeAsSink(sp)


Use after consume
-----------------

Locations passed to a ``sink`` parameter are invalidated after the call
and the compiler tries to prove that it is not used again afterwards. For
local variables this is quite easy to prove:

.. code-block:: nim

  proc consume(c: var Container; element: sink T) =
    c[i] = element

  proc main() =
    var x = initT()
    for i in 0..3:
      container.consume(x) # Error: attempt to re-use already moved value 'x'

For arbitrary locations involving array accesses etc it is too hard to prove
it is not used afterwards. The compiler transforms ``takeAsSink(sp)`` into
``takeAsSink(sp); reset(sp)``. ``reset`` sets the value back into its default
value. For locals the ``reset`` can be optimized away (stores to a dead object),
for function calls there is no location to reset at all.

For a location that has had its value moved into a sink parameter no
destructor call needs to be injected. This is an important optimization
to keep the produced code small.


Sink for locals
---------------

``sink T`` is also a valid type for locals. For a variable ``v`` of
type ``sink T`` no destructor call is injected and it is statically
ensured that every code path leads to its consumption.


Lent type
---------

``proc p(x: sink T)`` means that the proc ``p`` takes ownership of ``x``.
To eliminate even more creation/copy <-> destruction pairs, a proc's return
type can be annotated as ``lent T``. This is useful for "getter" accessors
that seek to allow an immutable view into a container.

Like ``sink T`` ``lent T`` is a valid annotation for local variables too.
For a variable ``v`` of type ``lent T`` it is statically ensured that no code
path leads to its consumption, in other words that it must not escape its
local stack frame (either directly or indirectly via passing to a ``sink``
parameter). For ``v`` no destructor call is injected since it doesn't own
the object.

The ``sink`` and ``lent`` annotations allow us to remove most (if not all)
superfluous copies and destructions.

``lent T`` is like ``var T`` a hidden pointer that the compiler needs to prove that
it doesn't outlive its origin.


.. code-block:: nim

  type
    Tree = object
      kids: seq[Tree]

  proc construct(kids: sink seq[Tree]): Tree =
    result = Tree(kids: kids)
    # converted into:
    `=sink`(result.kids, kids)

  proc `[]`*(x: Tree; i: int): lent Tree =
    result = x.kids[i]
    # borrows from 'x', this is transformed into:
    result = addr x.kids[i]
    # This means 'lent' is like 'var T' a hidden pointer.
    # Unlike 'var' this cannot be used to mutate the object.

  iterator children*(t: Tree): lent Tree =
    for x in t.kids: yield x

  proc main =
    # everything turned into moves:
    let t = construct(@[construct(@[]), construct(@[])])
    echo t[0] # accessor does not copy the element!


``sink T`` and ``lent T`` introduce further rewrite rules but lead to more efficient code. Even better, these rules optimize away create/copy <-> destroy pairs and so can also make atomic reference counting more efficient by eliminating incref <-> decref pairs.


Rewrite rules (extended)
========================

========     ====================            ===========================================================
Rule         Pattern                         Transformed into
========     ====================            ===========================================================
1.1          var x: T; stmts                 var x: T; try stmts finally: `=destroy`(x)
1.2          var x: sink T; stmts            var x: sink T; stmts; ensureEmpty(x)
2            x = f()                         `=sink`(x, f())
3            x = lastReadOf z                `=sink`(x, z)
4.1          sinkParam = y                   `=sink`(sinkParam, y)
4.2          x = y                           `=`(x, y) # a copy
5.1          f_sink(g())                     f_sink(g())
5.2          f_sink(y)                       f_sink(y); reset(y)
                                             # 'reset(y)' for locals usually optimized away
5.3          f_noSink(g())                   var tmp = bitwiseCopy(g()); f(tmp); `=destroy`(tmp)
========     ====================            ===========================================================

``sink T`` also affects overloading resolution rules; by the time type checking is performed we have no control flow graph yet so the property ``lastReadOf z`` is not available. However, passing a call expression ``f()`` to a ``g`` taking a sink parameter is a syntactic property and so is available for overloading resolution. Thus I propose the following rule:

.. code-block:: nim

  proc add(c: var Container; x: T) # version A
  proc add(c: var Container; x: sink T) # version B

  var c: Container
  var x: T
  c.add x # calls version A
  c.add f() # calls version B
  # object construction counts as proc call:
  c.add T() # calls version B


Interactions with the GC
========================

The implementation of ``ref`` is likely to stay as it is today, a GC'ed pointer. But if the ``seq`` is not
baked by the GC how can ``ref seq[ref T]`` continue to work? The answer is yet another type bound operator
called ``=trace``. With ``=trace`` a container can tell the GC how to access its contents for a GC's
sweeping/tracing step:

.. code-block:: nim

  proc `=trace`[T](s: seq[T]; a: Allocator) =
    for i in 0 ..< s.len: `=trace`(s.data[i], a)

``=trace`` always takes a second parameter, an ``allocator``. The new ``seq`` and ``string`` implementations
are also based on allocators.


Allocators
==========

The current design for an allocator looks like this:

.. code-block:: nim

  type
    Allocator* {.inheritable.} = ptr object
      alloc*: proc (a: Allocator; size: int; alignment = 8): pointer {.nimcall.}
      dealloc*: proc (a: Allocator; p: pointer; size: int) {.nimcall.}
      realloc*: proc (a: Allocator; p: pointer; oldSize, newSize: int): pointer {.nimcall.}

  var
    currentAllocator {.threadvar.}: Allocator

  proc getCurrentAllocator*(): Allocator =
    result = currentAllocator

  proc setCurrentAllocator*(a: Allocator) =
    currentAllocator = a

  proc alloc*(size: int): pointer =
    let a = getCurrentAllocator()
    result = a.alloc(a, size)

  proc dealloc*(p: pointer; size: int) =
    let a = getCurrentAllocator()
    a.dealloc(a, size)

  proc realloc*(p: pointer; oldSize, newSize: int): pointer =
    let a = getCurrentAllocator()
    result = a.realloc(a, oldSize, newSize)


Pluggable GC
============

To be written.