nim-wiki/Destructors.rest

This page is a follow-up of https://nim-lang.org/araq/destructors.html and further outlines of where Nim is heading in the future. (Did I hear anyone say "Nim v2"?)

Nim's strings and sequences should become "GC-free" implementations and are exemplary for how Nim's core should work. Strings and sequences are value-based that means ``=`` performs a copy (conceptually). In practice many copies can be optimized away (see my blog post). The "optimized" copy is called a "move" and is supported via the type bound operator ``=sink``.

Rewrite rules (simplified)
==========================

========     ====================            ===========================================================
Rule         Pattern                         Transformed into
========     ====================            ===========================================================
1            var x; stmts                    var x; try stmts finally: `=destroy`(x)
2            x = f()                         `=sink`(x, f())
3            x = lastReadOf z                `=sink`(x, z)
4            x = y                           `=`(x, y) # a copy
5            f(g())                          var tmp; `=sink`(tmp, g()); f(tmp); `=destroy`(tmp)
========     ====================            ===========================================================

Rule (5) can be optimized further to ``var tmp = bitwiseCopy(g()); f(tmp); =destroy(tmp)``.


Sink parameters
===============

A ``sink`` parameter conveys a transfer of ownership. The parameter will be *consumed*.

A ``sink`` parameter is internally **not** mapped to ``var``, instead the
usual "pass-by-copy" / "optimize to by-ref if more efficient" implementation
is used.

A ``sink`` parameter **must** be **consumed** exactly once within the
proc's body. The compiler will use a dataflow analysis to prove this fact.
For a ``sink`` parameter called ``sp`` a **consume** looks like:

.. code-block:: nim

  proc consume(c: var Container; sp: sink T) =
    locationDerivedFrom(c) = sp

This assignment is mapped to the ``=sink`` operator.

A consume can also be forwarded, "pass sp to a different proc as a sink parameter":

.. code-block:: nim

  proc consume(c: var Container; sp: sink T) =
    c.takeAsSink(sp)


Use after consume
-----------------

After having read https://codesynthesis.com/~boris/blog//2012/06/19/efficient-argument-passing-cxx11-part1/ I have changed my mind about how ``sink`` parameters need to work. ``sink`` parameters are purely an optimization
to eliminate copies (and destructions). Instead of doing the copy at ``location = sinkParam`` it's turned into a sink and then
*at* the callsite you can specify a *move* if it's not already an expression of the form ``lastReadOf(z)``.

This is much simpler than the original idea of introducing a "use after consume" error state that empties the container and would to lead to error prone code constructs just to save some object copies. It also implies we don't need yet another overloading disambiguation rule, a table's put proc can look like

.. code-block:: nim

  proc put*(t: Table; key, value: sink string) = ...

With no need of further non-sink overloads.

For a location that has had its value moved into a sink parameter no
destructor call needs to be injected. This is an important optimization
to keep the produced code size small. There is a ``system.move`` proc that can be used to annotate the moves at callsite that can further eliminate copies.


Sink for locals
---------------

``sink T`` is also a valid type for locals. For a variable ``v`` of
type ``sink T`` no destructor call is injected and it is statically
ensured that every code path leads to its consumption.


Lent type
---------

``proc p(x: sink T)`` means that the proc ``p`` takes ownership of ``x``.
To eliminate even more creation/copy <-> destruction pairs, a proc's return
type can be annotated as ``lent T``. This is useful for "getter" accessors
that seek to allow an immutable view into a container.

Like ``sink T`` ``lent T`` is a valid annotation for local variables too.
For a variable ``v`` of type ``lent T`` it is statically ensured that no code
path leads to its consumption, in other words that it must not escape its
local stack frame (either directly or indirectly via passing to a ``sink``
parameter). For ``v`` no destructor call is injected since it doesn't own
the object.

The ``sink`` and ``lent`` annotations allow us to remove most (if not all)
superfluous copies and destructions.

``lent T`` is like ``var T`` a hidden pointer that the compiler needs to prove that
it doesn't outlive its origin.


.. code-block:: nim

  type
    Tree = object
      kids: seq[Tree]

  proc construct(kids: sink seq[Tree]): Tree =
    result = Tree(kids: kids)
    # converted into:
    `=sink`(result.kids, kids)

  proc `[]`*(x: Tree; i: int): lent Tree =
    result = x.kids[i]
    # borrows from 'x', this is transformed into:
    result = addr x.kids[i]
    # This means 'lent' is like 'var T' a hidden pointer.
    # Unlike 'var' this cannot be used to mutate the object.

  iterator children*(t: Tree): lent Tree =
    for x in t.kids: yield x

  proc main =
    # everything turned into moves:
    let t = construct(@[construct(@[]), construct(@[])])
    echo t[0] # accessor does not copy the element!


``sink T`` and ``lent T`` introduce further rewrite rules but lead to more efficient code. Even better, these rules optimize away create/copy <-> destroy pairs and so can also make atomic reference counting more efficient by eliminating incref <-> decref pairs.


Rewrite rules (extended)
========================

========     ====================            ===========================================================
Rule         Pattern                         Transformed into
========     ====================            ===========================================================
1.1          var x: T; stmts                 var x: T; try stmts finally: `=destroy`(x)
1.2          var x: sink T; stmts            var x: sink T; stmts; ensureEmpty(x)
2            x = f()                         `=sink`(x, f())
3            x = lastReadOf z                `=sink`(x, z)
4.1          sinkParam = y                   `=sink`(sinkParam, y)
4.2          x = y                           `=`(x, y) # a copy
5.1          f_sink(g())                     f_sink(g())
5.2          f_sink(y)                       f_sink(copy y);
                                             # copy unless we can see it's the last read
5.3          f_sink(move y)                  f_sink(y); reset(y) # explicit moves empties 'y'
5.4          f_noSink(g())                   var tmp = bitwiseCopy(g()); f(tmp); `=destroy`(tmp)
========     ====================            ===========================================================


Flaw 1
======

A ``sink`` parameter cannot be passed to its destructor since the destructor takes a ``var T`` parameter and ``sink`` itself cannot be passed as ``var``.

**Solution**: The destructor call is done on a temporary location that was bitcopied from the ``sink`` parameter or conceptually via ``unsafeAddr``. **Proof** that this is safe: After the destruction the ``sink`` parameter won't be used again. At the callsite either a copy was passed to the ``sink`` parameter which can't be used again either or an explicit ``move`` was performed which resets the memory and ensures that the it won't be used afterwards too. (Maybe this indicates that the destructor should also be a ``sink`` parameter and the ``reset`` step usually done in the destructor can be done by the compiler if required.)

Flaw 2
======

An analysis like "every code path provable leads to the parameters consumption" is hard to pull off, especially in a language like Nim with exceptions.

**Solution**: The analysis can introduce a fallback path with hidden bool flags like ``if not flag: =destroy(sinkParam)``. Furthermore the compiler should probably get even smarter in its inference of ``raises: []``.


Interactions with the GC
========================

The implementation of ``ref`` is likely to stay as it is today, a GC'ed pointer. But if the ``seq`` is not
baked by the GC how can ``ref seq[ref T]`` continue to work? The answer is yet another type bound operator
called ``=trace``. With ``=trace`` a container can tell the GC how to access its contents for a GC's
sweeping/tracing step:

.. code-block:: nim

  proc `=trace`[T](s: seq[T]; a: Allocator) =
    for i in 0 ..< s.len: `=trace`(s.data[i], a)

``=trace`` always takes a second parameter, an ``allocator``. The new ``seq`` and ``string`` implementations
are also based on allocators.


Allocators
==========

The current design for an allocator looks like this:

.. code-block:: nim

  type
    Allocator* {.inheritable.} = ptr object
      alloc*: proc (a: Allocator; size: int; alignment = 8): pointer {.nimcall.}
      dealloc*: proc (a: Allocator; p: pointer; size: int) {.nimcall.}
      realloc*: proc (a: Allocator; p: pointer; oldSize, newSize: int): pointer {.nimcall.}

  var
    currentAllocator {.threadvar.}: Allocator

  proc getCurrentAllocator*(): Allocator =
    result = currentAllocator

  proc setCurrentAllocator*(a: Allocator) =
    currentAllocator = a

  proc alloc*(size: int): pointer =
    let a = getCurrentAllocator()
    result = a.alloc(a, size)

  proc dealloc*(p: pointer; size: int) =
    let a = getCurrentAllocator()
    a.dealloc(a, size)

  proc realloc*(p: pointer; oldSize, newSize: int): pointer =
    let a = getCurrentAllocator()
    result = a.realloc(a, oldSize, newSize)


Pluggable GC
============

To be written.