This page is a follow-up to https://nim-lang.org/araq/destructors.html and further outlines where Nim is heading. (Did I hear anyone say "Nim v2"?)

Nim's strings and sequences should become "GC-free" implementations and are exemplary for how Nim's core should work. Strings and sequences are value-based, which means that ``=`` (conceptually) performs a copy. In practice many copies can be optimized away (see my blog post). The "optimized" copy is called a "move" and is supported via the type-bound operator ``=sink``.

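A minimal sketch of these value semantics, assuming a current Nim 2.x compiler: assigning a ``seq`` produces an independent copy, while ``system.move`` transfers the buffer and leaves the source empty.

.. code-block:: nim

  var a = @[1, 2, 3]
  var b = a                  # conceptually a copy: `a` is still used below
  b[0] = 99
  doAssert a == @[1, 2, 3]   # the original is unaffected by the write to `b`
  doAssert b == @[99, 2, 3]

  let c = move(a)            # a move: the buffer is handed over, `a` is reset
  doAssert c == @[1, 2, 3]
  doAssert a.len == 0
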
Rewrite rules (simplified)
==========================

======== ==================== ===========================================================
Rule     Pattern              Transformed into
======== ==================== ===========================================================
1        var x; stmts         var x; try stmts finally: `=destroy`(x)
2        x = f()              `=sink`(x, f())
3        x = lastReadOf z     `=sink`(x, z)
4        x = y                `=`(x, y) # a copy
5        f(g())               var tmp; `=sink`(tmp, g()); f(tmp); `=destroy`(tmp)
======== ==================== ===========================================================

Rule (5) can be optimized further to ``var tmp = bitwiseCopy(g()); f(tmp); =destroy(tmp)``.

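Rules (1), (3) and (4) can be observed with today's compiler. The sketch below assumes Nim 2.x with its default ORC memory management, where ``=copy`` is the current name of the ``=`` hook and ``=destroy`` takes its parameter by value. ``Res`` and the two counters are hypothetical names used only for illustration.

.. code-block:: nim

  var copies, destroys = 0

  type Res = object
    id: int  # 0 marks the "empty" state that a move leaves behind

  proc `=destroy`(x: Res) =
    if x.id != 0: inc destroys  # count only destructions of live values

  proc `=copy`(dst: var Res; src: Res) =
    inc copies
    dst.id = src.id

  proc rule1() =
    var x = Res(id: 1)
    # rule 1: the rest of the scope is wrapped in try/finally
    # with `=destroy`(x) in the finally section
    doAssert x.id == 1

  proc rules3and4() =
    var y = Res(id: 2)
    var a, b: Res
    a = y  # rule 4: `y` is read again below, so this is a copy
    b = y  # rule 3: last read of `y`, so this becomes `=sink`(b, y)
    doAssert a.id == 2 and b.id == 2

  rule1()
  doAssert destroys == 1
  rules3and4()
  doAssert copies == 1

Only one copy is counted: the second assignment reuses ``y``'s value instead of duplicating it.
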
Sink parameters
===============

A ``sink`` parameter conveys a transfer of ownership. The parameter will be *consumed*.

A ``sink`` parameter is internally **not** mapped to ``var``; instead, the
usual "pass-by-copy" / "optimize to by-ref if more efficient" implementation
is used.

A ``sink`` parameter **must** be **consumed** exactly once within the
proc's body. The compiler will use a dataflow analysis to prove this.
For a ``sink`` parameter called ``sp``, a **consume** looks like:

.. code-block:: nim

  proc consume(c: var Container; sp: sink T) =
    locationDerivedFrom(c) = sp

This assignment is mapped to the ``=sink`` operator.

A consume can also be forwarded, that is, ``sp`` is passed to a different proc as a ``sink`` parameter:

.. code-block:: nim

  proc consume(c: var Container; sp: sink T) =
    c.takeAsSink(sp)

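Both forms can be tried with the ``sink`` parameters that Nim already supports. In this sketch, ``Container`` is a hypothetical type: ``consume`` moves ``sp`` into the container, and ``forward`` passes its own ``sink`` parameter on to ``consume``. A caller that still needs its argument afterwards automatically passes a copy.

.. code-block:: nim

  type Container = object
    items: seq[string]

  proc consume(c: var Container; sp: sink string) =
    # the single consume of `sp`: it is moved into a location derived from `c`
    c.items.add sp

  proc forward(c: var Container; sp: sink string) =
    # forwarding: `sp` is handed on to another proc's sink parameter
    c.consume(sp)

  var c: Container
  c.consume("fresh")   # a temporary: it can be moved, no copy is needed
  var s = "kept"
  c.forward(s)         # `s` is used again below, so a copy is passed
  doAssert s == "kept"
  doAssert c.items == @["fresh", "kept"]
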
Use after consume
-----------------

After having read https://codesynthesis.com/~boris/blog//2012/06/19/efficient-argument-passing-cxx11-part1/ I have changed my mind about how ``sink`` parameters need to work. ``sink`` parameters are purely an optimization
to eliminate copies (and destructions). Instead of doing the copy at ``location = sinkParam``, that assignment is turned into a sink; any copy that is still needed happens
*at* the callsite, where you can specify a *move* if the argument is not already an expression of the form ``lastReadOf(z)``.

This is much simpler than the original idea of introducing a "use after consume" error state that empties the container and would lead to error-prone code constructs just to save some object copies. It also implies that we don't need yet another overloading disambiguation rule; a table's ``put`` proc can look like this:

.. code-block:: nim

  proc put*(t: var Table[string, string]; key, value: sink string) = ...

There is no need for further non-``sink`` overloads.

For a location whose value has been moved into a ``sink`` parameter, no
destructor call needs to be injected. This is an important optimization
to keep the produced code size small. The ``system.move`` proc can be used to annotate moves at the callsite, which eliminates further copies.

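A sketch of such a ``put``, assuming ``std/tables`` as the backing store. The proc itself is illustrative and not part of the library. Without ``move``, the caller keeps its strings and passes copies; with ``move``, ownership is handed over and the caller's variables are reset.

.. code-block:: nim

  import std/tables

  proc put(t: var Table[string, string]; key, value: sink string) =
    # both sink parameters are consumed by moving them into the table
    t[key] = value

  var t = initTable[string, string]()
  var key = "lang"
  var value = "Nim"

  # `key` and `value` are still read afterwards, so copies are passed
  t.put(key, value)
  doAssert key == "lang" and value == "Nim"

  # explicit moves at the callsite: no copies, the sources are reset
  t.put(move(key), move(value))
  doAssert key.len == 0 and value.len == 0
  doAssert t["lang"] == "Nim"
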
Sink for locals
---------------

``sink T`` is also a valid type for locals. For a variable ``v`` of
type ``sink T``, no destructor call is injected, and it is statically
ensured that every code path leads to its consumption.

Lent type
---------

``proc p(x: sink T)`` means that the proc ``p`` takes ownership of ``x``.
To eliminate even more creation/copy <-> destruction pairs, a proc's return
type can be annotated as ``lent T``. This is useful for "getter" accessors
that seek to allow an immutable view into a container.

Like ``sink T``, ``lent T`` is a valid annotation for local variables too.
For a variable ``v`` of type ``lent T``, it is statically ensured that no code
path leads to its consumption; in other words, it must not escape its
local stack frame (either directly, or indirectly by being passed to a ``sink``
parameter). No destructor call is injected for ``v`` since it doesn't own
the object.

The ``sink`` and ``lent`` annotations allow us to remove most (if not all)
superfluous copies and destructions.

Like ``var T``, ``lent T`` is a hidden pointer, and the compiler needs to prove that
it doesn't outlive its origin.

.. code-block:: nim

  type
    Tree = object
      kids: seq[Tree]

  proc construct(kids: sink seq[Tree]): Tree =
    result = Tree(kids: kids)
    # converted into:
    # `=sink`(result.kids, kids)

  proc `[]`*(x: Tree; i: int): lent Tree =
    result = x.kids[i]
    # borrows from 'x', this is transformed into:
    # result = addr x.kids[i]
    # This means 'lent' is, like 'var T', a hidden pointer.
    # Unlike 'var', it cannot be used to mutate the object.

  iterator children*(t: Tree): lent Tree =
    for x in t.kids: yield x

  proc main =
    # everything turned into moves:
    let t = construct(@[construct(@[]), construct(@[])])
    echo t[0] # accessor does not copy the element!

  main()

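The effect of ``lent`` can be made visible by counting copies. This sketch assumes Nim 2.x; ``Elem``, ``Box``, ``get`` and ``getCopy`` are hypothetical names. The ``lent`` getter hands out a view into the container, whereas the plain getter has to copy because it may not take ownership of a borrowed parameter.

.. code-block:: nim

  var copies = 0

  type Elem = object
    id: int

  proc `=copy`(dst: var Elem; src: Elem) =
    inc copies
    dst.id = src.id

  type Box = object
    items: seq[Elem]

  proc getCopy(b: Box; i: int): Elem =
    result = b.items[i]  # `b` is only borrowed, so the result must be a copy

  proc get(b: Box; i: int): lent Elem =
    result = b.items[i]  # a hidden pointer into `b.items`, nothing is copied

  var b: Box
  b.items.add Elem(id: 1)
  copies = 0             # ignore anything the setup may have done
  doAssert b.get(0).id == 1
  doAssert copies == 0
  doAssert b.getCopy(0).id == 1
  doAssert copies == 1
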
``sink T`` and ``lent T`` introduce further rewrite rules but lead to more efficient code. Even better, these rules optimize away create/copy <-> destroy pairs and so can also make atomic reference counting more efficient by eliminating incref <-> decref pairs.

Rewrite rules (extended)
========================

======== ==================== ===========================================================
Rule     Pattern              Transformed into
======== ==================== ===========================================================
1.1      var x: T; stmts      var x: T; try stmts finally: `=destroy`(x)
1.2      var x: sink T; stmts var x: sink T; stmts; ensureEmpty(x)
2        x = f()              `=sink`(x, f())
3        x = lastReadOf z     `=sink`(x, z)
4.1      sinkParam = y        `=sink`(sinkParam, y)
4.2      x = y                `=`(x, y) # a copy
5.1      f_sink(g())          f_sink(g())
5.2      f_sink(y)            f_sink(copy y); # copy unless we can see it's the last read
5.3      f_sink(move y)       f_sink(y); reset(y) # an explicit move empties 'y'
5.4      f_noSink(g())        var tmp = bitwiseCopy(g()); f_noSink(tmp); `=destroy`(tmp)
======== ==================== ===========================================================

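Rules 5.1 to 5.3 can be checked by counting copies at the callsite. This is a sketch assuming Nim 2.x; ``fSink`` stands in for ``f_sink``, and ``Res`` and ``sunk`` are hypothetical names.

.. code-block:: nim

  var copies = 0

  type Res = object
    id: int

  proc `=copy`(dst: var Res; src: Res) =
    inc copies
    dst.id = src.id

  var sunk: seq[Res]

  proc fSink(r: sink Res) =
    sunk.add r          # consumes `r`

  proc demo() =
    var y = Res(id: 1)
    fSink(Res(id: 2))   # rule 5.1: a fresh value is passed as is
    fSink(y)            # rule 5.2: `y` is read again below, so it is copied
    doAssert y.id == 1
    fSink(move(y))      # rule 5.3: explicit move, `y` is reset afterwards
    doAssert y.id == 0

  demo()
  doAssert copies == 1
  doAssert sunk.len == 3
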
Flaw 1
======

A ``sink`` parameter cannot be passed to its destructor since the destructor takes a ``var T`` parameter and a ``sink`` parameter cannot be passed as ``var``.

**Solution**: The destructor call is done on a temporary location that was bit-copied from the ``sink`` parameter, or conceptually via ``unsafeAddr``. **Proof** that this is safe: after the destruction, the ``sink`` parameter won't be used again. At the callsite, either a copy was passed to the ``sink`` parameter, which can't be used again either, or an explicit ``move`` was performed, which resets the memory and ensures that it won't be used afterwards either. (Maybe this indicates that the destructor should also take a ``sink`` parameter and that the ``reset`` step usually done in the destructor can be done by the compiler if required.)

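The situation this flaw describes, a ``sink`` parameter that the callee never consumes, can be observed directly. The sketch below assumes Nim 2.x, where ``=destroy`` is declared with a by-value ``T`` parameter instead of ``var T``. ``Res``, ``destroys`` and ``ignore`` are hypothetical names. The callee owns ``r`` and therefore has to destroy it itself.

.. code-block:: nim

  var destroys = 0

  type Res = object
    id: int

  proc `=destroy`(x: Res) =
    if x.id != 0: inc destroys  # count only live values

  proc ignore(r: sink Res) =
    # `r` is never consumed, so it is destroyed at the end of this proc
    doAssert r.id == 5

  ignore(Res(id: 5))
  doAssert destroys == 1
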
Flaw 2
======

An analysis like "every code path provably leads to the parameter's consumption" is hard to pull off, especially in a language with exceptions like Nim.

**Solution**: The analysis can introduce a fallback path with hidden bool flags like ``if not flag: =destroy(sinkParam)``. Furthermore, the compiler should probably get even smarter in its inference of ``raises: []``.

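The following sketch observes only the outcome of such a fallback when a ``sink`` parameter is consumed on just one path. It assumes Nim 2.x and does not show which mechanism a given compiler version uses, whether hidden flags or resetting the moved-from parameter. ``Res``, ``destroys`` and ``maybeKeep`` are hypothetical names.

.. code-block:: nim

  var destroys = 0

  type Res = object
    id: int

  proc `=destroy`(x: Res) =
    if x.id != 0: inc destroys  # count only live values

  proc maybeKeep(dest: var seq[Res]; r: sink Res; keep: bool) =
    if keep:
      dest.add r  # consumed on this path only
    # on the other path `r` is still owned here and must be destroyed

  var kept: seq[Res]
  maybeKeep(kept, Res(id: 1), keep = false)
  doAssert destroys == 1  # the unconsumed parameter was destroyed
  maybeKeep(kept, Res(id: 2), keep = true)
  doAssert destroys == 1  # moved into `kept`, nothing destroyed yet
  doAssert kept.len == 1
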
Interactions with the GC
========================

The implementation of ``ref`` is likely to stay as it is today: a GC'ed pointer. But if ``seq`` is not
backed by the GC, how can ``ref seq[ref T]`` continue to work? The answer is yet another type-bound operator
called ``=trace``. With ``=trace``, a container can tell the GC how to access its contents for the GC's
sweeping/tracing step:

.. code-block:: nim

  proc `=trace`[T](s: seq[T]; a: Allocator) =
    for i in 0 ..< s.len: `=trace`(s.data[i], a)

``=trace`` always takes a second parameter, an ``allocator``. The new ``seq`` and ``string`` implementations
are also based on allocators.

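The idea behind ``=trace`` can be illustrated with a toy "mark" step. This sketch does not use Nim's actual ``=trace`` hook. ``Node``, ``Bag``, ``traceBag`` and the ``marked`` set are hypothetical: the container reports every GC'ed pointer it stores through a callback, and the toy collector records what it has been told about.

.. code-block:: nim

  import std/sets

  type
    Node = ref object
      name: string
    Bag = object
      items: seq[Node]  # GC'ed pointers stored inside a container

  proc traceBag(b: Bag; visit: proc (n: Node)) =
    # the container tells the collector where its GC'ed pointers are
    for n in b.items: visit(n)

  var marked = initHashSet[string]()
  let bag = Bag(items: @[Node(name: "a"), Node(name: "b")])
  # toy mark step: record every node the container reports
  traceBag(bag, proc (n: Node) = marked.incl n.name)
  doAssert marked == toHashSet(["a", "b"])
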
Allocators
==========

The current design for an allocator looks like this:

.. code-block:: nim

  type
    Allocator* {.inheritable.} = ptr object
      alloc*: proc (a: Allocator; size: int; alignment = 8): pointer {.nimcall.}
      dealloc*: proc (a: Allocator; p: pointer; size: int) {.nimcall.}
      realloc*: proc (a: Allocator; p: pointer; oldSize, newSize: int): pointer {.nimcall.}

  var
    currentAllocator {.threadvar.}: Allocator  # per-thread allocator

  proc getCurrentAllocator*(): Allocator =
    result = currentAllocator

  proc setCurrentAllocator*(a: Allocator) =
    currentAllocator = a

  # convenience wrappers that forward to the current thread's allocator
  proc alloc*(size: int): pointer =
    let a = getCurrentAllocator()
    result = a.alloc(a, size)

  proc dealloc*(p: pointer; size: int) =
    let a = getCurrentAllocator()
    a.dealloc(a, p, size)

  proc realloc*(p: pointer; oldSize, newSize: int): pointer =
    let a = getCurrentAllocator()
    result = a.realloc(a, p, oldSize, newSize)

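To illustrate the shape of this design, here is a hypothetical allocator that only counts outstanding bytes. To stay self-contained, it redeclares a reduced version of the ``Allocator`` type (without ``realloc`` and the ``alignment`` parameter) and forwards to ``allocShared0`` and ``deallocShared`` from ``system``. ``AllocatorObj``, ``CountingAllocator``, ``countingAlloc`` and ``countingDealloc`` are illustrative names, not an existing API.

.. code-block:: nim

  type
    Allocator = ptr AllocatorObj
    AllocatorObj {.inheritable.} = object
      alloc: proc (a: Allocator; size: int): pointer {.nimcall.}
      dealloc: proc (a: Allocator; p: pointer; size: int) {.nimcall.}
    CountingAllocator = object of AllocatorObj
      live: int  # bytes currently handed out by this allocator

  proc countingAlloc(a: Allocator; size: int): pointer {.nimcall.} =
    inc cast[ptr CountingAllocator](a).live, size
    result = allocShared0(size)

  proc countingDealloc(a: Allocator; p: pointer; size: int) {.nimcall.} =
    dec cast[ptr CountingAllocator](a).live, size
    deallocShared(p)

  var counting = CountingAllocator(alloc: countingAlloc, dealloc: countingDealloc)
  let a = cast[Allocator](addr counting)

  let p = a.alloc(a, 64)
  doAssert counting.live == 64
  a.dealloc(a, p, 64)
  doAssert counting.live == 0

Passing the allocator itself as the first argument lets a derived object such as ``CountingAllocator`` keep its own state next to the function pointers.
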
Pluggable GC
============

To be written.