339: Implement modpow() for BigUint backed by Montgomery Multiplication r=cuviper a=str4d
Based on this Gist: https://gist.github.com/yshui/027eecdf95248ea69606
Also adds support to `BigUint.from_str_radix()` for using `_` as a visual separator.
Closes#136
328: Optimizing BigUint and Bigint multiplication with the Toom-3 algorithm r=cuviper a=kompass
Hi !
I finally implemented the Toom-3 algorithm ! I first tried to minimize the memory allocations by allocating the `Vec<BigDigit>` myself, as was done for Toom-2, but Toom-3 needs more complex calculations, with negative numbers. So I gave up this method, to use `BigInt` directly, and it's already faster ! I also chose a better threshold for the Toom-2 algorithm.
Before any modification :
```
running 4 tests
test multiply_0 ... bench: 257 ns/iter (+/- 25)
test multiply_1 ... bench: 30,240 ns/iter (+/- 1,651)
test multiply_2 ... bench: 2,752,360 ns/iter (+/- 52,102)
test multiply_3 ... bench: 11,618,575 ns/iter (+/- 266,286)
```
With a better Toom-2 threshold (16 instead of 4) :
```
running 4 tests
test multiply_0 ... bench: 130 ns/iter (+/- 8)
test multiply_1 ... bench: 19,772 ns/iter (+/- 1,083)
test multiply_2 ... bench: 1,340,644 ns/iter (+/- 17,987)
test multiply_3 ... bench: 7,302,854 ns/iter (+/- 82,060)
```
With the Toom-3 algorithm (with a threshold of 300):
```
running 4 tests
test multiply_0 ... bench: 123 ns/iter (+/- 3)
test multiply_1 ... bench: 19,689 ns/iter (+/- 837)
test multiply_2 ... bench: 1,189,589 ns/iter (+/- 29,101)
test multiply_3 ... bench: 3,014,225 ns/iter (+/- 61,222)
```
I think this could be optimized, but it's a first step !
By starting with `split_at_mut`, the hot multiplication loop runs with
no bounds checking at all! The remaining carry loop has a slightly
simpler check for when the remaining iterator runs dry.
313: Scalar operations across all integer types r=cuviper
With my apologies for opening a new PR, and also for the 8 month delay, this continues the work started in #237 - the discussion there outlines the goals I was aiming for. I suppose this supersedes that PR and the other one can now be closed.
This PR adds support for Add, Sub, Mul, Div and Rem operations involving one BigInt/BigUint and one primitive integer, with operands in either order, and any combination of owned/borrowed arguments.