328: Optimizing BigUint and Bigint multiplication with the Toom-3 algorithm r=cuviper a=kompass
Hi !
I finally implemented the Toom-3 algorithm ! I first tried to minimize the memory allocations by allocating the `Vec<BigDigit>` myself, as was done for Toom-2, but Toom-3 needs more complex calculations, with negative numbers. So I gave up this method, to use `BigInt` directly, and it's already faster ! I also chose a better threshold for the Toom-2 algorithm.
Before any modification :
```
running 4 tests
test multiply_0 ... bench: 257 ns/iter (+/- 25)
test multiply_1 ... bench: 30,240 ns/iter (+/- 1,651)
test multiply_2 ... bench: 2,752,360 ns/iter (+/- 52,102)
test multiply_3 ... bench: 11,618,575 ns/iter (+/- 266,286)
```
With a better Toom-2 threshold (16 instead of 4) :
```
running 4 tests
test multiply_0 ... bench: 130 ns/iter (+/- 8)
test multiply_1 ... bench: 19,772 ns/iter (+/- 1,083)
test multiply_2 ... bench: 1,340,644 ns/iter (+/- 17,987)
test multiply_3 ... bench: 7,302,854 ns/iter (+/- 82,060)
```
With the Toom-3 algorithm (with a threshold of 300):
```
running 4 tests
test multiply_0 ... bench: 123 ns/iter (+/- 3)
test multiply_1 ... bench: 19,689 ns/iter (+/- 837)
test multiply_2 ... bench: 1,189,589 ns/iter (+/- 29,101)
test multiply_3 ... bench: 3,014,225 ns/iter (+/- 61,222)
```
I think this could be optimized, but it's a first step !