A Value Range Propagation usage example, and more

bearophile via Digitalmars-d

2014-10-08 12:17:35 UTC

This is the first part of a function to convert to base 58 (some
letters are missing, like the upper case "I") used in the Bitcoin
protocol:

alias Address = ubyte[1 + 4 + RIPEMD160_digest_len];

char[] toBase58(ref Address a) pure nothrow @safe {
static immutable symbols = "123456789" ~
"ABCDEFGHJKLMNPQRSTUVWXYZ" ~
"abcdefghijkmnopqrstuvwxyz";
static assert(symbols.length == 58);

auto result = new typeof(return)(34);
foreach_reverse (ref ri; result) {
uint c = 0;
foreach (ref ai; a) {
c = c * 256 + ai;
ai = cast(ubyte)(c / symbols.length);
c %= symbols.length;
}
ri = symbols[c];
}
...
}

The D type system isn't smart enough to see that "ai" is always
fitting in an ubyte, so I have had to use a cast(ubyte). But
casts are dangerous and their usage should be minimized, and
to!ubyte is slow and makes the function not nothrow. So I've
rewritten the code like this with a bit of algebraic rewriting:

char[] toBase58(ref Address a) pure nothrow @safe {
static immutable symbols = "123456789" ~
"ABCDEFGHJKLMNPQRSTUVWXYZ" ~
"abcdefghijkmnopqrstuvwxyz";
static assert(symbols.length == 58);

auto result = new typeof(return)(34);
foreach_reverse (ref ri; result) {
uint c = 0;
foreach (ref ai; a) {
immutable d = (c % symbols.length) * 256 + ai;
ai = d / symbols.length;
c = d;
}
ri = symbols[c % symbols.length];
}
...
}

Now it can be a little slower because the integer division and
modulus has different divisors, so perhaps they can't be
implemented with a little more than a single division, as before
(I have not compared the assembly), but for the purposes of this
code the performance difference is not a problem. Now the D type
system is able to see that "ai" is always fitting in a ubyte, and
there's no need for a cast. The compiler puts a safe implicit
cast. This is awesome.

- - - - - - - - - - - - - -

But of course you often want more :-)

This is another case where the current D type system allows you
to avoid a cast:

void main() {
char['z' - 'a' + 1] arr;

foreach (immutable i, ref c; arr)
c = 'a' + i;
}

But if you want to use ranges and functional UFCS chains you
currently need the cast:

void main() {
import std.range, std.algorithm, std.array;

char[26] arr = 26
.iota
.map!(i => cast(char)('a' + i))
.array;
}

In theory this program has the same compile-time information as
the foreach case. In practice foreach is a built-in that enjoys
more semantics than a iota+map.

Currently iota(26) loses the compile-time information about the
range, so you can't do (note: the "max" attribute doesn't exists):

void main() {
import std.range: iota;
auto r = iota(26);
enum ubyte m = r.max;
}

Currently the only way to keep that compile-time information is
to use a template argument:

void main() {
import std.range: tIota;
auto r = tIota!26;
enum ubyte m = r.max; // OK
}

But even if you write such tIota range, the map!() will lose the
compile-time value range information. And even if you manage to
write a map!() able to do it with template arguments, you have
template bloat.

So there's a desire to manage the compile-time information (like
value range information) of (number) literals without causing
template bloat and without the need to use explicit template
arguments.

Bye,
bearophile