bearophile via Digitalmars-d
2014-10-08 12:17:35 UTC
This is the first part of a function to convert to base 58 (some
letters are missing, like the upper case "I") used in the Bitcoin
protocol:
alias Address = ubyte[1 + 4 + RIPEMD160_digest_len];
char[] toBase58(ref Address a) pure nothrow @safe {
static immutable symbols = "123456789" ~
"ABCDEFGHJKLMNPQRSTUVWXYZ" ~
"abcdefghijkmnopqrstuvwxyz";
static assert(symbols.length == 58);
auto result = new typeof(return)(34);
foreach_reverse (ref ri; result) {
uint c = 0;
foreach (ref ai; a) {
c = c * 256 + ai;
ai = cast(ubyte)(c / symbols.length);
c %= symbols.length;
}
ri = symbols[c];
}
...
}
The D type system isn't smart enough to see that "ai" is always
fitting in an ubyte, so I have had to use a cast(ubyte). But
casts are dangerous and their usage should be minimized, and
to!ubyte is slow and makes the function not nothrow. So I've
rewritten the code like this with a bit of algebraic rewriting:
char[] toBase58(ref Address a) pure nothrow @safe {
static immutable symbols = "123456789" ~
"ABCDEFGHJKLMNPQRSTUVWXYZ" ~
"abcdefghijkmnopqrstuvwxyz";
static assert(symbols.length == 58);
auto result = new typeof(return)(34);
foreach_reverse (ref ri; result) {
uint c = 0;
foreach (ref ai; a) {
immutable d = (c % symbols.length) * 256 + ai;
ai = d / symbols.length;
c = d;
}
ri = symbols[c % symbols.length];
}
...
}
Now it can be a little slower because the integer division and
modulus has different divisors, so perhaps they can't be
implemented with a little more than a single division, as before
(I have not compared the assembly), but for the purposes of this
code the performance difference is not a problem. Now the D type
system is able to see that "ai" is always fitting in a ubyte, and
there's no need for a cast. The compiler puts a safe implicit
cast. This is awesome.
- - - - - - - - - - - - - -
But of course you often want more :-)
This is another case where the current D type system allows you
to avoid a cast:
void main() {
char['z' - 'a' + 1] arr;
foreach (immutable i, ref c; arr)
c = 'a' + i;
}
But if you want to use ranges and functional UFCS chains you
currently need the cast:
void main() {
import std.range, std.algorithm, std.array;
char[26] arr = 26
.iota
.map!(i => cast(char)('a' + i))
.array;
}
In theory this program has the same compile-time information as
the foreach case. In practice foreach is a built-in that enjoys
more semantics than a iota+map.
Currently iota(26) loses the compile-time information about the
range, so you can't do (note: the "max" attribute doesn't exists):
void main() {
import std.range: iota;
auto r = iota(26);
enum ubyte m = r.max;
}
Currently the only way to keep that compile-time information is
to use a template argument:
void main() {
import std.range: tIota;
auto r = tIota!26;
enum ubyte m = r.max; // OK
}
But even if you write such tIota range, the map!() will lose the
compile-time value range information. And even if you manage to
write a map!() able to do it with template arguments, you have
template bloat.
So there's a desire to manage the compile-time information (like
value range information) of (number) literals without causing
template bloat and without the need to use explicit template
arguments.
Bye,
bearophile
letters are missing, like the upper case "I") used in the Bitcoin
protocol:
alias Address = ubyte[1 + 4 + RIPEMD160_digest_len];
char[] toBase58(ref Address a) pure nothrow @safe {
static immutable symbols = "123456789" ~
"ABCDEFGHJKLMNPQRSTUVWXYZ" ~
"abcdefghijkmnopqrstuvwxyz";
static assert(symbols.length == 58);
auto result = new typeof(return)(34);
foreach_reverse (ref ri; result) {
uint c = 0;
foreach (ref ai; a) {
c = c * 256 + ai;
ai = cast(ubyte)(c / symbols.length);
c %= symbols.length;
}
ri = symbols[c];
}
...
}
The D type system isn't smart enough to see that "ai" is always
fitting in an ubyte, so I have had to use a cast(ubyte). But
casts are dangerous and their usage should be minimized, and
to!ubyte is slow and makes the function not nothrow. So I've
rewritten the code like this with a bit of algebraic rewriting:
char[] toBase58(ref Address a) pure nothrow @safe {
static immutable symbols = "123456789" ~
"ABCDEFGHJKLMNPQRSTUVWXYZ" ~
"abcdefghijkmnopqrstuvwxyz";
static assert(symbols.length == 58);
auto result = new typeof(return)(34);
foreach_reverse (ref ri; result) {
uint c = 0;
foreach (ref ai; a) {
immutable d = (c % symbols.length) * 256 + ai;
ai = d / symbols.length;
c = d;
}
ri = symbols[c % symbols.length];
}
...
}
Now it can be a little slower because the integer division and
modulus has different divisors, so perhaps they can't be
implemented with a little more than a single division, as before
(I have not compared the assembly), but for the purposes of this
code the performance difference is not a problem. Now the D type
system is able to see that "ai" is always fitting in a ubyte, and
there's no need for a cast. The compiler puts a safe implicit
cast. This is awesome.
- - - - - - - - - - - - - -
But of course you often want more :-)
This is another case where the current D type system allows you
to avoid a cast:
void main() {
char['z' - 'a' + 1] arr;
foreach (immutable i, ref c; arr)
c = 'a' + i;
}
But if you want to use ranges and functional UFCS chains you
currently need the cast:
void main() {
import std.range, std.algorithm, std.array;
char[26] arr = 26
.iota
.map!(i => cast(char)('a' + i))
.array;
}
In theory this program has the same compile-time information as
the foreach case. In practice foreach is a built-in that enjoys
more semantics than a iota+map.
Currently iota(26) loses the compile-time information about the
range, so you can't do (note: the "max" attribute doesn't exists):
void main() {
import std.range: iota;
auto r = iota(26);
enum ubyte m = r.max;
}
Currently the only way to keep that compile-time information is
to use a template argument:
void main() {
import std.range: tIota;
auto r = tIota!26;
enum ubyte m = r.max; // OK
}
But even if you write such tIota range, the map!() will lose the
compile-time value range information. And even if you manage to
write a map!() able to do it with template arguments, you have
template bloat.
So there's a desire to manage the compile-time information (like
value range information) of (number) literals without causing
template bloat and without the need to use explicit template
arguments.
Bye,
bearophile