Discussion:
Feedback Wanted on Homegrown @nogc WriteLn Alternative
Kitt via Digitalmars-d
2014-09-26 23:55:58 UTC
Large sections of my codebase require full control over when and
how the GC is run. The new @nogc attribute (as well as the -vgc
switch) has been amazingly helpful in developing safe code I can
trust won't cause the GC to run; however, there are a few aspects
of D I simply have to learn to work without in these cases. One good
example is writeln and its cousins. To this end, I'm writing my
own @nogc version of writeln that uses the C printf and sprintf
underneath.

I'm not a coding expert by any stretch of the imagination, and
I'm even less of an expert when it comes to D; for this reason, I
wanted to post my implementation and get some feedback from
fellow D-velopers. I'm especially interested in adhering to
"phobos quality" code as much as possible, so feel free to be
very picky.




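import core.stdc.stdio : printf, sprintf;
import std.meta : allSatisfy;
import std.traits : isBoolean, isFloatingPoint, isIntegral, isScalarType,
    isSomeChar, isSomeString, isUnsigned;

enum bool isScalarOrString(T) = isScalarType!T || isSomeString!T;

@nogc void Info(Types...)(Types arguments)
    if (allSatisfy!(isScalarOrString, Types))
{
    // 32 bytes per argument; a longer string argument will overflow this.
    enum size_t bufferSize = Types.length * 32;
    char[bufferSize] output;
    uint index;

    try
    {
        foreach (argument; arguments)
        {
            alias T = typeof(argument);

            static if (isIntegral!T)
            {
                // Widen so the vararg matches the specifier for every integral type.
                static if (isUnsigned!T)
                    index += sprintf(&output[index], "%llu", cast(ulong) argument);
                else
                    index += sprintf(&output[index], "%lld", cast(long) argument);
            }
            else static if (isBoolean!T)
            {
                if (argument)
                    index += sprintf(&output[index], "%s", cast(const char*) "true");
                else
                    index += sprintf(&output[index], "%s", cast(const char*) "false");
            }
            else static if (isFloatingPoint!T)
            {
                // %f expects a double; real would need %Lf.
                index += sprintf(&output[index], "%f", cast(double) argument);
            }
            else static if (isSomeChar!T)
            {
                // Only correct for ASCII code points.
                index += sprintf(&output[index], "%c", cast(int) argument);
            }
            else static if (isSomeString!T)
            {
                // %.*s takes an int precision, not a size_t; assumes char strings.
                index += sprintf(&output[index], "%.*s",
                                 cast(int) argument.length, argument.ptr);
            }
        }

        // Emit the assembled line.
        printf("%.*s\n", cast(int) index, output.ptr);
    }
    catch (Error e)
    {
        // TODO: Better Error Handling w/ more detail
        printf("%s", cast(const char*) "An Error has occurred");
    }
    catch (Exception e)
    {
        // TODO: Better Exception handling w/ more detail
        printf("%s", cast(const char*) "An Exception has occurred");
    }
}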
H. S. Teoh via Digitalmars-d
2014-09-27 01:35:11 UTC
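Post by Kitt via Digitalmars-d
Large sections of my codebase require full control over when and how
the GC is run. The new @nogc (as well as -vgc switch) has been
amazingly helpful in developing safe code I can trust won't cause the
GC to run; however, there are a few aspects of D I simply have to
learn to work without in these cases. One good example is writeln and
its cousins. To this end, I'm writing my own @nogc version of writeln
that uses the C printf and sprintf underneath.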
Did the forum web interface cut off your post? I only see the definition
of Info and nothing else.

Anyway, what I have in mind is more to take the current Phobos
std.format (of which writeln is just a thin wrapper), stick @nogc on it,
and hack it until it either compiles with @nogc, or isolate the GC parts
such that most calls to std.format (via writeln) are @nogc except when
you actually pass it something that must allocate.


T
--
People walk. Computers run.
Andrej Mitrovic via Digitalmars-d
2014-09-27 03:58:53 UTC
Post by H. S. Teoh via Digitalmars-d
Anyway, what I have in mind is more to take the current Phobos
I don't see how, unless you provide an overload taking a buffer.
H. S. Teoh via Digitalmars-d
2014-09-27 05:49:12 UTC
Post by Andrej Mitrovic via Digitalmars-d
Post by H. S. Teoh via Digitalmars-d
Anyway, what I have in mind is more to take the current Phobos
std.format [...] or isolate the GC parts [...]
I don't see how, unless you provide an overload taking a buffer.
std.format.formattedWrite takes an output range. The user can pass in a
preallocated buffer that doesn't depend on the GC. The only remaining
question is whether formattedWrite currently has any GC-dependent parts,
and whether those parts are isolatable.
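
A minimal sketch of that idea, assuming a recent Phobos in which formattedWrite accepts the writer by reference. FixedBuffer is a hypothetical name, not a Phobos type, and the sketch makes no claim that this call is actually @nogc; it only shows that the output itself need not touch the GC heap:

```d
import std.format : formattedWrite;

// Hypothetical fixed-size output range living entirely on the stack.
struct FixedBuffer(size_t N)
{
    char[N] data;
    size_t length;

    // Characters that do not fit are silently dropped in this sketch.
    void put(char c)
    {
        if (length < N)
            data[length++] = c;
    }

    void put(const(char)[] s)
    {
        foreach (c; s)
            put(c);
    }

    const(char)[] opSlice() const
    {
        return data[0 .. length];
    }
}

void main()
{
    FixedBuffer!64 buf;
    formattedWrite(buf, "%d items in %s", 3, "inbox");
    assert(buf[] == "3 items in inbox");
}
```

Whether the formatting internals allocate for a given argument type is exactly the open question raised above.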

In fact, this might be a good opportunity to introduce a formatting
function that takes a compile-time format string. That would open up the
possibility of compile-time checking of format arguments (that
bearophile has been clamoring for), as well as truly minimal runtime
dependencies, where the only things that get included at runtime are the
pieces necessary to process that particular format string. So if none of
the required pieces are GC-dependent, the entire formatting call will be
@nogc (and the compiler would automatically infer this). Ditto with
pure, @safe, etc.

Today an idea occurred to me, that if we extend the current writefln
(and other similar formatting calls like std.string.format) to have this
signature:

void writefln(string ctFmt = "", A...)(A args);

then we can adopt the convention that if ctFmt is "", then args[0] will
be interpreted as the (runtime) format string, so existing code will
still work as before (due to IFTI inferring ctFmt as ""), but people can
start rewriting their formatting calls to:

writefln!"...format string here"(... /* arguments here */);

to take advantage of compile-time analysis and processing of the format
string.


T
--
The right half of the brain controls the left half of the body. This means that only left-handed people are in their right mind. -- Manoj Srivastava
Andrej Mitrovic via Digitalmars-d
2014-09-27 11:06:24 UTC
Post by H. S. Teoh via Digitalmars-d
writefln!"...format string here"(... /* arguments here */);
Mmm, I like this. It would be one of those killer little features to
show in a talk. Slide 1:

// oops, forgot an %s
writefln("%s %s", 1, 2, 3);

Slide 2:

// Programmer error caught at compile-time!
writefln!("%s %s")(1, 2, 3);
H. S. Teoh via Digitalmars-d
2014-10-02 23:30:33 UTC
Post by Andrej Mitrovic via Digitalmars-d
Post by H. S. Teoh via Digitalmars-d
writefln!"...format string here"(... /* arguments here */);
Mmm, I like this. It would be one of those killer little features to
// oops, forgot an %s
writefln("%s %s", 1, 2, 3);
// Programmer error caught at compile-time!
writefln!("%s %s")(1, 2, 3);
Alright, today I drafted up the following proof of concept:

private size_t numSpecs(string fmt)
{
    size_t count = 0;

    for (size_t i=0; i < fmt.length; i++)
    {
        if (fmt[i] == '%')
        {
            i++;
            if (i < fmt.length && fmt[i] == '%')
                continue;
            count++;
        }
    }
    return count;
}

void writef(string fmt="", Args...)(Args args)
{
    import std.format : formattedWrite;
    import std.stdio : stdout;

    static if (fmt == "")
    {
        // fmt == "" means we're using a runtime-specified format string.
        static if (args.length >= 1)
        {
            static if (is(typeof(args[0]) : string))
                stdout.lockingTextWriter().formattedWrite(args[0], args[1..$]);
            else
                static assert(0, "Expecting format string as first argument");
        }
    }
    else
    {
        // Compile-time specified format string: we can run sanity checks on
        // it at compile-time.
        enum count = numSpecs(fmt);
        static assert(args.length == count,
            "Format string requires " ~ count.stringof ~
            " arguments but " ~ args.length.stringof ~
            " are supplied");

        // For now, we just forward it to the runtime format implementation.
        // The eventual goal is to only depend on formatting functions that
        // the format string actually uses, so that we can be @safe, nothrow,
        // @nogc, pure, etc., as long as the dependent parts don't break any of
        // those attributes.
        stdout.lockingTextWriter().formattedWrite(fmt, args);
    }
}

void writefln(string fmt="", Args...)(Args args)
{
    writef!(fmt)(args);
    writef!"\n";
}

void main()
{
    //writefln!"Number: %d Tag: %s"(123, "mytag", 1);
    writefln!"Number: %d Tag: %s"(123, "mytag");
}

If you uncomment the first line in main(), you'll get a nice
compile-time error telling you that the number of format specifiers and
the number of actual arguments passed don't match.

This is just a proof-of-concept, mind you; it doesn't actually parse the
format string correctly (it's just counting the number of unescaped %'s,
but that doesn't necessarily correspond with the number of arguments
needed, e.g., if you use "%*s" or "%(...%)").
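
To make that limitation concrete, here is a small sketch that repeats the counting logic from the draft and compares it against what std.format.format actually consumes for those two specifiers:

```d
import std.format : format;

// Same counting logic as numSpecs in the draft above.
size_t numSpecs(string fmt)
{
    size_t count = 0;
    for (size_t i = 0; i < fmt.length; i++)
    {
        if (fmt[i] == '%')
        {
            i++;
            if (i < fmt.length && fmt[i] == '%')
                continue;
            count++;
        }
    }
    return count;
}

// "%*s" takes its width from the argument list: one '%', two arguments.
static assert(numSpecs("%*s") == 1);

// "%(%s, %)" formats a whole range: three '%', one argument.
static assert(numSpecs("%(%s, %)") == 3);

void main()
{
    assert(format("%*s", 5, "ab") == "   ab");
    assert(format("%(%s, %)", [1, 2, 3]) == "1, 2, 3");
}
```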

But it *does* prove that it's possible to achieve compatibility with the
current way of invoking writefln, that is, if you write:

writefln("format %s string", "abc");

it will actually compile as before, and work as expected.

So this new syntax can be implemented alongside the existing syntax and
people can gradually migrate over from purely-runtime format strings to
compile-time, statically-checked format strings.


T
--
"Uhh, I'm still not here." -- KD, while "away" on ICQ.
bearophile via Digitalmars-d
2014-10-03 00:04:23 UTC
Post by H. S. Teoh via Digitalmars-d
So this new syntax can be implemented alongside the existing
syntax and people can gradually migrate over from
purely-runtime format strings to compile-time,
statically-checked format strings.
Very good.

D has a static type system, unlike Python/Ruby/etc., but D printing
functions use dynamic typing for format strings. This is just
wrong.

There is a template bloat problem. It's not a big problem, but
I'd like some way to use template arguments that are only used to
run compile-time functions to test them, and then leave zero
template bloat behind :-) This was one of the purposes of an "enum
precondition".

Bye,
bearophile
H. S. Teoh via Digitalmars-d
2014-10-03 01:55:41 UTC
Post by bearophile via Digitalmars-d
Post by H. S. Teoh via Digitalmars-d
So this new syntax can be implemented alongside the existing syntax
and people can gradually migrate over from purely-runtime format
strings to compile-time, statically-checked format strings.
Very good.
D has a static type system, unlike Python/Ruby/etc but D printing
functions use dynamic typing for format strings. This is just wrong.
There is a template bloat problem, it's not a big problem, but I'd
like some way to use template arguments that are only used to run
compile-time functions to test them, and then leave zero template
bloat behind :-) This was one of the purposes of a "enum
precondition".
[...]

The way I envision it, the eventual implementation of formattedWrite
with CT format string will basically decompose every format string +
arguments call into a series of calls to individual formatting
functions. For example:

writefln!"You have %d items in mailbox %s"(n, mboxName);

would get translated at compile time into the equivalent of:

write("You have ");
formattedWrite("%d", n);
write(" items in mailbox ");
formattedWrite("%s", mboxName);

The idea being that a call like `formattedWrite("%d", n)` is far more
likely to be reused in other places in the program, than the specific
format string "You have %d items in mailbox %s" and that specific
combination of parameter types (int, string). So even though this does
incur some template bloat, it will hopefully break down the template
instantiations into smaller chunks that are frequently reused.
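
A rough sketch of how such a decomposition could look, assuming a compiler with `static foreach` (which postdates this thread). Piece, splitFormat and writePieces are hypothetical names, and the splitter understands only "%%" and one-letter specifiers such as %d or %s:

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical piece of a format string: literal text or a single spec.
struct Piece
{
    bool isSpec;
    string text;
    size_t argIndex; // which argument a spec consumes
}

// Evaluated at compile time via CTFE.
Piece[] splitFormat(string fmt)
{
    Piece[] pieces;
    size_t start = 0, argIndex = 0;
    for (size_t i = 0; i < fmt.length; i++)
    {
        if (fmt[i] != '%')
            continue;
        assert(i + 1 < fmt.length, "dangling % in format string");
        if (i > start)
            pieces ~= Piece(false, fmt[start .. i]);
        if (fmt[i + 1] == '%')
            pieces ~= Piece(false, "%");
        else
            pieces ~= Piece(true, fmt[i .. i + 2], argIndex++);
        i++;
        start = i + 1;
    }
    if (start < fmt.length)
        pieces ~= Piece(false, fmt[start .. $]);
    return pieces;
}

// Expands into one put() or one small formattedWrite() call per piece.
void writePieces(string fmt, Writer, Args...)(ref Writer w, Args args)
{
    static foreach (p; splitFormat(fmt))
    {
        static if (p.isSpec)
            formattedWrite(w, p.text, args[p.argIndex]);
        else
            w.put(p.text);
    }
}

void main()
{
    auto app = appender!string();
    writePieces!"You have %d items in mailbox %s"(app, 3, "inbox");
    assert(app.data == "You have 3 items in mailbox inbox");
}
```

Each spec still goes through the runtime formattedWrite here; the point is only that the instantiations are per specifier and argument type rather than per whole format string.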


T
--
Lottery: tax on the stupid. -- Slashdotter
Ola Fosheim Grøstad via Digitalmars-d
2014-10-03 08:34:25 UTC
On Friday, 3 October 2014 at 01:57:37 UTC, H. S. Teoh via
Post by H. S. Teoh via Digitalmars-d
The idea being that a call like `formattedWrite("%d", n)` is
far more
likely to be reused in other places in the program, than the
specific
format string "You have %d items in mailbox %s" and that
specific
combination of parameter types (int, string). So even though
this does
incur some template bloat, it will hopefully break down the
template
instantiations into smaller chunks that are frequently reused.
Maybe you could do this as some kind of implicit range of
"formatting references" so that you can iterate over it in two
passes and save a heap allocation for situations where traversal
is cheap (fits in level 1 cache):

1. first pass: collect max buffer size
2. alloca buffer on stack
3. second pass: format into buffer

?
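
The three steps could be sketched roughly as follows, assuming a recent Phobos in which formattedWrite accepts the writer by reference. CountingSink, SliceSink and formattedLength are hypothetical names, and the sketch only holds if the arguments format identically on both passes:

```d
import core.stdc.stdlib : alloca;
import std.format : formattedWrite;

// Pass 1 sink: counts characters and stores nothing.
struct CountingSink
{
    size_t count;
    void put(char) { count++; }
    void put(const(char)[] s) { count += s.length; }
}

// Pass 2 sink: writes into a caller-supplied slice.
struct SliceSink
{
    char[] buf;
    size_t used;
    void put(char c) { buf[used++] = c; }
    void put(const(char)[] s)
    {
        buf[used .. used + s.length] = s[];
        used += s.length;
    }
}

size_t formattedLength(Args...)(const(char)[] fmt, Args args)
{
    CountingSink counter;
    formattedWrite(counter, fmt, args);
    return counter.count;
}

void main()
{
    // 1. first pass: collect the required buffer size
    immutable n = formattedLength("%s has %d items", "inbox", 3);

    // 2. alloca must be called in the frame that uses the buffer
    auto buf = (cast(char*) alloca(n))[0 .. n];

    // 3. second pass: format into the stack buffer
    auto sink = SliceSink(buf);
    formattedWrite(sink, "%s has %d items", "inbox", 3);
    assert(sink.buf[0 .. sink.used] == "inbox has 3 items");
}
```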
Joseph Rushton Wakeling via Digitalmars-d
2014-10-03 09:24:28 UTC
On Friday, 3 October 2014 at 08:34:27 UTC, Ola Fosheim Grøstad
Post by Ola Fosheim Grøstad via Digitalmars-d
Maybe you could do this as some kind of implicit range of
"formatting references" so that you can iterate over it in two
passes and save a heap allocation for situations where
1. first pass: collect max buffer size
2. alloca buffer on stack
3. second pass: format into buffer
?
Won't that potentially fail based on the question of what the
input is? What if you're calling

newWritefln("A random number: %g", rndGen);

or,

newWriteln("An input range: ",
someNonDeterministicInputRange);
Ola Fosheim Grøstad via Digitalmars-d
2014-10-03 09:33:23 UTC
On Friday, 3 October 2014 at 09:24:30 UTC, Joseph Rushton
Post by Joseph Rushton Wakeling via Digitalmars-d
Won't that potentially fail based on the question of what the
input is? What if you're calling
Input has to be strictly pure. :)
monarch_dodra via Digitalmars-d
2014-10-03 11:15:28 UTC
On Thursday, 2 October 2014 at 23:32:32 UTC, H. S. Teoh via
Post by H. S. Teoh via Digitalmars-d
[...]
writefln!"Number: %d Tag: %s"(123, "mytag");
I had (along with others) thought about the possibility of
"ct-write".

I think an even more powerful concept would be to have a
"ct-fmt" object directly. Indeed, writefln!"string" requires the
actual format at compile time, and for the write to be done.

It can't just validate that *any* arbitrary (but pre-defined)
string can be used with a certain set of write arguments.

I'm thinking:

//----
//Define several format strings.
auto english = ctFmt!"Today is %1$s %2$s";
auto french = ctFmt!"Nous sommes le %2$s %1$s";

//Verify homogeneity.
static assert(is(typeof(english) == typeof(french)));

//Choose your format.
auto myFmt = doEnglish ? english : french;

//Benefit.
writefln(myFmt, Month.oct, 3);
//----

I think this is particularly relevant in that it is these kinds
of cases that are particularly tricky and easy to get wrong.
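
One possible shape for such an object, as a sketch only: the static type carries a canonical signature of the positional specifiers, while the format text stays a runtime value. CtFmt and signatureOf are hypothetical, the parser accepts nothing but `%N$c` specifiers, and the chosen text is simply handed to the existing runtime format:

```d
import std.format : format;

// Hypothetical: reduce "%2$s %1$d" style strings to a sorted signature
// such as "1d,2s," so that reordered translations share one type.
string signatureOf(string fmt)
{
    string[] specs;
    for (size_t i = 0; i < fmt.length; i++)
    {
        if (fmt[i] != '%')
            continue;
        size_t j = i + 1;
        while (j < fmt.length && fmt[j] >= '0' && fmt[j] <= '9')
            j++;
        assert(j > i + 1 && j + 1 < fmt.length && fmt[j] == '$',
               "this sketch only understands %N$c specifiers");
        // Insert position digits plus conversion character, keeping order.
        size_t k = specs.length;
        specs ~= fmt[i + 1 .. j] ~ fmt[j + 1];
        while (k > 0 && specs[k - 1] > specs[k])
        {
            auto tmp = specs[k - 1];
            specs[k - 1] = specs[k];
            specs[k] = tmp;
            k--;
        }
        i = j + 1;
    }
    string sig;
    foreach (s; specs)
        sig ~= s ~ ",";
    return sig;
}

// The signature is part of the type; the text is ordinary runtime data.
struct CtFmt(string signature)
{
    string text;
}

auto ctFmt(string fmt)()
{
    return CtFmt!(signatureOf(fmt))(fmt);
}

void main()
{
    auto english = ctFmt!"Today is %1$s %2$s";
    auto french = ctFmt!"Nous sommes le %2$s %1$s";
    static assert(is(typeof(english) == typeof(french)));

    bool doEnglish = false;
    auto chosen = doEnglish ? english : french;
    assert(format(chosen.text, "oct", 3) == "Nous sommes le 3 oct");
}
```

A format with a different set of specifiers would produce a different CtFmt instantiation, so mixing it into the ternary would fail to compile.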



For "basic" usage, you'd just use:
writefln(ctFmt!"Number: %d Tag: %s", 123, "mytag");

The hard part is finding the sweet spot in runtime/compile time
data, to make those format strings runtime-type compatible. But
it should be fairly doable.
H. S. Teoh via Digitalmars-d
2014-10-03 16:59:48 UTC
On Thursday, 2 October 2014 at 23:32:32 UTC, H. S. Teoh via Digitalmars-d
Post by H. S. Teoh via Digitalmars-d
[...]
writefln!"Number: %d Tag: %s"(123, "mytag");
I had (amongst with others) thought about the possibility of
"ct-write".
I think an even more powerful concept, would rather having a "ct-fmt"
object dirctly. Indeed writefln!"string" requires the actual format at
compile time, and for the write to be done.
It can't just validate that *any* arbitrary (but pre-defined) string
can be used with a certain set of write arguments.
//----
//Define several format strings.
auto english = ctFmt!"Today is %1$s %2$s";
auto french = ctFmt!"Nous sommes le %2$s %1$s";
//Verify homogeneity.
static assert(is(typeof(english) == typeof(french)));
//Chose your format.
auto myFmt = doEnglish ? english : french;
//Benefit.
writfln(myFmt, Month.oct, 3);
//----
I think this is particularly relevant in that it is these kinds of cases
that are particularly tricky and easy to get wrong.
So ctFmt would have to be a static type that contains static information
about the number and types of formatting items it expects? Because
otherwise, we won't be able to do checks like verifying at compile-time
that the passed arguments match the given format.

But if we're going to go in this direction, I'd also introduce
named parameters instead of positional parameters, which would make
translators' jobs easier. For example:

ctFmt!"Today is %`day`s %`month`s"

is far easier to translate correctly than:

ctFmt!"Today is %1$s %2$s"

where the translator may have no idea what %1$s and %2$s are supposed to
refer to. For all they know, %1$s could be "our" and %2$s could be
"anniversary".
writefln(ctFmt!"Number: %d Tag: %s", 123, "mytag");
The hard part is finding the sweet spot in runtime/compile time data,
to make those format strings runtime-type compatible. But it should be
fairly doable.
Personally, I prefer the shorter syntax for the most usual cases where
the format string doesn't change:

writefln!"Number: %d Tag: %s"(123, "mytag");

But ctFmt could also fit under this scheme when more flexibility is
desired: we could pass it as a first parameter and leave the default CT
parameter as "" (meaning, read args[0] for format string). So if args[0]
is an instance of ctFmt, then we can do (more limited) compile-time
checking, and if it's a runtime string, then fall back to the current
behaviour.

For a compile-time string that's statically fixed (i.e.,
writefln!"..."(...)), we can do a lot more than what ctFmt does. For
example, we can parse the format at compile-time to extract individual
formatting specifiers and intervening string fragments, and thereby
transform the entire writefln call into a series of puts() and
formattedWrite() calls.

With ctFmt, you can't extract the intervening string fragments
beforehand, and you'll need runtime binding of formatting specifiers to
arguments, because the exact format string chosen may vary at runtime,
though they *can* be statically checked to be compatible at compile-time
(so "X %1$s Y %2$s Z" is compatible with "P %2$s Q %1$s R", but "%d %d
%d" is not compatible with "%f %(%s%)" because they expect a different
number of arguments and argument types). So I see ctFmt as an object
that encapsulates the expected argument types, but leaves the actual
format string details to runtime, whereas passing in a string in the CT
argument of writefln will figure out the format string details at
compile-time, leaving only the actual formatting to be done at runtime.


T
--
Truth, Sir, is a cow which will give [skeptics] no more milk, and so
they are gone to milk the bull. -- Sam. Johnson
monarch_dodra via Digitalmars-d
2014-10-03 17:21:14 UTC
On Friday, 3 October 2014 at 17:01:46 UTC, H. S. Teoh via
On Fri, Oct 03, 2014 at 11:15:28AM +0000, monarch_dodra via
Post by monarch_dodra via Digitalmars-d
On Thursday, 2 October 2014 at 23:32:32 UTC, H. S. Teoh via
Digitalmars-d
Post by H. S. Teoh via Digitalmars-d
[...]
writefln!"Number: %d Tag: %s"(123, "mytag");
I had (amongst with others) thought about the possibility of
"ct-write".
I think an even more powerful concept, would rather having a
"ct-fmt"
object dirctly. Indeed writefln!"string" requires the actual
format at
compile time, and for the write to be done.
It can't just validate that *any* arbitrary (but pre-defined)
string
can be used with a certain set of write arguments.
//----
//Define several format strings.
auto english = ctFmt!"Today is %1$s %2$s";
auto french = ctFmt!"Nous sommes le %2$s %1$s";
//Verify homogeneity.
static assert(is(typeof(english) == typeof(french)));
//Chose your format.
auto myFmt = doEnglish ? english : french;
//Benefit.
writfln(myFmt, Month.oct, 3);
//----
I think this is particularly relevant in that it is these
kinds of cases
that are particularly tricky and easy to get wrong.
So ctFmt would have to be a static type that contains static
information
about the number and types of formatting items it expects?
Because
otherwise, we won't be able to do checks like verifying at
compile-time
that the passed arguments match the given format.
But if we're going to go in this direction, I'd also introduce
named parameters instead of positional parameters, which would
make
ctFmt!"Today is %`day`s %`month`s"
ctFmt!"Today is %1$s %2$s"
where the translator may have no idea what %1$s and %2$s are
supposed to
refer to. For all they know, %1%s could be "our" and %2$s could
be
"anniversary".
Right, but that would also require named parameter passing, which
we don't really have.
Post by monarch_dodra via Digitalmars-d
writefln(ctFmt!"Number: %d Tag: %s", 123, "mytag");
The hard part is finding the sweet spot in runtime/compile
time data,
to make those format strings runtime-type compatible. But it
should be
fairly doable.
Personally, I prefer the shorter syntax for the most usual
cases where
writefln!"Number: %d Tag: %s"(123, "mytag");
But ctFmt could also fit under this scheme when more
flexibility is
desired: we could pass it as a first parameter and leave the
default CT
parameter as "" (meaning, read args[0] for format string). So
if args[0]
is an instance of ctFmt, then we can do (more limited)
compile-time
checking, and if it's a runtime string, then fallback to the
current
behaviour.
Well, we could also simply have
writefln!str(args) => writefln(ctFmt!str, args)
For a compile-string that's statically fixed (i.e.,
writefln!"..."(...)), we can do a lot more than what ctFmt
does. For
example, we can parse the format at compile-time to extract
individual
formatting specifiers and intervening string fragments, and
thereby
transform the entire writefln call into a series of puts() and
formattedWrite() calls.
With ctFmt, you can't extract the intervening string fragments
beforehand, and you'll need runtime binding of formatting
specifiers to
arguments, because the exact format string chosen may vary at
runtime,
though they *can* be statically checked to be compatible at
compile-time
(so "X %1$s Y %2$s Z" is compatible with "P %2$s Q %1$s R", but
"%d %d
%d" is not compatible with "%f %(%s%)" because they expect a
different
number of arguments and argument types). So I see ctFmt as an
object
that encapsulates the expected argument types, but leaves the
actual
format string details to runtime, whereas passing in a string
in the CT
argument of writefln will figure out the format string details
at
compile-time, leaving only the actual formatting to be done at
runtime.
T
Well, technically, `ctFmt` could still do some formatting. It can
still cut up the format into an alternative series of strings and
"to format objects". ctFmt would still know how many string
fragments there are, and so would writeln. Writeln would still be
able to generate nothing more than "puts", the only difference is
that the actual string token is runtime defined, but I don't
think that makes any change.

E.g.: ctFmt!"hello %s World" becomes the type:
struct
{
    //Actual contents run-time defined,
    //but possibly pre-calculated during CTFE.
    string[2] fixedStrings;

    //Completely statically known.
    enum string[1] fmt = ["%s"];
}

In particular, what "ctFmt" doesn't know, could still be
interpreted by writeln. For example, while "ctFmt" doesn't know
what "%s" binds to, it still statically knows it's "%s", and
writeln can statically extract that information.

The compile-time restrictions we'd place on ctFmt would be that:
-The number of "format objects" must be the same
-Each format object must be the same at the same place, bar some
"positional shuffling".

So as I said (IMO), if we correctly define which ctFmt strings
are type-compatible, we can have pretty good run-time
possibilities, but still have 100% of the capabilities that
writefln!str(args) would give us.
monarch_dodra via Digitalmars-d
2014-10-03 17:23:30 UTC
On Friday, 3 October 2014 at 17:01:46 UTC, H. S. Teoh via
Post by H. S. Teoh via Digitalmars-d
For a compile-string that's statically fixed (i.e.,
writefln!"..."(...)), we can do a lot more than what ctFmt
does. For
example, we can parse the format at compile-time to extract
individual
formatting specifiers and intervening string fragments, and
thereby
transform the entire writefln call into a series of puts() and
formattedWrite() calls.
The take home point is that ctFmt would *also* parse the string
at compile time. The difference is that it creates a run-time
object, which still contains enough static information for a
powerful write, yet still some run-time info to be able to swap
them at runtime.

Andrei Alexandrescu via Digitalmars-d
2014-09-27 04:31:14 UTC
Post by Andrej Mitrovic via Digitalmars-d
Post by H. S. Teoh via Digitalmars-d
Anyway, what I have in mind is more to take the current Phobos
I don't see how, unless you provide an overload taking a buffer.
RCString. -- Andrei
Vladimir Panteleev via Digitalmars-d
2014-09-27 04:15:47 UTC
On Saturday, 27 September 2014 at 01:37:05 UTC, H. S. Teoh via
Post by H. S. Teoh via Digitalmars-d
Did the forum web interface cut off your post?
No.
Andrei Alexandrescu via Digitalmars-d
2014-10-03 07:38:22 UTC
Post by Andrej Mitrovic via Digitalmars-d
Post by H. S. Teoh via Digitalmars-d
writefln!"...format string here"(... /* arguments here */);
Mmm, I like this. It would be one of those killer little features to
// oops, forgot an %s
writefln("%s %s", 1, 2, 3);
// Programmer error caught at compile-time!
writefln!("%s %s")(1, 2, 3);
[snip]

Worth pursuing. Please submit an enh request and follow up there. Thanks!
-- Andrei
H. S. Teoh via Digitalmars-d
2014-10-03 14:18:50 UTC
Post by Andrei Alexandrescu via Digitalmars-d
Post by Andrej Mitrovic via Digitalmars-d
Post by H. S. Teoh via Digitalmars-d
writefln!"...format string here"(... /* arguments here */);
Mmm, I like this. It would be one of those killer little features to
// oops, forgot an %s
writefln("%s %s", 1, 2, 3);
// Programmer error caught at compile-time!
writefln!("%s %s")(1, 2, 3);
[snip]
Worth pursuing. Please submit an enh request and follow up there. Thanks!
-- Andrei
Filed:

https://issues.dlang.org/show_bug.cgi?id=13568


T
--
I see that you JS got Bach.