Why are void[] contents marked as having pointers?

Post by Vladimir Panteleev
I don't know why it was decided to mark the contents of void[] as

[...]

Post by Vladimir Panteleev
3) It's very rare in practice that the only pointer to your
object (which you still plan to access later) to be stored in a
void[]-allocated array!

Rare or common, it still would be a nasty bug lurking to catch someone.
The default behavior in D should be to be correct code. Doing
potentially unsafe things to improve performance should require extra
effort - in this case it would be either using the gc function to mark
the memory as not containing pointers, or storing them as ubyte[] instead.
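
A sketch of Walter's first option, marking a void[] block as pointer-free, in present-day D2. It assumes `core.memory.GC.setAttr` with `GC.BlkAttr.NO_SCAN` is the D2 counterpart of the D1 `std.gc.hasNoPointers` call mentioned later in the thread. The helper names `allocOpaque` and `isScanned` are hypothetical.

```d
import core.memory : GC;

// Hypothetical helper: allocate an opaque void[] buffer and tell the GC
// not to scan it for pointers. setAttr acts on the start of a GC block,
// so the block's base address is looked up with GC.addrOf first.
void[] allocOpaque(size_t n)
{
    void[] buf = new void[n];
    GC.setAttr(GC.addrOf(buf.ptr), GC.BlkAttr.NO_SCAN);
    return buf;
}

// Hypothetical helper: true if the GC will scan the block backing `buf`.
bool isScanned(const(void)[] buf)
{
    return (GC.getAttr(GC.addrOf(buf.ptr)) & GC.BlkAttr.NO_SCAN) == 0;
}
```

The flag belongs to that one block. A concatenation such as `a ~ b` allocates a new block, which may need to be marked again. That is the "one ~ which you overlooked" problem Vladimir describes below.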

Vladimir Panteleev

2009-05-31 20:09:54 UTC

Post by Vladimir Panteleev
I don't know why it was decided to mark the contents of void[] as

[...]

Post by Vladimir Panteleev
3) It's very rare in practice that the only pointer to your
object (which you still plan to access later) to be stored in a
void[]-allocated array!

This isn't about performance, this is about having one thousand casts all over my code. It becomes a burden to cast everything to ubyte[] when working with abstract binary data. For example, when building a MIME multipart message with binary fields, every line needs to have a cast in it - when we could have just used the ~= operator to append to a void[].

Alternative solutions would be to have a second type (either new or one of the existing, e.g. ubyte[]) act as void[] (any array type casts to it implicitly) but not be scanned by the GC, but I doubt this is something you'll consider.

--
Best regards,
Vladimir mailto:thecybershadow at gmail.com

Andrei Alexandrescu

2009-05-31 20:24:09 UTC

On Sun, 31 May 2009 22:41:47 +0300, Walter Bright

Post by Vladimir Panteleev
I don't know why it was decided to mark the contents of void[] as

[...]

Post by Vladimir Panteleev
3) It's very rare in practice that the only pointer to your
object (which you still plan to access later) to be stored in a
void[]-allocated array!

Rare or common, it still would be a nasty bug lurking to catch
someone. The default behavior in D should be to be correct code.
Doing potentially unsafe things to improve performance should
require extra effort - in this case it would be either using the gc
function to mark the memory as not containing pointers, or storing
them as ubyte[] instead.

Another alternative would be to allow implicitly casting arrays of any
type to const(ubyte)[] which is always safe. But I think this is too
much ado about nothing - you're avoiding the type system to start with,
so use ubyte, insert a cast, and call it a day. If you have too many
casts, the problem is most likely elsewhere so that argument I'm not buying.

Andrei

BCS

2009-05-31 20:44:48 UTC

Post by Andrei Alexandrescu

Hello Andrei,

Post by Vladimir Panteleev
This isn't about performance, this is about having one thousand casts
all over my code. It becomes a burden to cast everything to ubyte[]
when working with abstract binary data. For example, when building a
MIME multipart message with binary fields, every line needs to have a
cast in it - when we could have just used the ~= operator to append
to a void[].

Another alternative would be to allow implicitly casting arrays of any
type to const(ubyte)[] which is always safe.

sounds like something that might work.

Post by Andrei Alexandrescu
But I think this is too
much ado about nothing - you're avoiding the type system to start with,

I'm not sure he is (or at least, he is in a very well defined way; "I need
to look at this data as its bytes")

Post by Andrei Alexandrescu
so use ubyte, insert a cast, and call it a day. If you have too
many casts, the problem is most likely elsewhere

You might be correct, but I don't think any of us have enough info right
now to make that assertion.

Andrei Alexandrescu

2009-05-31 21:00:45 UTC

Post by BCS

Post by Andrei Alexandrescu
so use ubyte, insert a cast, and call it a day. If you have too
many casts, the problem is most likely elsewhere

You might be correct, but I don't think any of us have enough info right
now to make that assertion.

Oh there is enough information. What's needed is:

const(ubyte)[] getRepresentation(T)(T[] data)
{
    return cast(typeof(return)) data;
}

If you have many calls to getRepresentation(), then that
anticlimactically shows that you need to look at arrays' representations
often. If there are too many of those, maybe some of the said arrays
should be dealt with as ubyte[] in the first place.

Andrei
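
A usage sketch for Andrei's `getRepresentation`. The hypothetical `buildMessage` assembles a byte buffer from a string and an int array with one call per piece. The int bytes come out in the host's native byte order, which is fine for local use but not a portable wire format.

```d
// Andrei's helper from the post above: view any array's elements as raw bytes.
// The const return type lets callers read, but not modify, the original data.
const(ubyte)[] getRepresentation(T)(T[] data)
{
    return cast(typeof(return)) data;
}

// Hypothetical example: build one byte buffer from differently typed pieces.
ubyte[] buildMessage(string header, const(int)[] values)
{
    ubyte[] message;
    message ~= getRepresentation(header); // characters as bytes
    message ~= getRepresentation(values); // ints in native byte order
    return message;
}
```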

BCS

2009-05-31 21:22:04 UTC

Post by Andrei Alexandrescu

Hello Andrei,

Post by BCS

Post by Andrei Alexandrescu
so use ubyte, insert a cast, and call it a day. If you have too many
casts, the problem is most likely elsewhere

You might be correct, but I don't think any of us have enough info
right now to make that assertion.

const(ubyte)[] getRepresentation(T)(T[] data)
{
    return cast(typeof(return)) data;
}

If you have many calls to getRepresentation(), then that
anticlimactically shows that you need to look at arrays'
representations often. If there are too many of those, maybe some of
the said arrays should be dealt with as ubyte[] in the first place.

Maybe in some cases but if the primary function of the code is processing
stuff between "raw data" and other data types then the above is irrelevant.
The OP sort of hinted somewhere that this is the kind of thing he is working
on. Without knowing what the OP is doing, I still don't think we can say
if his program is well designed.

Vladimir Panteleev

2009-05-31 21:32:42 UTC

Post by Andrei Alexandrescu
const(ubyte)[] getRepresentation(T)(T[] data)
{
    return cast(typeof(return)) data;
}

This is functionally equivalent to (forgive the D1):
ubyte[] getRepresentation(void[] data)
{
    return cast(ubyte[]) data;
}

Since no allocation is done in this case, the use of void[] is safe, and it doesn't instantiate a version of the function for every type you call it with. I remarked about this in my other reply.

Andrei Alexandrescu

2009-05-31 23:18:46 UTC

Post by Andrei Alexandrescu
const(ubyte)[] getRepresentation(T)(T[] data)
{
    return cast(typeof(return)) data;
}

ubyte[] getRepresentation(void[] data)
{
    return cast(ubyte[]) data;
}
Since no allocation is done in this case, the use of void[] is safe, and it doesn't instantiate a version of the function for every type you call it with. I remarked about this in my other reply.

This is not safe because you can change the data.

Andrei

Vladimir Panteleev

2009-06-01 11:12:40 UTC

On Mon, 01 Jun 2009 00:00:45 +0300, Andrei Alexandrescu

Post by Andrei Alexandrescu
const(ubyte)[] getRepresentation(T)(T[] data)
{
    return cast(typeof(return)) data;
}

ubyte[] getRepresentation(void[] data)
{
    return cast(ubyte[]) data;
}
Since no allocation is done in this case, the use of void[] is safe,
and it doesn't instantiate a version of the function for every type you
call it with. I remarked about this in my other reply.

Which is why I wrote "forgive the D1" :)
I've yet to switch to D2, but it's obvious that the const should be there to ensure safety.
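
A D2 sketch of the non-template variant with the `const` Andrei asked for. Every array type converts implicitly to `const(void)[]`, so one non-template function serves all element types without a per-type instantiation. The `const(ubyte)[]` result also cannot be written through.

```d
// Non-template, const-correct version of the D1 helper above.
const(ubyte)[] getRepresentation(const(void)[] data)
{
    // Reinterpret the same memory as bytes; no copy is made.
    return cast(const(ubyte)[]) data;
}
```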

Vladimir Panteleev

2009-05-31 21:28:07 UTC

But I think this is too much ado about nothing - you're avoiding the type system to start with, so use ubyte, insert a cast, and call it a day.

I don't get it - not using casts is avoiding the type system? :P Note that I am NOT up-casting the void[] later back to some other type - it goes out to the network, a file, etc. void[] sounds like it fits perfectly in the type hierarchy for "just a bunch of bytes", except for the "may contain pointers" fine print.

If you have too many casts, the problem is most likely elsewhere so that argument I'm not buying.

I could cut down on the number of casts if I were to replace most array appending operations with calls to a function that takes a void[] and then internally casts it to a ubyte[] and appends that somewhere. There's a lot of diversity of types being worked with in my case - strings, various structs, more raw data, etc. I'm more annoyed that I'd need to do something like that to work around a design decision that may not have been fully thought out.

Andrei Alexandrescu

2009-05-31 22:49:54 UTC

But I think this is too much ado about nothing - you're avoiding the type system to start with, so use ubyte, insert a cast, and call it a day.

If you have too many casts, the problem is most likely elsewhere so that argument I'm not buying.

Andrei Alexandrescu

2009-05-31 23:17:34 UTC

On Sun, 31 May 2009 23:24:09 +0300, Andrei Alexandrescu

But I think this is too much ado about nothing - you're avoiding
the type system to start with, so use ubyte, insert a cast, and
call it a day.

I don't get it - not using casts is avoiding the type system? :P Note
that I am NOT up-casting the void[] later back to some other type -
it goes out to the network, a file, etc. void[] sounds like it fits
perfectly in the type hierarchy for "just a bunch of bytes", except
for the "may contain pointers" fine print.

I understand. You are sending around object representation. void[] may
contain pointers, so you're simply not looking at the right abstraction.

If you have too many casts, the problem is most likely elsewhere so
that argument I'm not buying.

I could cut down on the number of casts if I were to replace most
array appending operations with calls to a function that takes a void[]
and then internally casts it to a ubyte[] and appends that somewhere.
There's a lot of diversity of types being worked with in my case -
strings, various structs, more raw data, etc. I'm more annoyed that
I'd need to do something like that to work around a design decision
that may not have been fully thought out.

Walter has written a class called OutBuffer (see std.outbuffer) the
likes of which could be used to encapsulate representation marshaling.

Andrei
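
A sketch of what the OutBuffer suggestion could look like for the MIME-style case. `buildPart` and its layout (header text, then a 4-byte length, then the payload) are hypothetical. The sketch assumes the `std.outbuffer.OutBuffer` overloads `write(const(char)[])`, `write(uint)` and `write(const(ubyte)[])`, plus `toBytes()`.

```d
import std.outbuffer : OutBuffer;

// Hypothetical marshaling helper: each write() overload appends the raw
// bytes of its argument, so the call sites need no casts.
ubyte[] buildPart(const(char)[] header, const(ubyte)[] payload)
{
    auto buf = new OutBuffer();
    buf.write(header);                    // header characters as bytes
    buf.write(cast(uint) payload.length); // 4-byte length, native byte order
    buf.write(payload);                   // the payload itself
    return buf.toBytes();
}
```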

Vladimir Panteleev

2009-06-01 07:03:48 UTC

Post by Andrei Alexandrescu
Another alternative would be to allow implicitly casting arrays of any
type to const(ubyte)[] which is always safe. But I think this is too
much ado about nothing - you're avoiding the type system to start with,
so use ubyte, insert a cast, and call it a day. If you have too many
casts, the problem is most likely elsewhere so that argument I'm not buying.

I've thought about this for a bit. If we allow any *non-reference* type except void[] to implicitly cast to ubyte[], but still allow implicitly casting ubyte[] to void[], it will put ubyte[] in the perfect spot in the type hierarchy - it'll allow safely (portability issues notwithstanding) getting the representation of value-type (POD) arrays, while still allowing abstracting it even further to the "might have pointers" type - at which point it is unsafe to access individual bytes, which void[] disallows without casts.
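
The proposal above is a language change, but a library approximation can show roughly what it would permit. This is an illustration under assumptions, not part of the proposal itself. The hypothetical `bytesOf` uses `std.traits.hasIndirections` to accept only element types without pointers or references, so arrays of class references or strings are rejected at compile time.

```d
import std.traits : hasIndirections;

// Only value-type (pointer-free) element types are accepted, so the byte
// view can be treated as plain data that the GC need not scan.
const(ubyte)[] bytesOf(T)(const(T)[] data)
    if (!hasIndirections!T)
{
    return cast(const(ubyte)[]) data;
}
```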

Vladimir Panteleev

2009-05-31 20:39:09 UTC

Post by Vladimir Panteleev
I don't know why it was decided to mark the contents of void[] as

[...]

Post by Vladimir Panteleev
3) It's very rare in practice that the only pointer to your
object (which you still plan to access later) to be stored in a
void[]-allocated array!

I just realized that by "performance" you might have meant memory leaks. Well, sure, if you can say that my programs crashing every few hours due to running out of memory is a "performance" problem. I'm sorry to sound bitter, but this was the cause of much annoyance for my software's users. It took writing a memory debugger for me to understand that no matter how much you chase void[]s with hasNoPointers, there will always be that one ~ which you overlooked.

As much as I try to look from an objective perspective, I don't see how a memory leak (and memory leaks in D usually mean that NO memory is being freed, except for small lucky objects not having bogus pointers to them) is a problem less significant than an obscure case that involves allocating a void[], storing a pointer in it and losing all other references to the object. In fact, I just searched the D documentation and I couldn't find a statement saying whether void[] are scanned by the GC or not. Enter Mr. D-newbie, who wants to write his own network/compression/file-copying/etc. library/program and stumbles upon void[], the seemingly perfect abstract-binary-data-container type for the job... (which is exactly what happened with yours truly).

P.S. Not trying to push my point of view, but just trying to offer some perspective from someone who has been bit by this design choice...

Walter Bright

2009-05-31 21:28:21 UTC

Post by Vladimir Panteleev
I just realized that by "performance" you might have meant memory
leaks.

No, in this context I meant improving performance by not scanning the
void[] memory for pointers.

Post by Vladimir Panteleev
Well, sure, if you can say that my programs crashing every few
hours due to running out of memory is a "performance" problem. I'm
sorry to sound bitter, but this was the cause of much annoyance for
my software's users. It took writing a memory debugger for me to
understand that no matter how much you chase void[]s with
hasNoPointers, there will always be that one ~ which you overlooked.

I'm curious what form of data you have that always seem to look like
valid pointers. There are a couple other options you can pursue - moving
the gc pool to another location in the address space, or changing the
alignment of your void[] data so it won't look like aligned pointers
(the gc won't look for misaligned pointers).

Or just use ubyte[] instead.

Post by Vladimir Panteleev
As much as I try to look from an objective perspective, I don't see
how a memory leak (and memory leaks in D usually mean that NO memory
is being freed, except for small lucky objects not having bogus
pointers to them) is a problem less significant than an obscure case
that involves allocating a void[], storing a pointer in it and losing
all other references to the object.

Because one is an obvious failure, and the other will be memory
corruption. Memory corruption is pernicious and awful.

Post by Vladimir Panteleev
In fact, I just searched the D
documentation and I couldn't find a statement saying whether void[]
are scanned by the GC or not. Enter Mr. D-newbie, who wants to write
his own network/compression/file-copying/etc. library/program and
stumbles upon void[], the seemingly perfect
abstract-binary-data-container type for the job... (which is exactly
what happened with yours truly).
P.S. Not trying to push my point of view, but just trying to offer
some perspective from someone who has been bit by this design
choice...

Hmm. Wouldn't compression data be naturally a ubyte[] type?

BCS

2009-05-31 21:38:26 UTC

Hello Walter,

Post by Walter Bright
I'm curious what form of data you have that always seem to look like
valid pointers. There are a couple other options you can pursue -
moving the gc pool to another location in the address space, or
changing the alignment of your void[] data so it won't look like
aligned pointers (the gc won't look for misaligned pointers).

Most (but not all) of the cases I can think of where you get false pointers,
re-aligning stuff or moving the heap won't help as the false pointer source
will hit the full address space.

Vladimir Panteleev

2009-05-31 21:56:59 UTC

Post by Vladimir Panteleev
I just realized that by "performance" you might have meant memory
leaks.

No, in this context I meant improving performance by not scanning the
void[] memory for pointers.

It's just compressed data, which is evenly distributed across the 32-bit address space. Let's do the math:

Suppose we have an application which has two blocks of memory, M and N. Block M is a block with random data which is erroneously marked as having pointers, while block N is a block which shouldn't have any pointers towards it.
Now, the chance that a random DWORD will point inside N is sizeof(N)/0x100000000 - or rather, we can say that it will NOT point inside N with the probability of 1-(sizeof(N)/0x100000000). For as many DWORDs as there are in M, raise that to the power sizeof(M)/4. For values already as small as 1 MB for M and N, it's pretty much guaranteed that you'll have pointers inside N. Relocating or re-aligning the data won't help - it won't affect the entropy or the value range.
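
Vladimir's estimate written out, under his stated assumptions (a 32-bit address space, 4-byte-aligned scanning, and uniformly random words in M, with |M| and |N| the block sizes in bytes). The bound uses $1 - x \le e^{-x}$:

$$
P(\text{no word of } M \text{ points into } N) = \left(1 - \frac{|N|}{2^{32}}\right)^{|M|/4} \le \exp\!\left(-\frac{|M|\,|N|}{2^{34}}\right)
$$

For $|M| = |N| = 2^{20}$ bytes (1 MB), the exponent is $2^{40}/2^{34} = 64$. N therefore escapes every false reference with probability at most $e^{-64} \approx 1.6 \times 10^{-28}$. Equivalently, M is expected to hold about $(|M|/4)\,(|N|/2^{32}) = 2^{18} \cdot 2^{-12} = 64$ words that point into N. Moving or re-aligning the blocks changes neither ratio.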

Post by Walter Bright
Or just use ubyte[] instead.

And the casts that come with it :(

Because one is an obvious failure, and the other will be memory
corruption. Memory corruption is pernicious and awful.

It is, yes. But if you add "don't put your only references inside void[]s" to the "don'ts" on the GC page, the programmer will only have himself to blame for not reading the language documentation. This goes right along with other tricks IMHO.

Hmm. Wouldn't compression data be naturally a ubyte[] type?

That's a subjective opinion :) I could just as well continue arguing that void[] is the perfect type for any kind of "opaque" binary data due to its properties.

Andrei Alexandrescu

2009-05-31 23:21:33 UTC

Post by Vladimir Panteleev
That's a subjective opinion :) I could just as well continue arguing
that void[] is the perfect type for any kind of "opaque" binary data
due to its properties.

To argue that convincingly, you'd need to disable conversions from
arrays of class objects to void[].

Andrei

Vladimir Panteleev

2009-06-01 06:27:24 UTC

Post by Andrei Alexandrescu
To argue that convincingly, you'd need to disable conversions from
arrays of class objects to void[].

You're right. Perhaps implicit cast of reference types to void[] should result in an error.

Daniel Keep

2009-06-01 06:43:19 UTC

Post by Andrei Alexandrescu
To argue that convincingly, you'd need to disable conversions from
arrays of class objects to void[].

You're right. Perhaps implicit cast of reference types to void[] should result in an error.

If only there were a way to indicate that void[]s could contain
pointers, then they would behave uniformly across types...

Oh wait.

Vladimir Panteleev

2009-05-31 21:59:10 UTC

Post by Walter Bright
Because one is an obvious failure, and the other will be memory
corruption. Memory corruption is pernicious and awful.

I wanted to add that debugging memory corruptions and other memory problems for D right now is complicated due to lack of proper tools in this area. Hopefully this will change in the near future.

Vladimir Panteleev

2009-05-31 22:03:08 UTC

Post by Walter Bright
Hmm. Wouldn't compression data be naturally a ubyte[] type?

(again, something I forgot to add... shouldn't hit Send so soon)

Consider this really basic example of file concatenation:

auto data = read("file1") ~ read("file2"); // oops! void[] concatenation - minefield created

bearophile

2009-05-31 22:34:21 UTC

Post by Vladimir Panteleev
auto data = read("file1") ~ read("file2"); // oops! void[] concatenation - minefield created

I think a better design for that read() function is to return ubyte[].
I have never understood why it returns a void[].
To manage generic data ubyte is better than void[] in your program (sometimes uint[] is useful to increase efficiency compared to ubyte[]).

Bye,
bearophile
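
While `read` returns `void[]`, a caller can make the concatenation in Vladimir's example pointer-free by casting each result before appending. This is a minimal sketch, and the helper name `concatFiles` is hypothetical. The `~` then operates on `ubyte[]`, which the thread describes as a type the GC does not scan for pointers.

```d
import std.file : read;

// Reinterpret each file's contents as ubyte[] first, so the result of ~
// is a ubyte[] rather than a void[].
ubyte[] concatFiles(string first, string second)
{
    return cast(ubyte[]) read(first) ~ cast(ubyte[]) read(second);
}
```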

Denis Koroskin

2009-05-31 19:54:30 UTC

FWIW, I also consider void[] to be storage for arbitrary untyped binary data, and thus I believe the GC shouldn't scan it.

While it is possible to prevent GC from scanning an arbitrary void[] array, there is no reasonable way to prevent it from scanning all arrays.

It is a breaking change, but may be changed for D2. In 99% of cases it is the correct behavior (and a bug in the rest), but it reduces application execution speed significantly.

++vote

Denis Koroskin

2009-05-31 19:58:38 UTC

FWIW, I also consider void[] to be storage for arbitrary untyped binary data, and thus I believe the GC shouldn't scan it.
Ignoring void[] arrays is the correct behavior in 99% of cases (and a bug in the rest), but improves application execution speed significantly.

While it is possible to prevent GC from scanning an arbitrary void[] array, there is no reasonable way to prevent it from scanning all arrays (without modifying GC code).

It is a breaking change, but not too late for D2.

++vote

Lionello Lunesu

2009-06-01 00:46:17 UTC

Post by Denis Koroskin

FWIW, I also consider void[] to be storage for arbitrary untyped binary
data, and thus I believe the GC shouldn't scan it.

You're contradicting yourself there. void[] is arbitrary untyped data,
so it could contain uints, floats, bytes, pointers, arrays, strings,
etc. or structs with any of those.

I think the current behavior is correct: ubyte[] is the new void*.

I also agree that std.file.read (and similar functions) should return
ubyte[] instead of void[], to prevent surprises after concatenation.

L.

Christopher Wright

2009-06-01 02:25:39 UTC

Post by Lionello Lunesu

On Sun, 31 May 2009 22:45:23 +0400, Vladimir Panteleev

FWIW, I also consider void[] to be storage for arbitrary untyped binary
data, and thus I believe the GC shouldn't scan it.

Even in C, people often use unsigned char* for arbitrary data that does
not include pointers.

grauzone

2009-05-31 20:11:57 UTC

Post by Vladimir Panteleev
3) It's very rare in practice that the only pointer to your object (which you still plan to access later) to be stored in a void[]-allocated array! Remember, the properties of memory regions are determined when the memory is allocated, so casting an array of structures to a void[] will not lose you that reference. You'd need to move your pointer to a void[]-array (which you need to allocate explicitly or, for example, concatenating your reference to the void[]), then drop the reference to your original structure, for this to happen.

void[] = can contain pointers
ubyte[] = can not contain pointers

void[] just wraps void*, which is a low level type and can contain
anything. Because of that, the conservative GC needs to scan it for
pointers. ubyte[], on the other hand, contains sequences of 8 bit
integers. For untyped binary data, ubyte[] is the most correct type.

You want to send it over network or write it into a file? Use ubyte[].
The data will never contain any pointers. You want to play low level
tricks, that involve copying around arbitrary memory contents (like
boxing, see std.boxer)? Use void[].

I think that's a good way to distinguish it.

You shouldn't cast structs or any other types to ubyte[], because the
memory representation of those types is highly platform specific. Structs
can contain padding, integers are endian-dependent... If you want to
convert these to binary data, write a marshaller. You _never_ want to do
direct casts, because they're simply unportable. If you do the cast, you
have to know what you're doing.
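
A minimal sketch of the kind of marshaller grauzone describes, for a hypothetical `Header` struct and wire format. Each field is written explicitly in little-endian order with `std.bitmanip.nativeToLittleEndian`. The output therefore contains neither the struct's padding (Header.sizeof is 8, but only 6 bytes are written) nor host-endian integers.

```d
import std.bitmanip : nativeToLittleEndian;

// Hypothetical wire format: 4-byte id, then 2-byte length, little-endian.
struct Header
{
    uint id;
    ushort payloadLength;
}

// Write each field explicitly instead of casting the struct to ubyte[].
ubyte[] marshal(Header h)
{
    ubyte[] result;
    auto idBytes = nativeToLittleEndian(h.id);             // ubyte[4]
    auto lenBytes = nativeToLittleEndian(h.payloadLength); // ubyte[2]
    result ~= idBytes[];
    result ~= lenBytes[];
    return result;
}
```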

BCS

2009-05-31 20:50:50 UTC

Hello grauzone,

Post by grauzone
You shouldn't cast structs or any other types to ubyte[], because the
memory representation of those types is highly platform specific.
Structs can contain padding, integers are endian-dependent... If you
want to convert these to binary data, write a marshaller. You _never_
want to do direct casts, because they're simply unportable. If you do
the cast, you have to know what you're doing.

Never say never. Some cases like tmp files or whatnot where the same exe
will save and load the file never* have any need for potability.

*"never" uses intentionally :b.

Vladimir Panteleev

2009-05-31 21:14:26 UTC

Post by grauzone

Post by Vladimir Panteleev
3) It's very rare in practice that the only pointer to your object
(which you still plan to access later) to be stored in a
void[]-allocated array! Remember, the properties of memory regions are
determined when the memory is allocated, so casting an array of
structures to a void[] will not lose you that reference. You'd need to
move your pointer to a void[]-array (which you need to allocate
explicitly or, for example, concatenating your reference to the
void[]), then drop the reference to your original structure, for this
to happen.

std.boxer is actually a valid counter-example for my post.
The specific fix is simple: replace the void[] with void*[].
The generic "fix" is just to add a line to http://www.digitalmars.com/d/garbage.html adding that hiding your only reference in a void[] results in undefined behavior. I don't think this should be an inconvenience to any projects?

Post by grauzone
You shouldn't cast structs or any other types to ubyte[], because the
memory representation of those types is highly platform specific. Structs
can contain padding, integers are endian-dependent... If you want to
convert these to binary data, write a marshaller. You _never_ want to do
direct casts, because they're simply unportable. If you do the cast, you
have to know what you're doing.

Thanks for the advice, but I actually know what I'm doing. Unlike C, D's structure alignment rules are actually part of the specification. If I wanted my programs to be safe/cross-platform/etc. regardless of execution speed, I'd use a scripting or VM-ed language.

Christopher Wright

2009-06-01 02:28:39 UTC

Post by Vladimir Panteleev
std.boxer is actually a valid counter-example for my post.
The specific fix is simple: replace the void[] with void*[].
The generic "fix" is just to add a line to http://www.digitalmars.com/d/garbage.html adding that hiding your only reference in a void[] results in undefined behavior. I don't think this should be an inconvenience to any projects?

What do you use for "may contain unaligned pointers"?

Vladimir Panteleev

2009-06-01 06:26:22 UTC

Post by Vladimir Panteleev
std.boxer is actually a valid counter-example for my post.
The specific fix is simple: replace the void[] with void*[].
The generic "fix" is just to add a line to
http://www.digitalmars.com/d/garbage.html adding that hiding your only
reference in a void[] results in undefined behavior. I don't think this
should be an inconvenience to any projects?

What do you use for "may contain unaligned pointers"?

Sorry, what do you mean? I don't understand why such a type is needed? Implementing support for scanning memory ranges for unaligned pointers will slow down the GC even more.

Christopher Wright

2009-06-01 11:10:57 UTC

Post by Vladimir Panteleev
std.boxer is actually a valid counter-example for my post.
The specific fix is simple: replace the void[] with void*[].
The generic "fix" is just to add a line to
http://www.digitalmars.com/d/garbage.html adding that hiding your only
reference in a void[] results in undefined behavior. I don't think this
should be an inconvenience to any projects?

What do you use for "may contain unaligned pointers"?

Sorry, what do you mean? I don't understand why such a type is needed? Implementing support for scanning memory ranges for unaligned pointers will slow down the GC even more.

Because you can have a struct with align(1) that contains pointers. Then
these pointers can be unaligned. Then an array of those structs cast to
a void*[] would contain pointers, but as an optimization, the GC would
consider the pointers in this array aligned because you tell it they are.
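
Christopher's scenario in a few lines. In the hypothetical `Packed` struct, `align(1)` places the pointer field at offset 1, which is not a multiple of the pointer size. A scanner that only examines pointer-aligned words would never see it. If I recall correctly, the D garbage collection documentation warns against misaligning pointers into the GC heap for this reason.

```d
// Hypothetical struct: with align(1) the pointer directly follows a one-byte
// tag, so it does not sit at a pointer-aligned offset.
struct Packed
{
align(1):
    ubyte tag;
    int* target;
}

// True when an offset is a multiple of the pointer size, which is what a
// scan over aligned words assumes.
bool isPointerAligned(size_t offset)
{
    return offset % (void*).sizeof == 0;
}
```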

Vladimir Panteleev

2009-06-01 11:14:07 UTC

On Mon, 01 Jun 2009 05:28:39 +0300, Christopher Wright

Post by Vladimir Panteleev
std.boxer is actually a valid counter-example for my post.
The specific fix is simple: replace the void[] with void*[].
The generic "fix" is just to add a line to
http://www.digitalmars.com/d/garbage.html adding that hiding your
only reference in a void[] results in undefined behavior. I don't
think this should be an inconvenience to any projects?

What do you use for "may contain unaligned pointers"?

Sorry, what do you mean? I don't understand why such a type is needed?
Implementing support for scanning memory ranges for unaligned pointers
will slow down the GC even more.

The GC will not "see" unaligned pointers, regardless of whether they're in a struct or void[] array. The GC doesn't know the type of the data it's scanning - it just knows if it might contain pointers or it definitely doesn't contain pointers.

Christopher Wright

2009-06-01 22:01:00 UTC

On Mon, 01 Jun 2009 05:28:39 +0300, Christopher Wright

Post by Vladimir Panteleev
std.boxer is actually a valid counter-example for my post.
The specific fix is simple: replace the void[] with void*[].
The generic "fix" is just to add a line to
http://www.digitalmars.com/d/garbage.html adding that hiding your
only reference in a void[] results in undefined behavior. I don't
think this should be an inconvenience to any projects?

What do you use for "may contain unaligned pointers"?

Sorry, what do you mean? I don't understand why such a type is needed?
Implementing support for scanning memory ranges for unaligned pointers
will slow down the GC even more.

Okay, so currently the GC doesn't do anything interesting with its type
information. You're suggesting that that be enforced and codified.

Vladimir Panteleev

2009-06-02 11:29:54 UTC

On Mon, 01 Jun 2009 14:10:57 +0300, Christopher Wright

On Mon, 01 Jun 2009 05:28:39 +0300, Christopher Wright

Post by Vladimir Panteleev
std.boxer is actually a valid counter-example for my post.
The specific fix is simple: replace the void[] with void*[].
The generic "fix" is just to add a line to
http://www.digitalmars.com/d/garbage.html adding that hiding your
only reference in a void[] results in undefined behavior. I don't
think this should be an inconvenience to any projects?

What do you use for "may contain unaligned pointers"?

Sorry, what do you mean? I don't understand why such a type is
needed? Implementing support for scanning memory ranges for
unaligned pointers will slow down the GC even more.

Because you can have a struct with align(1) that contains pointers.
Then these pointers can be unaligned. Then an array of those structs
cast to a void*[] would contain pointers, but as an optimization, the
GC would consider the pointers in this array aligned because you tell
it they are.

The GC will not "see" unaligned pointers, regardless of whether they're in a
struct or void[] array. The GC doesn't know the type of the data it's
scanning - it just knows if it might contain pointers or it definitely
doesn't contain pointers.

Okay, so currently the GC doesn't do anything interesting with its type
information. You're suggesting that that be enforced and codified.

I wasn't suggesting any GC modifications, I was just suggesting that void[]'s TypeInfo "has pointers" flag be set to false.

Christopher Wright

2009-06-02 23:11:53 UTC

Post by Vladimir Panteleev
I wasn't suggesting any GC modifications, I was just suggesting that void[]'s TypeInfo "has pointers" flag be set to false.

The suggestion was that void[] be used as ubyte[] currently is, and then
to use void*[] to indicate an array of unknown type that may have pointers.

This works when all pointers are aligned, or when the garbage collector
does not optimize in cases where a type is known not to contain
unaligned pointers.

Alternatively, you can change the runtime to notify the GC on array
copies so it can keep track of type information when you're avoiding the
type system. But it's so easy to get around this by accident, it's not a
reasonable solution (even if it could be made fast).

Jarrett Billingsley

2009-06-03 00:24:16 UTC

Post by Vladimir Panteleev
I wasn't suggesting any GC modifications, I was just suggesting that
void[]'s TypeInfo "has pointers" flag be set to false.

The suggestion was that void[] be used as ubyte[] currently is, and then to
use void*[] to indicate an array of unknown type that may have pointers.

How do you have a void*[] point to a block of memory that is not a
multiple of (void*).sizeof?

Christopher Wright

2009-06-03 22:14:28 UTC

Post by Jarrett Billingsley

Post by Vladimir Panteleev
I wasn't suggesting any GC modifications, I was just suggesting that
void[]'s TypeInfo "has pointers" flag be set to false.

The suggestion was that void[] be used as ubyte[] currently is, and then to
use void*[] to indicate an array of unknown type that may have pointers.

How do you have a void*[] point to a block of memory that is not a
multiple of (void*).sizeof?

Another good point. Or how do you index it by byte?
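
Both objections come down to granularity. A hypothetical sketch (the helper names are made up): a void*[] can only describe whole pointer-sized slots, so a 3-byte payload has no complete slot at all, and indexing moves one pointer at a time rather than one byte.

```d
import std.stdio : writeln;

// How many whole void* elements fit in `byteCount` bytes.
size_t wholeSlots(size_t byteCount)
{
    return byteCount / (void*).sizeof;
}

// Trailing bytes that a void*[] view cannot represent.
size_t tailBytes(size_t byteCount)
{
    return byteCount % (void*).sizeof;
}

void main()
{
    ubyte[] file = [0x41, 0x42, 0x43];   // a 3-byte "file"
    writeln("whole void* slots: ", wholeSlots(file.length),
            ", bytes left over: ", tailBytes(file.length));

    // Indexing a void*[] moves one pointer at a time, not one byte.
    void*[] words = new void*[2];
    writeln("stride in bytes: ",
            cast(size_t)(cast(ubyte*) &words[1] - cast(ubyte*) &words[0]));
}
```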

bearophile

2009-06-03 23:19:29 UTC

Post by Christopher Wright
Another good point. Or how do you index it by byte?

How can you read & write files of 3 bytes if voids are 4-byte chunks? :o) I don't understand. I want to read and write files byte-by-byte.

Bye,
bearophile

Christopher Wright

2009-06-04 02:10:17 UTC

Post by Christopher Wright
Another good point. Or how do you index it by byte?

How can you read & write files of 3 bytes if voids are 4-byte chunks? :o) I don't understand. I want to read and write files byte-by-byte.
Bye,
bearophile

Vladimir was suggesting that void[] be the same as ubyte[] and that you
use void*[] if you might include a pointer. So that use case would be safe.

Daniel Keep

2009-06-04 15:59:02 UTC

Post by Christopher Wright
Another good point. Or how do you index it by byte?

How can you read & write files of 3 bytes if voids are 4-byte
chunks? :o) I don't understand. I want to read and write files
byte-by-byte.
Bye,
bearophile

Vladimir was suggesting that void[] be the same as ubyte[] and that you
use void*[] if you might include a pointer. So that use case would be safe.

How would you generically store the bits of this, then?

struct Gotcha { void* ptr; ubyte boo; }
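
Daniel's example in code, as a sketch with hypothetical helper names: `Gotcha`'s TypeInfo says it contains pointers, and the block it was allocated in is scanned. Once its bytes are copied into a buffer allocated as pointer-free (what void[] would become under the proposal, modelled here with ubyte[]), the GC no longer scans that copy, so `ptr` would not keep its target alive through it.

```d
import core.memory : GC;
import std.stdio : writeln;

struct Gotcha { void* ptr; ubyte boo; }

// Hypothetical helper: is the GC block containing `p` marked NO_SCAN?
bool blockIsNoScan(const(void)* p)
{
    return (GC.getAttr(GC.addrOf(p)) & GC.BlkAttr.NO_SCAN) != 0;
}

// Copy the raw bytes of `raw` into a fresh ubyte[] block, standing in for
// "void[] without pointers". The GC will not scan the copy.
ubyte[] copyAsPlainBytes(void[] raw)
{
    auto copy = new ubyte[raw.length];
    copy[] = cast(ubyte[]) raw[];
    return copy;
}

void main()
{
    auto gs = [Gotcha(new int(42), 7)];
    void[] raw = gs;   // implicit conversion: same block, nothing copied

    writeln("Gotcha may have pointers: ", (typeid(Gotcha).flags & 1) != 0);
    writeln("original block NO_SCAN: ", blockIsNoScan(raw.ptr));
    writeln("plain-byte copy NO_SCAN: ", blockIsNoScan(copyAsPlainBytes(raw).ptr));
}
```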

Vladimir Panteleev

2009-06-04 18:16:42 UTC

On Thu, 04 Jun 2009 05:10:17 +0300, Christopher Wright

Post by Christopher Wright
Another good point. Or how do you index it by byte?

How can you read & write files of 3 bytes if voids are 4-byte
chunks? :o) I don't understand. I want to read and write files
byte-by-byte.
Bye,
bearophile

Vladimir was suggesting that void[] be the same as ubyte[] and that you
use void*[] if you might include a pointer. So that use case would be safe.

Actually, I think Andrei's idea is better (to allow implicit casting
arrays of non-reference types to const(ubyte)[]). It introduces an
abstract no-pointers type, but still allows implicit casting to "might
have pointers".
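
Andrei's idea can be approximated in library code. This is a sketch, not an actual language feature: the hypothetical `asBytes` accepts any array whose element type has no indirections (checked with Phobos' `std.traits.hasIndirections`) and views it as `const(ubyte)[]`; arrays of pointers are rejected at compile time.

```d
import std.stdio : writeln;
import std.traits : hasIndirections;

// Hypothetical helper approximating the proposed implicit conversion:
// any array whose elements contain no pointers can be viewed as bytes.
const(ubyte)[] asBytes(T)(const(T)[] data)
    if (!hasIndirections!T)
{
    return cast(const(ubyte)[]) data;
}

void main()
{
    int[] numbers = [1, 2, 3];
    writeln(asBytes(numbers).length);   // 3 * int.sizeof
    writeln(asBytes("Hello").length);   // 5
    // asBytes([new int]) does not compile: int* is an indirection.
}
```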

--
Best regards,
Vladimir mailto:thecybershadow at gmail.com

Denis Koroskin

2009-06-04 18:31:07 UTC

On Thu, 04 Jun 2009 22:16:42 +0400, Vladimir Panteleev

Post by Vladimir Panteleev
On Thu, 04 Jun 2009 05:10:17 +0300, Christopher Wright

Post by Christopher Wright
Another good point. Or how do you index it by byte?

How can you read & write files of 3 bytes if voids are 4-byte
chunks? :o) I don't understand. I want to read and write files
byte-by-byte.
Bye,
bearophile

Vladimir was suggesting that void[] be the same as ubyte[] and that you
use void*[] if you might include a pointer. So that use case would be safe.

There is a pitfall: should "arrays of non-reference types" be
implicitly castable to const(byte)[] or to const(ubyte)[]?

Should const(byte)[] also be implicitly castable to const(ubyte)[] (or
vice versa)?

Vladimir Panteleev

2009-06-05 07:09:49 UTC

On Thu, 04 Jun 2009 21:31:07 +0300, Denis Koroskin <2korden at gmail.com>

Post by Denis Koroskin
On Thu, 04 Jun 2009 22:16:42 +0400, Vladimir Panteleev

Post by Vladimir Panteleev
On Thu, 04 Jun 2009 05:10:17 +0300, Christopher Wright

Post by Christopher Wright
Another good point. Or how do you index it by byte?

How can you read & write files of 3 bytes if voids are 4-byte
chunks? :o) I don't understand. I want to read and write files
byte-by-byte.
Bye,
bearophile

Vladimir was suggesting that void[] be the same as ubyte[] and that
you use void*[] if you might include a pointer. So that use case would
be safe.

There is a pitfall: should "arrays of non-reference types" be
implicitly castable to const(byte)[] or to const(ubyte)[]?
Should const(byte)[] also be implicitly castable to const(ubyte)[] (or
vice versa)?

I don't see why you'd want to work with arrays of signed bytes. It doesn't
make sense to allow implicit casting between the two; the programmer
should just pick one and stick with it. I think unsigned makes more sense.

--
Best regards,
Vladimir mailto:thecybershadow at gmail.com

BCS

2009-06-05 07:15:11 UTC

Hello Vladimir,

Post by Vladimir Panteleev
I don't see why you'd want to work with arrays of signed bytes.

I can think of a number of cases where I would expect numbers to be in a
range like [-20,+20], for instance, deltas of small integral values or golf
scores relative to par.

Vladimir Panteleev

2009-06-05 08:58:58 UTC

Post by BCS
Hello Vladimir,

Post by Vladimir Panteleev
I don't see why you'd want to work with arrays of signed bytes.

I can think of a number of cases where I would expect numbers to be in a
range like [-20,+20], for instance, deltas of small integral values or
golf scores relative to par.

Yes, but how is this related to abstracting data types to a generic type
that can be used for stuff like buffering or networking?

--
Best regards,
Vladimir mailto:thecybershadow at gmail.com

BCS

2009-06-05 17:16:08 UTC

Hello Vladimir,

Post by BCS
Hello Vladimir,

Post by Vladimir Panteleev
I don't see why you'd want to work with arrays of signed bytes.

I can think of a number of cases where I would expect numbers to be
in a range like [-20,+20], for instance, deltas of small integral
values or golf scores relative to par.

Yes, but how is this related to abstracting data types to a generic
type that can be used for stuff like buffering or networking?

It's not, and that's the point. The point is there are uses for 8-bit signed
integer values other than as raw data. I might have read your comment out
of context, but it seemed you were saying there is no use for the signed byte
type.

Vladimir Panteleev

2009-06-05 19:00:39 UTC

Post by BCS
Hello Vladimir,

Post by BCS
Hello Vladimir,

Post by Vladimir Panteleev
I don't see why you'd want to work with arrays of signed bytes.

I can think of a number of cases where I would expect numbers to be
in a range like [-20,+20], for instance, deltas of small integral
values or golf scores relative to par.

Yes, but how is this related to abstracting data types to a generic
type that can be used for stuff like buffering or networking?

It's not, and that's the point. The point is there are uses for 8-bit
signed integer values other than as raw data. I might have read your
comment out of context, but it seemed you were saying there is no use for
the signed byte type.

Oh yes; I was definitely not suggesting removing byte[] from the language.
<insidejoke namespace="#d">I'm sure he wouldn't be pleased one bit if we
did that! :P</insidejoke>

--
Best regards,
Vladimir mailto:thecybershadow at gmail.com

Derek Parnell

2009-06-05 23:36:29 UTC

Post by BCS
Hello Vladimir,

Post by Vladimir Panteleev
I don't see why you'd want to work with arrays of signed bytes.

I can think of a number of cases where I would expect numbers to be in a
range like [-20,+20], for instance, deltas of small integral values or golf
scores relative to par.

Or sound wave sample points [-127, 127]

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

BCS

2009-05-31 20:53:02 UTC

Hello Vladimir,

Post by Vladimir Panteleev
I just went through a ~15000-line project and replaced most
occurrences of void[]. Now the project is an ugly mess of void[],
ubyte[] and casts, but at least it doesn't leak memory like crazy any
more.
I don't know why it was decided to mark the contents of void[] as
2) Despite void[] being "typeless", you can still operate on it -
namely, slice and concatenate them. Pass a void[] to a network send()
function - how much did you send? Half the buffer? No problem, slice
it away and store the rest - and no casts.
3) It's very rare in practice that the only pointer to your object
(which you still plan to access later) to be stored in a
void[]-allocated array! Remember, the properties of memory regions are
determined when the memory is allocated, so casting an array of
structures to a void[] will not lose you that reference. You'd need to
move your pointer to a void[]-array (which you need to allocate
explicitly or, for example, concatenating your reference to the
void[]), then drop the reference to your original structure, for this
to happen.
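
Point 2 above (slicing off whatever send() accepted and keeping the rest, with no casts) looks like this in D2. `partialSend` is a hypothetical stand-in for a socket call that takes at most 4 bytes per call; it is not a real API.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a non-blocking socket send: accepts at most
// `limit` bytes per call and reports how many it took.
size_t partialSend(const(void)[] data, size_t limit = 4)
{
    return data.length < limit ? data.length : limit;
}

// Keep sending until the buffer is drained; returns the number of calls.
size_t drain(ref void[] pending)
{
    size_t calls;
    while (pending.length)
    {
        immutable sent = partialSend(pending);
        pending = pending[sent .. $];   // slice away what was sent, no casts
        ++calls;
    }
    return calls;
}

void main()
{
    void[] pending = "Hello, World!".dup;   // char[] converts to void[] implicitly
    writeln("calls: ", drain(pending));     // 13 bytes at 4 per call: prints "calls: 4"
}
```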

I think the idea is that void[] is the most general data type; it can be
anything, including pointers.

Also, for a real-world use case where void[]=mightHavePointers is valid, consider
a system that reads blocks of data structures from a file and then replaces the
file references in them with memory references, in place. You can't allocate
buffers of the correct type because you may not even know what that is until
you have already loaded the data.
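
A minimal sketch of the loader described above, assuming a made-up layout in which each pointer-sized slot first holds an index (standing in for a file reference) and is then overwritten in place with a pointer. Because the untyped buffer ends up holding real pointers, the GC has to scan it, which is the argument for keeping void[] "might have pointers".

```d
import std.stdio : writeln;

// Hypothetical fixup pass: every pointer-sized slot in `block` holds an
// index into `loaded` (standing in for a file reference) and is replaced,
// in place, by the corresponding pointer.
void fixup(void[] block, int*[] loaded)
{
    auto refs = cast(size_t[]) block;   // view as file references
    auto ptrs = cast(int*[]) block;     // same memory, viewed as pointers
    foreach (i; 0 .. refs.length)
    {
        immutable fileRef = refs[i];    // read before overwriting the slot
        ptrs[i] = loaded[fileRef];
    }
}

void main()
{
    // The final element type is only known after loading, so the buffer
    // starts out untyped.
    void[] block = new void[2 * size_t.sizeof];
    auto refs = cast(size_t[]) block;
    refs[0] = 1;
    refs[1] = 0;

    fixup(block, [new int(10), new int(20)]);

    auto ptrs = cast(int*[]) block;
    writeln(*ptrs[0], " ", *ptrs[1]);   // prints "20 10"
}
```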

Post by Vladimir Panteleev
void[] buffer;
void queue(void[] data)
{
buffer ~= data;
}
...
queue([1,2,3][]);
queue("Hello, World!");
No casts! So simple and beautiful. However, should you use this
pattern to work with larger amounts of data with a high entropy, the
"minefield" effect will cause the GC to stop collecting most data.
Sure, you can call std.gc.hasNoPointers, but you need to do it after
every single concatenation... and it makes expressions with more than
one concatenation unsafe.

Yes, when data is being copied into void[] from another type[], it is reasonable
to ignore pointers, but as above, going the other way (IMHO the /common/ case)
it's not so easy.
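
The workaround from the quoted post, re-marking the buffer after every append, might look like this with D2's `core.memory` (`std.gc.hasNoPointers` is the D1 Phobos name). A sketch only: `markNoScan` is a hypothetical helper, and `GC.addrOf` maps the slice start to its block base before `GC.setAttr` is called.

```d
import core.memory : GC;
import std.stdio : writeln;

void[] buffer;

// Hypothetical helper: mark the GC block holding `arr` as pointer-free.
// addrOf maps the slice start to its block base, which setAttr expects.
void markNoScan(void[] arr)
{
    if (arr.ptr !is null)
        GC.setAttr(GC.addrOf(arr.ptr), GC.BlkAttr.NO_SCAN);
}

void queue(void[] data)
{
    buffer ~= data;      // may move the data into a new block...
    markNoScan(buffer);  // ...so mark again after every append
}

void main()
{
    int[] numbers = [1, 2, 3];
    queue(numbers);
    queue("Hello, World!".dup);
    writeln("buffered bytes: ", buffer.length);
}
```

The temporary produced by a call such as `queue(a ~ b)` is never marked, which is the "more than one concatenation" problem mentioned in the quoted post.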

Post by Vladimir Panteleev
I heard that Tango copies over the properties of arrays when they are
reallocated, which helps but solves the problem only partially.
So, I ask you: is there actually code out there that depends on the
way void[] works right now? I brought up this argument a year or so
ago on IRC, and there were people who defended ferociously the current
design using idealisms ("it should work like what it sounds like, it
should contain any type" or something like that), but I've yet to see
a practical argument.

I think that void[] should be left as is but I'm almost ready to throw in
with the idea that we **need** another type that has the no-cast parts of
void[] but assume no pointers as well.
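
One way to get such a type without language changes is a small wrapper. A hypothetical sketch (the name `RawBuffer` is made up): `~=` accepts any array whose element type has no indirections, with no cast at the call site, and stores the bytes in a `ubyte[]`, which the GC allocates as not containing pointers.

```d
import std.stdio : writeln;
import std.traits : hasIndirections;

// Hypothetical byte-buffer type: ~= accepts any pointer-free array without
// a cast at the call site and stores the bytes in a ubyte[], which the GC
// allocates as not containing pointers.
struct RawBuffer
{
    ubyte[] data;

    void opOpAssign(string op : "~", T)(const(T)[] chunk)
        if (!hasIndirections!T)
    {
        data ~= cast(const(ubyte)[]) chunk;
    }
}

void main()
{
    RawBuffer msg;
    msg ~= "Content-Type: application/octet-stream\r\n\r\n";
    ubyte[] payload = [0xDE, 0xAD, 0xBE, 0xEF];
    msg ~= payload;
    writeln("message size: ", msg.data.length);
}
```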

Denis Koroskin

2009-05-31 20:55:58 UTC

Post by BCS
Hello Vladimir,

I think the idea is that void[] is the most general data type; it can be
anything, including pointers.
Also, for a real-world use case where void[]=mightHavePointers is valid,
consider a system that reads blocks of data structures from a file and
then replaces the file references in them with memory references, in place.
You can't allocate buffers of the correct type because you may not even
know what that is until you have already loaded the data.

In this case you should *explicitly* mark that void[] array as "mightHavePointers".
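
Explicit marking could look like this, as a sketch with a hypothetical `allocRaw` helper: druntime's `GC.malloc` takes block attributes, so the caller chooses between a scanned block and a `NO_SCAN` one at allocation time.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical allocator for a world where void[] defaults to "no pointers":
// the caller must say explicitly when a block might hold pointers.
void[] allocRaw(size_t n, bool mightHavePointers)
{
    uint attr = 0;                        // 0: the GC scans the block
    if (!mightHavePointers)
        attr = GC.BlkAttr.NO_SCAN;        // the GC skips the block
    return GC.malloc(n, attr)[0 .. n];
}

void main()
{
    void[] records = allocRaw(4096, true);   // e.g. structures with pointers
    void[] payload = allocRaw(4096, false);  // e.g. high-entropy file data
    writeln(records.length, " ", payload.length);
}
```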

MLT

2009-06-04 01:14:49 UTC

Post by Vladimir Panteleev
I don't know why it was decided to mark the contents of void[] as

[...]

Post by Vladimir Panteleev
3) It's very rare in practice that the only pointer to your
object (which you still plan to access later) to be stored in a
void[]-allocated array!

As quite a newbie, I can sum up what I understood as follows:

1. The idea of void[] is that you can put anything in it without casting.
2. Because of this, you might put pointers in a void[].
3. Since you have "legitimately" stored pointers, and we don't want to have the GC throw away something that we still have valid pointers for, we have to have the GC scan over void[] arrays for possible hits.

4. This pretty much means that any "big"(*) D program cannot afford to put uniformly distributed data in a void[] array, because the GC will stop working correctly - it will not dispose of stuff that you don't need any more.
(*) where "big" means a program that creates and destroys a lot of objects.

So, currently if you want to use void[] to store non-pointers, you need to use the gc function to mark the memory as not containing pointers.

A comment and a question. I agree that suddenly losing data because you stored a pointer in a void[] is worse than GC not working well. However, since GC in D is so automatic, almost any use of void[] to store non-pointer data will cause massive memory leaks and eventual program failure.

I can see 4 solutions...

First, don't allow non-pointers to be stored in void[]: non-pointers are stored in ubyte[], pointers in void[]. This kind of loses the main point of using void[].

Second, void[] is not scanned by the GC, but you can mark it to be. This can cause bugs if you store a pointer in a void[] and later retrieve it, but don't mark it correctly.

Third, void[] is scanned by GC, but you can mark it not to be. This can cause memory leaks if you store complex data in void[] in a big program, and don't handle GC marking correctly.

Fourth - somewhat more complex. Since the compiler knows exactly when a pointer is stored in a void[] and when not, it would be possible to have the compiler handle it all by itself, as long as the property of having to be scanned by the GC is treated as "dirty": once a variable has it, any other variable that touches that variable gets the property too.

Of these four solutions, the last 3 can still cause bugs if one stores both pointers and data in the same void[] array, no matter how the memory is marked, unless one does that marking on a very fine scale (is that possible?)

My conclusion from all this is either "don't use void[]", or "only use void[] to store pointers" if you don't want bugs in a valid program.
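
On the "very fine scale" question: in druntime, `NO_SCAN` is an attribute of a whole GC block, not of a byte range within it. A sketch (the helper name is hypothetical) showing that marking one half of an allocation affects the other half too:

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical helper: is the GC block containing `p` marked NO_SCAN?
bool blockIsNoScan(const(void)* p)
{
    return (GC.getAttr(GC.addrOf(p)) & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    // One 64-byte block, allocated as scanned (attribute 0); the first half
    // is meant for pointers, the second half for raw data.
    void[] block = GC.malloc(64)[0 .. 64];
    void[] pointerHalf = block[0 .. 32];
    void[] dataHalf = block[32 .. $];

    // Try to exclude only the data half from scanning...
    GC.setAttr(GC.addrOf(dataHalf.ptr), GC.BlkAttr.NO_SCAN);

    // ...but the attribute belongs to the block, so the pointer half is
    // no longer scanned either.
    writeln("pointer half NO_SCAN: ", blockIsNoScan(pointerHalf.ptr));
}
```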

Christopher Wright

2009-06-04 02:32:58 UTC

Post by MLT