Discussion:
GC.sizeOf(array.ptr)
Dicebot via Digitalmars-d
2014-09-30 13:42:12 UTC
Permalink
There is one issue I have encountered during CDGC porting I may
need advice with.
Consider this simple snippet:

```
import core.memory;

void main()
{
    ubyte[] result;
    result.length = 4096;
    assert(GC.sizeOf(result.ptr) > 0);
}
```

Assertion passes with D1/Tango runtime but fails with current D2
runtime. This happens because `result.ptr` is not actually a
pointer returned by gc_qalloc from array reallocation, but
interior pointer 16 bytes from the start of that block. Druntime
stores some metadata (length/capacity I presume) in the very
beginning.

As a result, GC.sizeOf(array.ptr) returns 0 (since it is given an
interior pointer).

Interesting side effect is that this code:

```
import core.memory;

void main()
{
    ubyte[] result;
    result.length = 4096;
    GC.free(result.ptr);
}
```

...does not actually free anything, for the very same reason
(result.ptr is an interior pointer); it just silently succeeds while
doing nothing.
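For reference, the base pointer can be recovered with GC.addrOf before querying or freeing. A minimal sketch of that workaround, using only the documented core.memory API (behavior as described in this thread; details may vary across runtime versions):

```d
import core.memory;

void main()
{
    ubyte[] result;
    result.length = 4096;

    // result.ptr is an interior pointer, but GC.addrOf resolves any
    // pointer into a GC block back to that block's base address.
    void* base = GC.addrOf(result.ptr);
    assert(base !is null);
    assert(GC.sizeOf(base) > 0); // the base pointer does report a size

    // Freeing the base address actually releases the block.
    GC.free(base);
    result = null; // the slice now dangles; drop the reference
}
```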

Is such behaviour intended?
Steven Schveighoffer via Digitalmars-d
2014-09-30 14:01:17 UTC
Permalink
Post by Dicebot via Digitalmars-d
There is one issue I have encountered during CDGC porting I may need
advice with.
```
import core.memory;

void main()
{
    ubyte[] result;
    result.length = 4096;
    assert(GC.sizeOf(result.ptr) > 0);
}
```
Assertion passes with D1/Tango runtime but fails with current D2
runtime. This happens because `result.ptr` is not actually a pointer
returned by gc_qalloc from array reallocation, but interior pointer 16
bytes from the start of that block. Druntime stores some metadata
(length/capacity I presume) in the very beginning.
This is accurate: it stores the "used" size of the array. But it's only
the case for arrays, not general GC.malloc blocks.

An alternative is to use result.capacity, which essentially looks up
the same thing (and should be more accurate). But it doesn't cover the
same inputs.

A trivial note: you only need to allocate 2047 bytes to get this behavior.
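The capacity route can be sketched like this (semantics as described in this thread; the exact numbers depend on the runtime version):

```d
void main()
{
    ubyte[] result;
    result.length = 4096;

    // capacity consults the runtime's "used size" bookkeeping for the
    // block and reports how far the slice can grow without reallocating.
    assert(result.capacity >= result.length);

    ubyte[] slice = result[0 .. 100];
    // A slice that does not end at the array's used size reports
    // capacity 0: appending to it must reallocate to avoid stomping data.
    assert(slice.capacity == 0);
}
```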
As a result GC.sizeOf(array.ptr) results in 0 (being an interior pointer).
```
void main()
{
    ubyte[] result;
    result.length = 4096;
    GC.free(result.ptr);
}
```
...does not actually free anything for the very same reason (result.ptr
is an interior pointer); it just silently succeeds doing nothing.
This is a problem. We should provide a definitive way to dealloc an
array, if it's not already present (assuming delete goes away).
Is such behaviour intended?
Depends on what part you are talking about ;)

But kidding aside, when I added the array appending update, I didn't
intend to affect these functions, nor did I consider the cases above. So
it was not intended to break these things, but I think we can repair
them, or at least provide alternate means to create the same result.

Interestingly, I think any lookup of block information for an interior
pointer costs the same as looking up block info for a base pointer. I
think adding a flag that allows performing operations on interior
pointers might make this more palatable.

But we absolutely need a way to free an array that does not require the
knowledge of the inner workings of the runtime.
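One possible shape for such an interior-pointer flag (the helper name and parameter here are invented for illustration, not actual druntime API) could be sketched as a wrapper that resolves interior pointers to the block base first:

```d
import core.memory;

/// Hypothetical helper: like GC.sizeOf, but optionally accepts
/// interior pointers by resolving them to the block base first.
size_t sizeOfBlock(void* p, bool allowInterior = false)
{
    if (allowInterior)
        p = GC.addrOf(p); // base of the block, or null if not GC memory
    return p is null ? 0 : GC.sizeOf(p);
}

void main()
{
    ubyte[] result;
    result.length = 4096;
    // Works even though result.ptr is an interior pointer.
    assert(sizeOfBlock(result.ptr, true) > 0);
}
```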

-Steve
Dicebot via Digitalmars-d
2014-09-30 14:24:27 UTC
Permalink
On Tuesday, 30 September 2014 at 14:01:17 UTC, Steven Schveighoffer wrote:
Post by Steven Schveighoffer via Digitalmars-d
Post by Dicebot via Digitalmars-d
Assertion passes with D1/Tango runtime but fails with current D2
runtime. This happens because `result.ptr` is not actually a pointer
returned by gc_qalloc from array reallocation, but interior pointer 16
bytes from the start of that block. Druntime stores some metadata
(length/capacity I presume) in the very beginning.
This is accurate, it stores the "used" size of the array. But
it's only the case for arrays, not general GC.malloc blocks.
Alternative is to use result.capacity, which essentially looks
up the same thing (and should be more accurate). But it doesn't
cover the same inputs.
Why is it stored at the beginning and not at the end of the block
(like capacity)? I'd like to explore options for removing the interior
pointer completely before proceeding with adding more special cases to
GC functions.
Steven Schveighoffer via Digitalmars-d
2014-09-30 15:46:53 UTC
Permalink
Post by Steven Schveighoffer via Digitalmars-d
Post by Dicebot via Digitalmars-d
Assertion passes with D1/Tango runtime but fails with current D2
runtime. This happens because `result.ptr` is not actually a pointer
returned by gc_qalloc from array reallocation, but interior pointer 16
bytes from the start of that block. Druntime stores some metadata
(length/capacity I presume) in the very beginning.
This is accurate, it stores the "used" size of the array. But it's
only the case for arrays, not general GC.malloc blocks.
Alternative is to use result.capacity, which essentially looks up the
same thing (and should be more accurate). But it doesn't cover the
same inputs.
Why is it stored in the beginning and not in the end of the block (like
capacity)? I'd like to explore options of removing interior pointer
completely before proceeding with adding more special cases to GC
functions.
First, it is the capacity. It's just that the capacity lives at the
beginning of larger blocks.

The reason is the ability to extend pages.

With smaller blocks (2048 bytes or less), the page is divided into equal
portions, and those can NEVER be extended. Any attempt to extend results
in a realloc into another block. Putting the capacity at the end makes
sense for two reasons: (1) one byte is already reserved to prevent
cross-block pointers, and (2) it doesn't cause alignment issues. We
can't very well offset a 16-byte block by 16 bytes. But importantly, the
capacity field does not move.

However, for page and above size (4096+ bytes), the original (D1 and
early D2) runtime would attempt to extend into the next page, without
moving the data. Thus we save the copy of data into a new block, and
just set some bits and we're done.

But this poses a problem for when the capacity field is stored at the
end -- especially since we are caching the block info. The block info
can change with a call to GC.extend (whereas a fixed-size block, the
block info CANNOT change). Depending on what "version" of the block info
you have, the "end" can be different, and you may end up corrupting
data. This is especially important for shared or immutable array blocks,
where multiple threads could be appending at the same time.

So I made the call to put it at the beginning of the block, which
obviously doesn't change, and offset everything by 16 bytes to maintain
alignment.
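That offset can be observed directly. A sketch (the 16-byte figure is specific to the runtime described in this thread and may differ in other versions, so it is only noted in a comment rather than asserted):

```d
import core.memory;

void main()
{
    ubyte[] a;
    a.length = 4096; // page size and up: capacity field at the block start

    // The slice's .ptr is offset past the capacity field; GC.addrOf
    // recovers the true base of the allocation.
    auto base = cast(ubyte*) GC.addrOf(a.ptr);
    assert(base !is null);
    auto offset = a.ptr - base;
    // offset is 16 on the runtime described in this thread
}
```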

It may very well be that we can put it at the end of the block instead,
and you can probably do so without much effort in the runtime
(everything uses CTFE functions to calculate padding and location of the
capacity). It has been such a long time since I did that, I'm not very
sure of all the reasons not to do it. A look through the mailing list
archives might be useful.

-Steve
Dicebot via Digitalmars-d
2014-09-30 16:01:54 UTC
Permalink
On Tuesday, 30 September 2014 at 15:46:54 UTC, Steven Schveighoffer wrote:
Post by Steven Schveighoffer via Digitalmars-d
Post by Dicebot via Digitalmars-d
On Tuesday, 30 September 2014 at 14:01:17 UTC, Steven Schveighoffer wrote:
Post by Steven Schveighoffer via Digitalmars-d
Post by Dicebot via Digitalmars-d
Assertion passes with D1/Tango runtime but fails with current D2
runtime. This happens because `result.ptr` is not actually a pointer
returned by gc_qalloc from array reallocation, but interior pointer 16
bytes from the start of that block. Druntime stores some metadata
(length/capacity I presume) in the very beginning.
This is accurate, it stores the "used" size of the array. But it's
only the case for arrays, not general GC.malloc blocks.
Alternative is to use result.capacity, which essentially looks up the
same thing (and should be more accurate). But it doesn't cover the
same inputs.
Why is it stored in the beginning and not in the end of the block
(like capacity)? I'd like to explore options of removing interior
pointer completely before proceeding with adding more special cases to
GC functions.
First, it is the capacity. It's just that the capacity lives at
the beginning of larger blocks.
The reason is due to the ability to extend pages.
With smaller blocks (2048 bytes or less), the page is divided
into equal portions, and those can NEVER be extended. Any
attempt to extend results in a realloc into another block.
Putting the capacity at the end makes sense for 2 reasons: 1. 1
byte is already reserved to prevent cross-block pointers, 2. It
doesn't cause alignment issues. We can't very well offset a 16
byte block by 16 bytes. But importantly, the capacity field
does not move.
However, for page and above size (4096+ bytes), the original
(D1 and early D2) runtime would attempt to extend into the next
page, without moving the data. Thus we save the copy of data
into a new block, and just set some bits and we're done.
Ah, that must be what confused me - I looked at the small block offset
calculation originally and blindly assumed the same logic for other
sizes. Sorry, my fault!
Post by Steven Schveighoffer via Digitalmars-d
But this poses a problem for when the capacity field is stored
at the end -- especially since we are caching the block info.
The block info can change with a call to GC.extend (whereas a
fixed-size block, the block info CANNOT change). Depending on
what "version" of the block info you have, the "end" can be
different, and you may end up corrupting data. This is
especially important for shared or immutable array blocks,
where multiple threads could be appending at the same time.
So I made the call to put it at the beginning of the block,
which obviously doesn't change, and offset everything by 16
bytes to maintain alignment.
It may very well be that we can put it at the end of the block
instead, and you can probably do so without much effort in the
runtime (everything uses CTFE functions to calculate padding
and location of the capacity). It has been such a long time
since I did that, I'm not very sure of all the reasons not to
do it. A look through the mailing list archives might be useful.
I think it should be possible. That way the actual block size will
simply be considered a bit smaller, and extending will happen before
the reserved space is hit. But of course I have only a very vague
knowledge of druntime acquired while porting CDGC, so I may need to
think about it a bit more and probably chat with Leandro too :)

I have created a Bugzilla issue for now:
https://issues.dlang.org/show_bug.cgi?id=13558
Steven Schveighoffer via Digitalmars-d
2014-09-30 17:19:54 UTC
Permalink
Post by Dicebot via Digitalmars-d
I think it should be possible. That way the actual block size will
simply be considered a bit smaller, and extending will happen before
the reserved space is hit. But of course I have only a very vague
knowledge of druntime acquired while porting CDGC, so I may need to
think about it a bit more and probably chat with Leandro too :)
I think it is possible, and perhaps more correct, to put the array
append size into a separate memory block for larger sizes. The placement
at the end works very well and has good properties for smaller sizes.

This is similar to how the flags work.

-Steve
Sean Kelly via Digitalmars-d
2014-09-30 17:28:02 UTC
Permalink
On Tuesday, 30 September 2014 at 15:46:54 UTC, Steven Schveighoffer wrote:
Post by Steven Schveighoffer via Digitalmars-d
So I made the call to put it at the beginning of the block,
which obviously doesn't change, and offset everything by 16
bytes to maintain alignment.
It may very well be that we can put it at the end of the block
instead, and you can probably do so without much effort in the
runtime (everything uses CTFE functions to calculate padding
and location of the capacity). It has been such a long time
since I did that, I'm not very sure of all the reasons not to
do it. A look through the mailing list archives might be useful.
Yes, a lot of this is an artifact of the relatively simplistic
manner that the current GC tracks memory. If large blocks had a
header, for example, then this could theoretically live there and
not cause any problems. As we move towards supporting precise
scanning, the GC will need to be aware of the types of data it
holds, and so some portion of the array appendability strategy
should probably migrate into the GC. A redefinition of the GC
interface is many years overdue. This just needs to be
considered when it happens.
Dicebot via Digitalmars-d
2014-09-30 17:43:04 UTC
Permalink
Post by Sean Kelly via Digitalmars-d
On Tuesday, 30 September 2014 at 15:46:54 UTC, Steven Schveighoffer wrote:
Post by Steven Schveighoffer via Digitalmars-d
So I made the call to put it at the beginning of the block,
which obviously doesn't change, and offset everything by 16
bytes to maintain alignment.
It may very well be that we can put it at the end of the block
instead, and you can probably do so without much effort in the
runtime (everything uses CTFE functions to calculate padding
and location of the capacity). It has been such a long time
since I did that, I'm not very sure of all the reasons not to
do it. A look through the mailing list archives might be useful.
Yes, a lot of this is an artifact of the relatively simplistic
manner that the current GC tracks memory. If large blocks had a
header, for example, then this could theoretically live there and
not cause any problems. As we move towards supporting precise
scanning, the GC will need to be aware of the types of data it
holds, and so some portion of the array appendability strategy
should probably migrate into the GC. A redefinition of the GC
interface is many years overdue. This just needs to be
considered when it happens.
I decided to add a similar workaround to CDGC for now and fix the
way it is stored in druntime a bit later :)

On a related note: the CDGC D2 port passes the druntime test suite
as of today (with only the shared library tests disabled), so the
initial upstream PR should happen very soon (just need to clean
up all the trace and "omg hax" commits :))
ketmar via Digitalmars-d
2014-10-01 03:06:47 UTC
Permalink
On Tue, 30 Sep 2014 17:43:04 +0000
Post by Dicebot via Digitalmars-d
On a related note : CDGC D2 port passes druntime test suite
starting with today (with only shared library tests disabled), so
initial upstream PR should happen very soon (just need to clean
up all trace and "omg hax" commits :))
wow! no, really, what else can i say? ;-)
Sean Kelly via Digitalmars-d
2014-09-30 17:23:21 UTC
Permalink
Post by Dicebot via Digitalmars-d
Is such behaviour intended?
Yes. As far as the GC is concerned, asking for the size of an
interior pointer is asking for the size of a slice of the data,
and as slices are not extendable it will always return 0. All of
the APPENDABLE stuff takes place outside the GC (except for the
definition of the APPENDABLE BlkAttr, which really should be
defined externally and within the user-reserved range of the
bitfield instead of the GC-reserved range, but I digress...) and
so it has no way of knowing that someone is using the blocks this
way.
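That semantics can be sketched with a block allocated directly through the GC interface, with no array runtime involved (behavior per the documented core.memory contract: sizeOf reports zero for interior pointers):

```d
import core.memory;

void main()
{
    // A block requested straight from the GC: the base pointer reports
    // the true block size...
    void* p = GC.malloc(256);
    assert(GC.sizeOf(p) > 0);

    // ...while any interior pointer - a "slice" of the data, as far as
    // the GC is concerned - reports 0.
    assert(GC.sizeOf(cast(ubyte*) p + 16) == 0);
}
```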
Steven Schveighoffer via Digitalmars-d
2014-09-30 17:51:18 UTC
Permalink
Post by Sean Kelly via Digitalmars-d
(except for the
definition of the APPENDABLE BlkAttr, which really should be
defined externally and within the user-reserved range of the
bitfield instead of the GC-reserved range, but I digress...)
The APPENDABLE bit was added as a solution to avoid having to reserve
that memory for all allocations. Basically, if you try to append to a
block that doesn't have the bit, it simply reallocates conservatively.

So it does have to be part of the GC metadata, because the most
important effect is on blocks that AREN'T allocated via the array
runtime. Otherwise, the append mechanism creeps into all aspects of
memory allocation.

-Steve
Sean Kelly via Digitalmars-d
2014-09-30 18:19:45 UTC
Permalink
On Tuesday, 30 September 2014 at 17:51:18 UTC, Steven Schveighoffer wrote:
Post by Steven Schveighoffer via Digitalmars-d
Post by Sean Kelly via Digitalmars-d
(except for the
definition of the APPENDABLE BlkAttr, which really should be
defined externally and within the user-reserved range of the
bitfield instead of the GC-reserved range, but I digress...)
The APPENDABLE bit was added as a solution to avoid having to
reserve that memory for all allocations. Basically, if you try
to append to a block that doesn't have the bit, it simply
reallocates conservatively.
So it does have to be part of the GC metadata, because the most
important effect is on blocks that AREN'T allocated via the
array runtime. Otherwise, the append mechanism creeps into all
aspects of memory allocation.
Yeah I know. But when I defined the BlkAttr bitfield I'd
reserved one portion of the range for internal GC stuff and
another portion for user-defined stuff. APPENDABLE should
probably have landed in the user-defined portion. I don't see
any of those comments in the current code or I'd point to them.
I guess they were deleted at some point.
Steven Schveighoffer via Digitalmars-d
2014-09-30 19:19:17 UTC
Permalink
Post by Sean Kelly via Digitalmars-d
On Tuesday, 30 September 2014 at 17:51:18 UTC, Steven Schveighoffer wrote:
Post by Steven Schveighoffer via Digitalmars-d
Post by Sean Kelly via Digitalmars-d
(except for the
definition of the APPENDABLE BlkAttr, which really should be
defined externally and within the user-reserved range of the
bitfield instead of the GC-reserved range, but I digress...)
The APPENDABLE bit was added as a solution to avoid having to reserve
that memory for all allocations. Basically, if you try to append to a
block that doesn't have the bit, it simply reallocates conservatively.
So it does have to be part of the GC metadata, because the most
important effect is on blocks that AREN'T allocated via the array
runtime. Otherwise, the append mechanism creeps into all aspects of
memory allocation.
Yeah I know. But when I defined the BlkAttr bitfield I'd
reserved one portion of the range for internal GC stuff and
another portion for user-defined stuff. APPENDABLE should
probably have landed in the user-defined portion. I don't see
any of those comments in the current code or I'd point to them.
I guess they were deleted at some point.
Hm... having looked at the code, I have no idea how the GC would handle
user-defined stuff. It seems to only deal with bits it knows about (i.e.
APPENDABLE is specifically handled in gc/gc.d).

I think I just took the next bit available; if there was a warning about
GC internal bits when I added APPENDABLE, I either missed it or
dismissed it.

-Steve
Sean Kelly via Digitalmars-d
2014-09-30 19:27:05 UTC
Permalink
On Tuesday, 30 September 2014 at 19:19:18 UTC, Steven Schveighoffer wrote:
Post by Steven Schveighoffer via Digitalmars-d
Hm... looked at the code, I have no idea how the GC would
handle user-defined stuff. It seems to only deal with bits it
knows about (i.e. APPENDABLE is specifically handled in gc/gc.d)
It wouldn't. The GC was just going to provide storage for some
number of user-defined bits.
Post by Steven Schveighoffer via Digitalmars-d
I think I just took the next bit available, if there was a
warning about GC internal bits when I added APPENDABLE, I
either missed it or dismissed it.
With BlkAttr defined independently in a bunch of different places
at the time, the comment would have been easy to miss. I don't
know that it really matters anyway, but it's something I noticed
today.