Discussion:
Possible quick win in GC?
Abdulhaq via Digitalmars-d
2014-09-28 16:29:44 UTC
Permalink
Perhaps I've had too much caffeine today, but I've had an idea
which might give a fairly quick win on the GC speed situation.
It's a simple idea at heart, so it's quite possible (even likely)
that this is a well-known idea that has already been discarded,
but anyway, here goes.

I got the idea after thinking that it should be fairly simple for
the compiler to detect straightforward cases where a variable can
be allocated on the stack - i.e. no references to it are retained
after its enclosing function returns. At the moment, AIUI, it is
necessary for a class instance to be declared by the programmer
as 'scoped' for this to take place.

Further, I was considering the kinds of ownership and boundary
considerations that could be used to improve memory management -
e.g. the notion of an owner instance which, upon destruction,
destroys all of the objects it owns.

After some consideration it seems to me that, using only static
analysis, a tree of references could be constructed from a root
'scoped' object to all of the referred-to objects that are
allocated after the allocation of the root object. When the root
object goes out of scope it is destroyed, and all the descendant
objects of the root object (as identified by the static analysis)
could also be destroyed in one simple shot. The static analysis
of course constructs the tree by analysing the capturing of
references from one object to another. It could be the case that
even a simple static analysis at first (e.g. one that simply
abandons the technique in difficult situations) could cover a lot
of use cases (statistically).

Of course, if one of the descendant objects is referred to by an
object which is not in the object tree, then this technique
cannot be used. However, I envisage that there are many
situations where, upon the destruction of a root object, all
related post-allocated objects can also be destroyed.
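
To make that concrete, here is a tiny sketch (the class names are
hypothetical, purely for illustration). In the first function the whole
little tree dies with its root, so it would be band-eligible; in the
second a reference to a child escapes into a longer-lived object, so
the technique cannot be applied:

// Illustrative types only - not taken from any real codebase.
class Child {
}

class Root {
    Child child;
}

class LongLived {
    Child keep;
}

LongLived global;

void eligible() {
    auto root = new Root();    // candidate root of a band
    root.child = new Child();  // only reachable via root and its tree
}                              // root and its child could be freed together here

void notEligible() {
    auto root = new Root();
    root.child = new Child();
    global = new LongLived();
    global.keep = root.child;  // the child escapes the tree - no band possible
}

void main() {
    eligible();
    notEligible();
}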

In terms of implementation, I see this being done by what I am
calling 'bands' within the GC. When an identified root object is
allocated, a new band (heap) is created in the GC. Child objects
of the root object (i.e. objects referred to only by the root
object and by other child objects in its tree) are placed in the
same band. When the root object goes out of scope, the entire
band is freed. This is safe by definition, because the static
analysis has ensured that no out-of-tree (out-of-band) object
holds a reference to any child object in the tree. This property
also means that normal GC runs do not need to add the scoped root
object as a GC root: the band's memory will normally be freed
only when the scoped root object at the top of the tree goes out
of scope. If memory becomes constrained, the bands can be added
as roots to the GC and the memory incrementally freed, just as
with regularly allocated objects.
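
As a rough sketch of what 'freed in one shot' might look like, here is
the band idea expressed as ordinary library code rather than as part of
the GC (the Band type and its make/release functions are hypothetical
names of my own, and destructors are skipped for simplicity):

import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

// Hypothetical band: every object constructed through it lives in
// memory owned by the band, and the whole band is released en bloc.
struct Band {
    void[][] blocks;  // raw chunks handed out by this band

    // Allocate and construct a class instance inside this band.
    T make(T, Args...)(Args args) if (is(T == class)) {
        enum size = __traits(classInstanceSize, T);
        void[] chunk = malloc(size)[0 .. size];
        blocks ~= chunk;
        return emplace!T(chunk, args);
    }

    // Free the whole band in one shot (destructors are ignored in this sketch).
    void release() {
        foreach (chunk; blocks)
            free(chunk.ptr);
        blocks = null;
    }
}

In the proposal it would of course be the compiler, not the programmer,
deciding which allocations are routed into such a band, and the band
would live inside the GC so that it could still be scanned if memory
becomes constrained.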


Sorry if this idea is daft and I've wasted your time!
Abdulhaq via Digitalmars-d
2014-09-28 17:47:40 UTC
Permalink
Here's a code snippet which hopefully makes things a bit clearer:


/**
 * In this example the variable foo can be statically analysed as safe to
 * go on the stack. The new instance of Bar allocated in funcLevelB is only
 * referred to by foo, so foo can be considered a root 'scoped' variable and
 * the GC can delete both foo and the new Bar() when foo goes out of scope.
 * There is no need (except when under memory pressure) for the GC to scan
 * the band created for foo and its related child allocations.
 */

import std.stdio;

class Bar {
public:
    int x;

    this(int x) {
        this.x = x;
    }
}

class Foo {
public:
    Bar bar;
}

void funcLevelA() {
    Foo foo = new Foo(); // static analysis could detect this as able to go on the stack
    funcLevelB(foo);
    writeln(foo.bar.x);
}

void funcLevelB(Foo foo) {
    foo.bar = new Bar(12); // this allocated memory is only referred to by foo, which
                           // static analysis has established can go on the stack
}

void main() {
    funcLevelA();
}
Freddy via Digitalmars-d
2014-09-28 20:09:34 UTC
Permalink
Post by Abdulhaq via Digitalmars-d
Here's a code snippet which hopefully makes things a bit clearer:
/**
 * In this example the variable foo can be statically analysed as safe to
 * go on the stack. The new instance of Bar allocated in funcLevelB is only
 * referred to by foo, so foo can be considered a root 'scoped' variable and
 * the GC can delete both foo and the new Bar() when foo goes out of scope.
 * There is no need (except when under memory pressure) for the GC to scan
 * the band created for foo and its related child allocations.
 */
import std.stdio;

class Bar {
    int x;
    this(int x) {
        this.x = x;
    }
}

class Foo {
    Bar bar;
}

void funcLevelA() {
    Foo foo = new Foo(); // static analysis could detect this as able to go on the stack
    funcLevelB(foo);
    writeln(foo.bar.x);
}

void funcLevelB(Foo foo) {
    foo.bar = new Bar(12); // this allocated memory is only referred to by foo, which
                           // static analysis has established can go on the stack
}

void main() {
    funcLevelA();
}
You mean this?
https://en.wikipedia.org/wiki/Escape_analysis
Abdulhaq via Digitalmars-d
2014-09-29 06:48:06 UTC
Permalink
Post by Freddy via Digitalmars-d
You mean this?
https://en.wikipedia.org/wiki/Escape_analysis
Of course my proposal uses the technique of escape analysis as
part of its methodology, but the essence of the idea is to
greatly cut down on the work that the GC has to do on each sweep
when dealing with objects that have been found to belong to a
particular set. The objects in each set are in an object graph
that has no incoming references from objects external to the set
and which can therefore be allocated in their own heap that is
destroyed when the root object goes out of scope.

The saving takes place because the GC does not need to scan the
default heap for pointers into the new heaps (bands). For certain
types of programs, such as compilers / lexers / parsers, where
many temporary objects are allocated and deallocated shortly
afterwards, this can result in a substantial saving in execution
time. In terms of memory usage we would see multiple potentially
large but short-lived spikes.
David Nadlinger via Digitalmars-d
2014-09-28 20:20:28 UTC
Permalink
Post by Abdulhaq via Digitalmars-d
I got the idea after thinking that it should be fairly simple
for the compiler to detect straightforward cases of when a
variable can be declared as going on the stack - i.e. no
references to it are retained after its enclosing function
returns.
LDC does the "fairly simple" part of this already in a custom
LLVM optimizer pass. The issue is that escape analysis is fairly
hard in general, and currently even more limited because we only
do it on the LLVM IR level (i.e. don't leverage any additional
attributes like scope, pure, ... that might be present in the D
source code).

David
Abdulhaq via Digitalmars-d
2014-09-29 07:03:28 UTC
Permalink
On Sunday, 28 September 2014 at 20:20:29 UTC, David Nadlinger wrote:
Post by David Nadlinger via Digitalmars-d
Post by Abdulhaq via Digitalmars-d
I got the idea after thinking that it should be fairly simple
for the compiler to detect straightforward cases of when a
variable can be declared as going on the stack - i.e. no
references to it are retained after its enclosing function
returns.
LDC does the "fairly simple" part of this already in a custom
LLVM optimizer pass. The issue is that escape analysis is
fairly hard in general, and currently even more limited because
we only do it on the LLVM IR level (i.e. don't leverage any
additional attributes like scope, pure, ... that might be present
in the D source code).
David
That's interesting. Yes, I guessed that the escape analysis would
be the harder part, but I'm hoping that the algorithm can be
built up incrementally, identifying the easy wins first and then
over time extending it to cover harder cases.

One way that I see it working is to conduct a form of lowering
where the new operator has some information added to it to
indicate the 'band' that the GC should place the non-root objects
into (root objects go on the stack). Using the syntax of C++'s
placement new (but with totally different semantics), code could
be lowered to e.g.

External externalObj = new(0) External();  // 0 means use the default heap
Foo foo = new(0x1234) Foo();               // 0x1234 is the heap/band id for this set of objects
...
Bar bar = new(0x1234) Bar();

When the GC allocates memory it does so in the indicated
band/heap, and then when foo (the root object of the object
graph) goes out of scope the relevant band/heap is destroyed en
bloc. The benefit of the idea is that when scanning for objects
that can be deleted, the GC does not need to consider the objects
in the non-default bands/heaps. For some classes of programs such
as compilers (it was Higgs that gave me the stimulus), and with
good static analysis (aye, there's the rub, cap'n), this could
represent a very substantial time saving on each GC sweep.
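
As a very rough sketch of what the runtime side of that lowering might
look like (the names bands, allocate and freeBand are hypothetical,
purely for illustration): the band id chosen by the static analysis
simply selects which heap an allocation lands in, and freeing that band
releases everything that was allocated through it.

import core.stdc.stdlib : free, malloc;

// Hypothetical runtime support for the lowered new(bandId) form.
void*[][size_t] bands;  // band id -> blocks allocated in that band

// What `new(id) T()` could lower to: pick the heap by band id.
void* allocate(size_t bandId, size_t size) {
    void* p = malloc(size);
    if (bandId != 0)        // 0 = default heap, left to the normal GC
        bands[bandId] ~= p;
    return p;
}

// Called when the root object of band `bandId` goes out of scope.
void freeBand(size_t bandId) {
    foreach (p; bands.get(bandId, null))
        free(p);
    bands.remove(bandId);
}

void main() {
    auto p = allocate(0x1234, 64);  // would come from e.g. new(0x1234) Foo()
    freeBand(0x1234);               // the root of band 0x1234 went out of scope
}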
Mike via Digitalmars-d
2014-09-29 09:45:35 UTC
Permalink
Post by Abdulhaq via Digitalmars-d
On Sunday, 28 September 2014 at 20:20:29 UTC, David Nadlinger wrote:
Post by David Nadlinger via Digitalmars-d
Post by Abdulhaq via Digitalmars-d
I got the idea after thinking that it should be fairly simple
for the compiler to detect straightforward cases of when a
variable can be declared as going on the stack - i.e. no
references to it are retained after its enclosing function
returns.
LDC does the "fairly simple" part of this already in a custom
LLVM optimizer pass. The issue is that escape analysis is
fairly hard in general, and currently even more limited
because we only do it on the LLVM IR level (i.e. don't
leverage any additional attributes like scope, pure, ... that
might be present in the D source code).
David
That's interesting, yes I guessed that the escape analysis
would present the harder part, but I'm hoping that the
algorithm can be built up incrementally, identifying the easy
wins first and then over time extending it to cover harder
cases.
One way that I see it working is to conduct a form of
lowering where the new operator has some information added to
it to indicate the 'band' that the GC should place the non-root
objects into (root objects go on the stack). Using the syntax
of C++'s placement new (but totally different semantics) code
could be lowered to e.g.
External externalObj = new(0) External();  // 0 means use the default heap
Foo foo = new(0x1234) Foo();               // 0x1234 is the heap/band id for this set of objects
...
Bar bar = new(0x1234) Bar();
When the GC allocates memory it does so in the indicated
band/heap, and then when foo (the root object of the object
graph) goes out of scope the relevant band/heap is destroyed en
bloc. The benefit of the idea is that when scanning for objects
that can be deleted the GC does not need to consider those
objects in the non default bands/heaps. For some classes of
programs such as compilers (it was Higgs that gave me the
stimulus), and with good static analysis (aye there's the rub
cap'n) this could represent a very substantial time saving on
each GC sweep.
Sounds a little like http://wiki.dlang.org/DIP46

Mike
Abdulhaq via Digitalmars-d
2014-09-29 11:44:18 UTC
Permalink
Post by Mike via Digitalmars-d
Sounds a little like http://wiki.dlang.org/DIP46
Mike
Ah, thanks for the link - yes, there are definite similarities.
Walter identifies the set of objects that go into his region by
placing a boundary around the pure function; my set is determined
by static analysis and does not require the function to be pure.
However, thedeemon has pointed out that this technique is in fact
well known - I'm left wondering whether I should have a go at an
implementation, but my wife would not be too chuffed about that.
thedeemon via Digitalmars-d
2014-09-29 11:02:10 UTC
Permalink
On Sunday, 28 September 2014 at 16:29:45 UTC, Abdulhaq wrote...

Congratulations on inventing
http://en.wikipedia.org/wiki/Region-based_memory_management
and "Region inference" in particular.
Abdulhaq via Digitalmars-d
2014-09-29 11:36:50 UTC
Permalink
Post by thedeemon via Digitalmars-d
On Sunday, 28 September 2014 at 16:29:45 UTC, Abdulhaq wrote...
Congratulations on inventing
http://en.wikipedia.org/wiki/Region-based_memory_management
and "Region inference" in particular.
Ah yes - it's identical isn't it - I don't know whether to be
happy or sad about that :). They explain it much better too...

So, (does anyone know) has this technique been discarded for D or
is it 'just' a matter of the resources to do it?
thedeemon via Digitalmars-d
2014-09-29 12:02:38 UTC
Permalink
Post by Abdulhaq via Digitalmars-d
So, (does anyone know) has this technique been discarded for D
or is it 'just' a matter of the resources to do it?
From the literature on this topic I remember that attempts at
automatic region inference mostly failed: they led to some small
regions here and there that didn't affect anything much, and then
one huge region where most of the data landed, requiring a
full-blown GC inside that big region. Making it work in D, where
everything can be mutated by anything and any type system wall is
a cast() away from breaking, seems virtually impossible.
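
For example (a tiny illustration, with hypothetical names): guarantees
that an inference pass might want to lean on, such as const, are only
one cast() away from being violated:

// Illustrative only.
class Child {
}

Child leaked;  // longer-lived, mutable storage outside any inferred region

void looksHarmless(const Child c) {
    // The signature promises not to mutate c, yet one cast lets the
    // reference escape into mutable global state, defeating any
    // lifetime bound inferred from the const guarantee.
    leaked = cast(Child) c;
}

void main() {
    looksHarmless(new Child());
    assert(leaked !is null);  // the object is now reachable globally
}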
Abdulhaq via Digitalmars-d
2014-09-30 08:47:56 UTC
Permalink
Post by thedeemon via Digitalmars-d
Post by Abdulhaq via Digitalmars-d
So, (does anyone know) has this technique been discarded for D
or is it 'just' a matter of the resources to do it?
From the literature on this topic I remember that attempts at
automatic region inference mostly failed: they led to some small
regions here and there that didn't affect anything much and
then one huge region where most of the data landed, requiring a
full-blown GC inside that big region. Making it work in D,
where everything can be mutated by anything and any type system
wall is a cast() away from breaking, seems virtually impossible.
Mmm yes I can imagine that being the case... oh well!
