Discussion:
Container insertion and removal
(too old to reply)
Andrei Alexandrescu
2010-03-06 12:49:18 UTC
Permalink
In the STL world, writing container-independent code is generally
shunned (see e.g.
http://www.informit.com/content/images/0201749629/items/item2-2.pdf).

One problem is a very small intersection between the functionalities
offered by the various STL containers, and the conceptual organization
that is weaker than that of iterators.

A worse problem is iterator invalidation rules, something that we'll
need to address too. I'm thinking that the best defense is a strong
offense, and I plan to define the following naming convention:

Methods such as insert, remove, pushFront, pushBack, removeFront,
removeBack, are assumed to affect the container's topology and must be
handled in user code as such.

In addition to those, a container may also define functions named after
the above by adding a "soft" prefix (e.g. softInsert, softRemove...)
that are guaranteed to not affect the ranges currently iterating the
container.

Generic code that needs specific iterator (non-)invalidation rules can
use softXxx methods, in confidence that containers not supporting it
will be ruled out during compilations.

Sounds good?


Andrei
Steven Schveighoffer
2010-03-06 13:19:36 UTC
Permalink
Post by Andrei Alexandrescu
In the STL world, writing container-independent code is generally
shunned (see e.g.
http://www.informit.com/content/images/0201749629/items/item2-2.pdf).
One problem is a very small intersection between the functionalities
offered by the various STL containers, and the conceptual organization
that is weaker than that of iterators.
A worse problem is iterator invalidation rules, something that we'll
need to address too. I'm thinking that the best defense is a strong
Methods such as insert, remove, pushFront, pushBack, removeFront,
removeBack, are assumed to affect the container's topology and must be
handled in user code as such.
In addition to those, a container may also define functions named after
the above by adding a "soft" prefix (e.g. softInsert, softRemove...)
that are guaranteed to not affect the ranges currently iterating the
container.
Generic code that needs specific iterator (non-)invalidation rules can
use softXxx methods, in confidence that containers not supporting it
will be ruled out during compilations.
Sounds good?
How can softRemove not affect iterating ranges? What if the range is positioned on the element removed?

The only two containers that would support softInsert would be linked list and sorted map/set. Anything else might completely screw up the iteration. I don't see a lot of "generic" use for it.

Another option is to use a "mutation" field that is checked every chance by the range. If it changes, then the range is invalidated.

-Steve
Robert Jacques
2010-03-06 16:19:15 UTC
Permalink
On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
Post by Andrei Alexandrescu
In the STL world, writing container-independent code is generally
shunned (see e.g.
http://www.informit.com/content/images/0201749629/items/item2-2.pdf).
One problem is a very small intersection between the functionalities
offered by the various STL containers, and the conceptual organization
that is weaker than that of iterators.
A worse problem is iterator invalidation rules, something that we'll
need to address too. I'm thinking that the best defense is a strong
Methods such as insert, remove, pushFront, pushBack, removeFront,
removeBack, are assumed to affect the container's topology and must be
handled in user code as such.
In addition to those, a container may also define functions named after
the above by adding a "soft" prefix (e.g. softInsert, softRemove...)
that are guaranteed to not affect the ranges currently iterating the
container.
Generic code that needs specific iterator (non-)invalidation rules can
use softXxx methods, in confidence that containers not supporting it
will be ruled out during compilations.
Sounds good?
How can softRemove not affect iterating ranges? What if the range is
positioned on the element removed?
It always affects ranges in so far as the range and container are
inconsistent, but that is a problem of softInsert as well. Removing an
element from an array doesn't invalidate the underlying range, since the
memory is still there. And so long as you're not trying to use free lists,
linked-lists, trees, etc. can be written so that the ranges never enter an
invalid state. If they happen to be pointing at the removed node, they
just end up stopping early.
Post by Steven Schveighoffer
The only two containers that would support softInsert would be linked
list and sorted map/set. Anything else might completely screw up the
iteration. I don't see a lot of "generic" use for it.
There's all the containers based upon linked-lists, etc like hashes,
stacks, queues and dequeues.
Post by Steven Schveighoffer
Another option is to use a "mutation" field that is checked every chance
by the range. If it changes, then the range is invalidated.
The mutation field would have to be a version number to support multiple
ranges, and given experience with lock-free algorithms which use a 'tag'
in a similar manner, this concept is bug prone and should not be relied
upon. It would be better to 'lock' the node or container to topology
changes, though this does slow things down and has no conflict resolution:
removing a locked node would have to throw an exception.
Steven Schveighoffer
2010-03-07 02:54:50 UTC
Permalink
On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
How can softRemove not affect iterating ranges? What if the range is
positioned on the element removed?
It always affects ranges in so far as the range and container are
inconsistent, but that is a problem of softInsert as well. Removing an
element from an array doesn't invalidate the underlying range, since the
memory is still there. And so long as you're not trying to use free
lists, linked-lists, trees, etc. can be written so that the ranges never
enter an invalid state. If they happen to be pointing at the removed
node, they just end up stopping early.
If my linked list range has two node pointers as the implementation, and
you remove the end node, it's going to end later, not early.
Of course, you can get around this by just storing a current node and a
length, but that may not be what the user is expecting. For example, if
you did a range of a sorted tree from "a" to "f" and it stops at "n", I
think that would be extremely unexpected.

Stopping early is invalidation also IMO. If your program logic depends on
iterating over all the elements you originally intended to iterate, then
we have big problems if they stop early.
Post by Robert Jacques
Post by Steven Schveighoffer
The only two containers that would support softInsert would be linked
list and sorted map/set. Anything else might completely screw up the
iteration. I don't see a lot of "generic" use for it.
There's all the containers based upon linked-lists, etc like hashes,
stacks, queues and dequeues.
Hashes may be rehashed when inserting, completely invalidating a range
(possibly the end point will be before the starting point).

Yes, the others you mentioned will be valid. But I still don't see it
being more useful than just using documentation to indicate insertion will
not invalidate ranges. Perhaps I am wrong.
Post by Robert Jacques
Post by Steven Schveighoffer
Another option is to use a "mutation" field that is checked every
chance by the range. If it changes, then the range is invalidated.
The mutation field would have to be a version number to support multiple
ranges, and given experience with lock-free algorithms which use a 'tag'
in a similar manner, this concept is bug prone and should not be relied
upon. It would be better to 'lock' the node or container to topology
changes, though this does slow things down and has no conflict
resolution: removing a locked node would have to throw an exception.
I was not thinking of multithreaded applications. I don't think it's
worth making containers by default be multithreaded safe.

The mutation index has been used in Tango forever, and I think was in Doug
Lea's original container implementations. I'm pretty sure it is sound in
single-threaded uses.

-Steve
Robert Jacques
2010-03-07 07:10:09 UTC
Permalink
On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
How can softRemove not affect iterating ranges? What if the range is
positioned on the element removed?
It always affects ranges in so far as the range and container are
inconsistent, but that is a problem of softInsert as well. Removing an
element from an array doesn't invalidate the underlying range, since
the memory is still there. And so long as you're not trying to use free
lists, linked-lists, trees, etc. can be written so that the ranges
never enter an invalid state. If they happen to be pointing at the
removed node, they just end up stopping early.
If my linked list range has two node pointers as the implementation, and
you remove the end node, it's going to end later, not early.
A linked list maps to a forward range: so the range simply points to a
single internal node. If you're going to store a pointer to the end, then
you should be using a doubly linked list, not a singly linked list.
Post by Steven Schveighoffer
Of course, you can get around this by just storing a current node and a
length, but that may not be what the user is expecting. For example, if
you did a range of a sorted tree from "a" to "f" and it stops at "n", I
think that would be extremely unexpected.
By this logic, any and all forms of mutation, not just insertions and
deletions must not be allowed.
Post by Steven Schveighoffer
Stopping early is invalidation also IMO. If your program logic depends
on iterating over all the elements you originally intended to iterate,
then we have big problems if they stop early.
Stopping early is the result of a logical error by the programmer. The
code itself, however, is completely valid.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
The only two containers that would support softInsert would be linked
list and sorted map/set. Anything else might completely screw up the
iteration. I don't see a lot of "generic" use for it.
There's all the containers based upon linked-lists, etc like hashes,
stacks, queues and dequeues.
Hashes may be rehashed when inserting, completely invalidating a range
(possibly the end point will be before the starting point).
Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a
stale view)
Post by Steven Schveighoffer
Yes, the others you mentioned will be valid. But I still don't see it
being more useful than just using documentation to indicate insertion
will not invalidate ranges. Perhaps I am wrong.
The difference is that algorithms can document in their template
constraints that they need a container with 'soft' properties.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Another option is to use a "mutation" field that is checked every
chance by the range. If it changes, then the range is invalidated.
The mutation field would have to be a version number to support
multiple ranges, and given experience with lock-free algorithms which
use a 'tag' in a similar manner, this concept is bug prone and should
not be relied upon. It would be better to 'lock' the node or container
to topology changes, though this does slow things down and has no
conflict resolution: removing a locked node would have to throw an
exception.
I was not thinking of multithreaded applications. I don't think it's
worth making containers by default be multithreaded safe.
I wasn't thinking of multi-threaded containers. I was trying to point out
that version ids have failed in lock-free containers, where things are
happening on the order of a single atomic op or a context switch. Given
the time a range could go unused in standard code, versioning won't work.
Post by Steven Schveighoffer
The mutation index has been used in Tango forever, and I think was in
Doug Lea's original container implementations. I'm pretty sure it is
sound in single-threaded uses.
No it's not. version tags + integer overflow = bug. Doug Lea knew about
the problem and but thought it would never happen in real code. And Bill
Gates thought no one will need more than 640k of ram. They both have been
proven wrong.
Steven Schveighoffer
2010-03-07 13:23:03 UTC
Permalink
Post by Robert Jacques
On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
How can softRemove not affect iterating ranges? What if the range is
positioned on the element removed?
It always affects ranges in so far as the range and container are
inconsistent, but that is a problem of softInsert as well. Removing an
element from an array doesn't invalidate the underlying range, since
the memory is still there. And so long as you're not trying to use free
lists, linked-lists, trees, etc. can be written so that the ranges
never enter an invalid state. If they happen to be pointing at the
removed node, they just end up stopping early.
If my linked list range has two node pointers as the implementation, and
you remove the end node, it's going to end later, not early.
A linked list maps to a forward range: so the range simply points to a
single internal node. If you're going to store a pointer to the end, then
you should be using a doubly linked list, not a singly linked list.
How do you know when to stop? A range has a beginning and an ending, otherwise it's an iterator. Whether you store it via a pointer to the last (not-to-be-iterated) node, or via a length, or via a value to compare with for stopping, you have to use something. Or are you asserting the only useful range for a linked list iterates the entire list?

Think of it as the equivalent of a slice of an array.
Post by Robert Jacques
Post by Steven Schveighoffer
Of course, you can get around this by just storing a current node and a
length, but that may not be what the user is expecting. For example, if
you did a range of a sorted tree from "a" to "f" and it stops at "n", I
think that would be extremely unexpected.
By this logic, any and all forms of mutation, not just insertions and
deletions must not be allowed.
Allowed, yes. Not invalidating iterators, no.
Post by Robert Jacques
Post by Steven Schveighoffer
Stopping early is invalidation also IMO. If your program logic depends
on iterating over all the elements you originally intended to iterate,
then we have big problems if they stop early.
Stopping early is the result of a logical error by the programmer. The
code itself, however, is completely valid.
I still fail to see the difference between "soft" operations and non-soft. What does soft guarantee? Give me a concrete definition, an example would help too.
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
The only two containers that would support softInsert would be linked
list and sorted map/set. Anything else might completely screw up the
iteration. I don't see a lot of "generic" use for it.
There's all the containers based upon linked-lists, etc like hashes,
stacks, queues and dequeues.
Hashes may be rehashed when inserting, completely invalidating a range
(possibly the end point will be before the starting point).
Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a
stale view)
God no. If my hash collision solution is linked-list based (which it is in dcollections), why should I reallocate all those nodes? I just rearrange them in a new bucket array.
Post by Robert Jacques
Post by Steven Schveighoffer
Yes, the others you mentioned will be valid. But I still don't see it
being more useful than just using documentation to indicate insertion
will not invalidate ranges. Perhaps I am wrong.
The difference is that algorithms can document in their template
constraints that they need a container with 'soft' properties.
What is the advantage? Why would an algorithm require soft functions? What is an example of such an algorithm?
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Another option is to use a "mutation" field that is checked every
chance by the range. If it changes, then the range is invalidated.
The mutation field would have to be a version number to support
multiple ranges, and given experience with lock-free algorithms which
use a 'tag' in a similar manner, this concept is bug prone and should
not be relied upon. It would be better to 'lock' the node or container
to topology changes, though this does slow things down and has no
conflict resolution: removing a locked node would have to throw an
exception.
I was not thinking of multithreaded applications. I don't think it's
worth making containers by default be multithreaded safe.
I wasn't thinking of multi-threaded containers. I was trying to point out
that version ids have failed in lock-free containers, where things are
happening on the order of a single atomic op or a context switch. Given
the time a range could go unused in standard code, versioning won't work.
Are you trying to say that if you don't use your range for exactly 2^32 mutations, it could mistakenly think the range is still valid? That's a valid, but very very weak point.
Post by Robert Jacques
Post by Steven Schveighoffer
The mutation index has been used in Tango forever, and I think was in
Doug Lea's original container implementations. I'm pretty sure it is
sound in single-threaded uses.
No it's not. version tags + integer overflow = bug. Doug Lea knew about
the problem and but thought it would never happen in real code. And Bill
Gates thought no one will need more than 640k of ram. They both have been
proven wrong.
Overflow != bug. Wrapping completely to the same value == bug, but is so unlikely, it's worth the possibility.

-Steve
Fawzi Mohamed
2010-03-07 16:18:36 UTC
Permalink
Post by Steven Schveighoffer
Post by Robert Jacques
On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
How can softRemove not affect iterating ranges? What if the range is
positioned on the element removed?
It always affects ranges in so far as the range and container are
inconsistent, but that is a problem of softInsert as well. Removing an
element from an array doesn't invalidate the underlying range, since
the memory is still there. And so long as you're not trying to use free
lists, linked-lists, trees, etc. can be written so that the ranges
never enter an invalid state. If they happen to be pointing at the
removed node, they just end up stopping early.
If my linked list range has two node pointers as the implementation, and
you remove the end node, it's going to end later, not early.
A linked list maps to a forward range: so the range simply points to a
single internal node. If you're going to store a pointer to the end, then
you should be using a doubly linked list, not a singly linked list.
How do you know when to stop? A range has a beginning and an ending,
otherwise it's an iterator. Whether you store it via a pointer to the
last (not-to-be-iterated) node, or via a length, or via a value to
compare with for stopping, you have to use something. Or are you
asserting the only useful range for a linked list iterates the entire
list?
Think of it as the equivalent of a slice of an array.
Post by Robert Jacques
Post by Steven Schveighoffer
Of course, you can get around this by just storing a current node and a
length, but that may not be what the user is expecting. For example, if
you did a range of a sorted tree from "a" to "f" and it stops at "n", I
think that would be extremely unexpected.
By this logic, any and all forms of mutation, not just insertions and
deletions must not be allowed.
Allowed, yes. Not invalidating iterators, no.
I agree with this point of view, it makes sense to make some operations
invalidating iteration, and even there there is a hierarcy of
"invalidation":

1) all iterator are still valid, and iterate on the same set that they
were iterating before. This is the best thing from a user, and is what
persistent structures do, all the structures described in the book I
gave have this property.
To achieve this you need to basically *always* create a copy when
modifying a structure, but fortunately often there are ways to share
most of the structure with the old value. Still this has a cost, and
normally puts pressure on the memory/gc (all the nodes, or groups of
them have to be single objects for the gc).

2) iterators are still valid, but they might iterate on a different set
of elements than originally planned.
Array append with atomic update of elements gives this for example,
removal in general has more problems, even if the memory might still be
valid. Invariants of the array might be violated (for example strict
ordering), one could separate this in two sub points, invariant
violating and non, but probably it is not worth the effort.

3) iterators are fully invalid, and at worst they might iterate on
completely wrong/invalid data without detection of failure, hopefully
at least in debug mode detection should be an option.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Stopping early is invalidation also IMO. If your program logic depends
on iterating over all the elements you originally intended to iterate,
then we have big problems if they stop early.
Stopping early is the result of a logical error by the programmer. The
code itself, however, is completely valid.
I still fail to see the difference between "soft" operations and
non-soft. What does soft guarantee? Give me a concrete definition, an
example would help too.
I suppose soft should guarantee either 1 or 2, from what andrei was
saying I suppose he wants 2

This information is very important for the user for example to iterate
+manipulate (addition/removal), one doesn't necessarily need to do a
copy with all containers.

In general one wants to have clean interface, + just one or two "good"
implementations, I don't feel that there is a strong need to have many
implementations of the basic containers in a standard library, (the
other would be just implemented for a specific application, if needed).
Then there might be different containers (db, distributed hashes,
xml,...), and it would be nice if those can be as close as possible in
some aspects of the api to the standard containers.
Again here generic algorithms are useful to augment the usefulness of
special containers with a minimal coding effort, but should not
"disallow" ad-hoc implementation in the container.
Andrei Alexandrescu
2010-03-08 18:01:32 UTC
Permalink
Post by Fawzi Mohamed
Post by Steven Schveighoffer
Post by Robert Jacques
On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
How can softRemove not affect iterating ranges? What if the range is
positioned on the element removed?
It always affects ranges in so far as the range and container are
inconsistent, but that is a problem of softInsert as well. Removing an
element from an array doesn't invalidate the underlying range, since
the memory is still there. And so long as you're not trying to use free
lists, linked-lists, trees, etc. can be written so that the ranges
never enter an invalid state. If they happen to be pointing at the
removed node, they just end up stopping early.
If my linked list range has two node pointers as the implementation, and
you remove the end node, it's going to end later, not early.
A linked list maps to a forward range: so the range simply points to a
single internal node. If you're going to store a pointer to the end, then
you should be using a doubly linked list, not a singly linked list.
How do you know when to stop? A range has a beginning and an ending,
otherwise it's an iterator. Whether you store it via a pointer to the
last (not-to-be-iterated) node, or via a length, or via a value to
compare with for stopping, you have to use something. Or are you
asserting the only useful range for a linked list iterates the entire
list?
Think of it as the equivalent of a slice of an array.
Post by Robert Jacques
Post by Steven Schveighoffer
Of course, you can get around this by just storing a current node and a
length, but that may not be what the user is expecting. For example, if
you did a range of a sorted tree from "a" to "f" and it stops at "n", I
think that would be extremely unexpected.
By this logic, any and all forms of mutation, not just insertions and
deletions must not be allowed.
Allowed, yes. Not invalidating iterators, no.
I agree with this point of view, it makes sense to make some operations
invalidating iteration, and even there there is a hierarcy of
1) all iterator are still valid, and iterate on the same set that they
were iterating before. This is the best thing from a user, and is what
persistent structures do, all the structures described in the book I
gave have this property.
To achieve this you need to basically *always* create a copy when
modifying a structure, but fortunately often there are ways to share
most of the structure with the old value. Still this has a cost, and
normally puts pressure on the memory/gc (all the nodes, or groups of
them have to be single objects for the gc).
2) iterators are still valid, but they might iterate on a different set
of elements than originally planned.
Array append with atomic update of elements gives this for example,
removal in general has more problems, even if the memory might still be
valid. Invariants of the array might be violated (for example strict
ordering), one could separate this in two sub points, invariant
violating and non, but probably it is not worth the effort.
3) iterators are fully invalid, and at worst they might iterate on
completely wrong/invalid data without detection of failure, hopefully at
least in debug mode detection should be an option.
These are great thoughts. We need to explore this niche because STL has
a black-and-white approach to invalidation - once invalidated, behavior
is undefined. D could get much more nuanced than that.

Perhaps for the beginning we could just say that invalidation entails
implementation-specific behavior.
Post by Fawzi Mohamed
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Stopping early is invalidation also IMO. If your program logic depends
on iterating over all the elements you originally intended to iterate,
then we have big problems if they stop early.
Stopping early is the result of a logical error by the programmer. The
code itself, however, is completely valid.
I still fail to see the difference between "soft" operations and
non-soft. What does soft guarantee? Give me a concrete definition,
an example would help too.
I suppose soft should guarantee either 1 or 2, from what andrei was
saying I suppose he wants 2
I think softXxx functions that add data guarantee that existing ranges
continue iterating the new collection correctly. Depending on where
insertion was effected, iteration may or may not span the newly-inserted
elements. For example, inserting elements in a linked list does preserve
ranges so a linked list can afford to define softInsert and softPrepend.

softXxx functions that remove data should guarantee that existing ranges
not including the removed elements will continue to operate on the range
without seeing the removed elements. There is no guarantee that soft
removal offers for ranges spanning the removed elements. (I'm unclear on
this last topic.)

Andrei
Michel Fortin
2010-03-08 18:30:27 UTC
Permalink
On 2010-03-08 13:01:32 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
softXxx functions that remove data should guarantee that existing
ranges not including the removed elements will continue to operate on
the range without seeing the removed elements. There is no guarantee
that soft removal offers for ranges spanning the removed elements. (I'm
unclear on this last topic.)
That won't work for a vector as ranges that are further then the
removed element need adjustment too.

Just an idea, perhaps remove and other mutator functions for the
container could take a list of range to update at the same time they do
the operation. This way you don't even need a soft function, if remove
can't accept other ranges then it won't take other ranges as an
argument. For instance:

class Container {
void remove(Range toRemove, Range[] needAdjustment);
}

Container x = new Conatiner();
auto a = x[];
auto b = x[4..6];
auto c = x[2..4];
x.remove(c, [a, b]); // remove elements from c, adjust ranges a and b.

OtherContainer y = new OtherContainer();
auto d = y[];
auto e = y[2..4];
y.remove(e, [d]); // error, container is unable to adjust other ranges.
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/
Robert Jacques
2010-03-07 17:43:09 UTC
Permalink
On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
Post by Robert Jacques
On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
How can softRemove not affect iterating ranges? What if the range
is
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
positioned on the element removed?
It always affects ranges in so far as the range and container are
inconsistent, but that is a problem of softInsert as well. Removing
an
Post by Steven Schveighoffer
Post by Robert Jacques
element from an array doesn't invalidate the underlying range, since
the memory is still there. And so long as you're not trying to use
free
Post by Steven Schveighoffer
Post by Robert Jacques
lists, linked-lists, trees, etc. can be written so that the ranges
never enter an invalid state. If they happen to be pointing at the
removed node, they just end up stopping early.
If my linked list range has two node pointers as the implementation,
and
Post by Steven Schveighoffer
you remove the end node, it's going to end later, not early.
A linked list maps to a forward range: so the range simply points to a
single internal node. If you're going to store a pointer to the end, then
you should be using a doubly linked list, not a singly linked list.
How do you know when to stop? A range has a beginning and an ending,
otherwise it's an iterator. Whether you store it via a pointer to the
last (not-to-be-iterated) node, or via a length, or via a value to
compare with for stopping, you have to use something. Or are you
asserting the only useful range for a linked list iterates the entire
list?
Think of it as the equivalent of a slice of an array.
Please define for me an O(1) slice or index operation for a linked-list.
The only way of doing this is to search the entire list in order,
comparing for the search terms or counting the number of nodes passed. The
range itself can do this just as easily as the host container, if you
really want this functionality (I'd argue this isn't a valid list
operation). For simple list[5..10] operations its definitely more
efficient and then the range could even throw an exception when it reaches
the end of the list before finding the end slice value (due to list
topology manipulation). The efficiency of more complex ranges, like
list["apple".."oranges"], is still much more efficient for the single
range case. Even when you start to take many slices, cache effects can
marginalize a lot of the cost.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Of course, you can get around this by just storing a current node and
a
Post by Steven Schveighoffer
length, but that may not be what the user is expecting. For example,
if
Post by Steven Schveighoffer
you did a range of a sorted tree from "a" to "f" and it stops at "n",
I
Post by Steven Schveighoffer
think that would be extremely unexpected.
By this logic, any and all forms of mutation, not just insertions and
deletions must not be allowed.
Allowed, yes. Not invalidating iterators, no.
If I change a node in a sorted tree from 'a' to 'n', the node moves,
changing the tree topology. Any range currently at the node would continue
to look for the 'f' node, and iterate the rest of the tree erroneously. By
the way, having ranges detect if they reach their end nodes or not is
fairly easy to do.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Stopping early is invalidation also IMO. If your program logic
depends
Post by Steven Schveighoffer
on iterating over all the elements you originally intended to iterate,
then we have big problems if they stop early.
Stopping early is the result of a logical error by the programmer. The
code itself, however, is completely valid.
I still fail to see the difference between "soft" operations and
non-soft. What does soft guarantee? Give me a concrete definition, an
example would help too.
There are a couple of possible definitions for soft operations: 1) the
memory safety of the ranges of a collection are guaranteed. 2) That for
the topology viewed by a range isn't logically changed. i.e. the range
will continue to perform the same logical function if the topology its
operating on is updated 3) That for the topology viewed by a range isn't
actually changed and all elements selected at range creation will be
viewed. 4) Like 3, but with all values being viewed.

For example, modifying an array in any way doesn't change 1, 2 or 3 for
any of its slices.
For a linked list defining a forward range, mutation, insertion and
removal can be done under 1 & 2.
The same can be said about doubly linked lists and bidirectional ranges.
For other containers, such as a sorted tree, mutation can break a 2/3
though insertion and deletion don't break 2.
Although, the ranges will see many values, they may not see all the values
currently in the collection nor all the values in the collection when the
iterator was generated. So code that relies on such properties would be
logically invalid.

I'd probably define hard ops as being 1) and soft ops at level 2. 4) is
really only possible with immutable containers.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
The only two containers that would support softInsert would be
linked
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
list and sorted map/set. Anything else might completely screw up
the
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
iteration. I don't see a lot of "generic" use for it.
There's all the containers based upon linked-lists, etc like hashes,
stacks, queues and dequeues.
Hashes may be rehashed when inserting, completely invalidating a range
(possibly the end point will be before the starting point).
Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a
stale view)
God no. If my hash collision solution is linked-list based (which it is
in dcollections), why should I reallocate all those nodes? I just
rearrange them in a new bucket array.
Sorry, I was assuming that if you were going to implement a hash
collection you wouldn't be using a linked list approach, since that's what
D's associative arrays already do. The are some really good reasons to not
use a list based hash in D due to GC false pointer issues, but basically
none to re-implementing (poorly?) D's built-in data structure.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Yes, the others you mentioned will be valid. But I still don't see it
being more useful than just using documentation to indicate insertion
will not invalidate ranges. Perhaps I am wrong.
The difference is that algorithms can document in their template
constraints that they need a container with 'soft' properties.
What is the advantage? Why would an algorithm require soft functions?
What is an example of such an algorithm?
Something that uses toUpperCase or toLowerCase, for example.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Another option is to use a "mutation" field that is checked every
chance by the range. If it changes, then the range is invalidated.
The mutation field would have to be a version number to support
multiple ranges, and given experience with lock-free algorithms which
use a 'tag' in a similar manner, this concept is bug prone and should
not be relied upon. It would be better to 'lock' the node or
container
Post by Steven Schveighoffer
Post by Robert Jacques
to topology changes, though this does slow things down and has no
conflict resolution: removing a locked node would have to throw an
exception.
I was not thinking of multithreaded applications. I don't think it's
worth making containers by default be multithreaded safe.
I wasn't thinking of multi-threaded containers. I was trying to point out
that version ids have failed in lock-free containers, where things are
happening on the order of a single atomic op or a context switch. Given
the time a range could go unused in standard code, versioning won't work.
Are you trying to say that if you don't use your range for exactly 2^32
mutations, it could mistakenly think the range is still valid? That's a
valid, but very very weak point.
Umm, no. That is a valid point that happens in production code to
disastrous effects. Worse, it doesn't happen often and there's no good
unit test for it. I for one, never want to debug a program that only
glitches after days of intensive use in a live environment with real
customer data. Integer overflow bugs like this are actually one of the few
bugs that have ever killed anyone.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
The mutation index has been used in Tango forever, and I think was in
Doug Lea's original container implementations. I'm pretty sure it is
sound in single-threaded uses.
No it's not. version tags + integer overflow = bug. Doug Lea knew about
the problem and but thought it would never happen in real code. And Bill
Gates thought no one will need more than 640k of ram. They both have been
proven wrong.
Overflow != bug. Wrapping completely to the same value == bug, but is
so unlikely, it's worth the possibility.
Statistics 101: do a test enough times and even the highly improbable will
happen.
Steven Schveighoffer
2010-03-08 03:07:14 UTC
Permalink
On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
How do you know when to stop? A range has a beginning and an ending,
otherwise it's an iterator. Whether you store it via a pointer to the
last (not-to-be-iterated) node, or via a length, or via a value to
compare with for stopping, you have to use something. Or are you
asserting the only useful range for a linked list iterates the entire
list?
Think of it as the equivalent of a slice of an array.
Please define for me an O(1) slice or index operation for a linked-list.
One for which you have references to the slice end points. I think this
will work, and I was planning on providing it in the upcoming dcollections
port. The only thing you cannot guarantee is that the order is correct.
Post by Robert Jacques
By the way, having ranges detect if they reach their end nodes or not is
fairly easy to do.
you are correct in that point, you could throw an exception as long as the
end point is part of the range structure. If you just use a current node
plus a length, then you cannot do that. But soft ops are not necessary to
create this.
Post by Robert Jacques
Post by Steven Schveighoffer
I still fail to see the difference between "soft" operations and
non-soft. What does soft guarantee? Give me a concrete definition, an
example would help too.
There are a couple of possible definitions for soft operations: 1) the
memory safety of the ranges of a collection are guaranteed. 2) That for
the topology viewed by a range isn't logically changed. i.e. the range
will continue to perform the same logical function if the topology its
operating on is updated 3) That for the topology viewed by a range isn't
actually changed and all elements selected at range creation will be
viewed. 4) Like 3, but with all values being viewed.
For example, modifying an array in any way doesn't change 1, 2 or 3 for
any of its slices.
For a linked list defining a forward range, mutation, insertion and
removal can be done under 1 & 2.
The same can be said about doubly linked lists and bidirectional ranges.
For other containers, such as a sorted tree, mutation can break a 2/3
though insertion and deletion don't break 2.
Although, the ranges will see many values, they may not see all the
values currently in the collection nor all the values in the collection
when the iterator was generated. So code that relies on such properties
would be logically invalid.
I'd probably define hard ops as being 1) and soft ops at level 2. 4) is
really only possible with immutable containers.
Hard ops definitely qualify as #1, since we are in a GC'd language.

I don't really understand 2, "the range will continue to perform the same
logical function," what does that mean? Define that function. I would
define it as #3, so obviously you think it's something else.

#3 would be most useful for soft operations if it could be guaranteed. I
don't think it can.
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a
stale view)
God no. If my hash collision solution is linked-list based (which it
is in dcollections), why should I reallocate all those nodes? I just
rearrange them in a new bucket array.
Sorry, I was assuming that if you were going to implement a hash
collection you wouldn't be using a linked list approach, since that's
what D's associative arrays already do. The are some really good reasons
to not use a list based hash in D due to GC false pointer issues, but
basically none to re-implementing (poorly?) D's built-in data structure.
Hm... my hash outperforms builtin AAs by a wide margin. But this is not
technically because my implementation is better, it's because AA's use
"dumb" allocation methods. I don't know about false pointers, the hash
nodes in my implementation only contain pointers, so I'm not sure there is
any possibility for false ones.

What my hash implementation *does* do that AA's don't is to provide a
better interface -- removal while iterating, storing cursors for later
reference and removal (i.e. avoid double lookups).

Also, technically AAs are implemented with a tree, not a LL, so no opCmp
is required for my hash.
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
The difference is that algorithms can document in their template
constraints that they need a container with 'soft' properties.
What is the advantage? Why would an algorithm require soft functions?
What is an example of such an algorithm?
Something that uses toUpperCase or toLowerCase, for example.
I guess I won't get a real example. I'm not sure it matters. When Andrei
starts implementing the soft methods, either they will be a huge win or
obviously useless. If I were a betting man, I'd bet on the latter, but
I'm not really good at betting, and Andrei's ideas are usually good :)
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
I wasn't thinking of multi-threaded containers. I was trying to point out
that version ids have failed in lock-free containers, where things are
happening on the order of a single atomic op or a context switch. Given
the time a range could go unused in standard code, versioning won't work.
Are you trying to say that if you don't use your range for exactly 2^32
mutations, it could mistakenly think the range is still valid? That's
a valid, but very very weak point.
Umm, no. That is a valid point that happens in production code to
disastrous effects. Worse, it doesn't happen often and there's no good
unit test for it. I for one, never want to debug a program that only
glitches after days of intensive use in a live environment with real
customer data. Integer overflow bugs like this are actually one of the
few bugs that have ever killed anyone.
I think you are overestimating the life of ranges on a container. They
will not survive that many mutations, and the worst that happens is the
range continues on fully allocated memory (i.e. your #1 above).

You can also force the container to invalidate itself once the first wrap
occurs. This at least will be a hard error.
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
No it's not. version tags + integer overflow = bug. Doug Lea knew about
the problem and but thought it would never happen in real code. And Bill
Gates thought no one will need more than 640k of ram. They both have been
proven wrong.
Overflow != bug. Wrapping completely to the same value == bug, but is
so unlikely, it's worth the possibility.
Statistics 101: do a test enough times and even the highly improbable
will happen.
In statistics we generally ignore the outliers. I normally am on your
side in these types of cases, but we are talking about a statistical
impossibility -- nobody will leave a range untouched for exactly 2^32
mutations, it simply won't happen. Your computer will probably be
obsolete before it does.

BTW, I'm not advocating adding mutation counters, I don't plan on adding
them. It's just another alternative to "soft" mutations.

-Steve
Robert Jacques
2010-03-08 05:20:31 UTC
Permalink
On Sun, 07 Mar 2010 22:07:14 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer
[snip]
Post by Steven Schveighoffer
Post by Robert Jacques
Please define for me an O(1) slice or index operation for a linked-list.
One for which you have references to the slice end points. I think this
will work, and I was planning on providing it in the upcoming
dcollections port. The only thing you cannot guarantee is that the
order is correct.
The container would have to do an O(N) search to verify the ranges are
actually part of the collection. And using two ranges as iterators to
create a third range feels very wrong and very bug prone: see all the
issues raised during Andrei's iterators vs ranges presentations.
Similarly, it feels wrong for something to define slicing and not indexing.
Post by Steven Schveighoffer
Post by Robert Jacques
By the way, having ranges detect if they reach their end nodes or not
is fairly easy to do.
you are correct in that point, you could throw an exception as long as
the end point is part of the range structure. If you just use a current
node plus a length, then you cannot do that. But soft ops are not
necessary to create this.
Soft ops are necessary to document in code whether that invalidation is
expected to happen.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
I still fail to see the difference between "soft" operations and
non-soft. What does soft guarantee? Give me a concrete definition,
an example would help too.
There are a couple of possible definitions for soft operations: 1) the
memory safety of the ranges of a collection are guaranteed. 2) That for
the topology viewed by a range isn't logically changed. i.e. the range
will continue to perform the same logical function if the topology its
operating on is updated 3) That for the topology viewed by a range
isn't actually changed and all elements selected at range creation will
be viewed. 4) Like 3, but with all values being viewed.
For example, modifying an array in any way doesn't change 1, 2 or 3 for
any of its slices.
For a linked list defining a forward range, mutation, insertion and
removal can be done under 1 & 2.
The same can be said about doubly linked lists and bidirectional ranges.
For other containers, such as a sorted tree, mutation can break a 2/3
though insertion and deletion don't break 2.
Although, the ranges will see many values, they may not see all the
values currently in the collection nor all the values in the collection
when the iterator was generated. So code that relies on such properties
would be logically invalid.
I'd probably define hard ops as being 1) and soft ops at level 2. 4) is
really only possible with immutable containers.
Hard ops definitely qualify as #1, since we are in a GC'd language.
I don't really understand 2, "the range will continue to perform the
same logical function," what does that mean? Define that function. I
would define it as #3, so obviously you think it's something else.
#3 would be most useful for soft operations if it could be guaranteed.
I don't think it can.
I was thinking of "iterate from the start to the end", for example. One
might better describe this concept as the topology of the container
relative to the range doesn't change: things before it stay before, things
after it stay after and in the case of bidirectional ranges, things in the
middle, stay in the middle.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a
stale view)
God no. If my hash collision solution is linked-list based (which it
is in dcollections), why should I reallocate all those nodes? I just
rearrange them in a new bucket array.
Sorry, I was assuming that if you were going to implement a hash
collection you wouldn't be using a linked list approach, since that's
what D's associative arrays already do. The are some really good
reasons to not use a list based hash in D due to GC false pointer
issues, but basically none to re-implementing (poorly?) D's built-in
data structure.
Hm... my hash outperforms builtin AAs by a wide margin. But this is not
technically because my implementation is better, it's because AA's use
"dumb" allocation methods. I don't know about false pointers, the hash
nodes in my implementation only contain pointers, so I'm not sure there
is any possibility for false ones.
The GC isn't precise, so if you have a non-pointer type in a structure
with a pointer or in a class, you'll get false pointers. (i.e. the hash
value at each node)
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
The difference is that algorithms can document in their template
constraints that they need a container with 'soft' properties.
What is the advantage? Why would an algorithm require soft
functions? What is an example of such an algorithm?
Something that uses toUpperCase or toLowerCase, for example.
I guess I won't get a real example. I'm not sure it matters. When
Andrei starts implementing the soft methods, either they will be a huge
win or obviously useless. If I were a betting man, I'd bet on the
latter, but I'm not really good at betting, and Andrei's ideas are
usually good :)
Though this was the first thing that popped into my head, it is a fairly
real example. Consider you are doing some processing involving a container
and you call a library function, like toUpperCase, that will perform some
mutation. For some containers, this is completely reasonable. But for
others, the container topology is going to massively change, invalidating
all your current ranges. And you really like to guarantee that both the
current implementation and the one 6 months from now don't mess up your
ranges and cause a bunch of exceptions to be thrown.
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
I wasn't thinking of multi-threaded containers. I was trying to point out
that version ids have failed in lock-free containers, where things are
happening on the order of a single atomic op or a context switch. Given
the time a range could go unused in standard code, versioning won't work.
Are you trying to say that if you don't use your range for exactly
2^32 mutations, it could mistakenly think the range is still valid?
That's a valid, but very very weak point.
Umm, no. That is a valid point that happens in production code to
disastrous effects. Worse, it doesn't happen often and there's no good
unit test for it. I for one, never want to debug a program that only
glitches after days of intensive use in a live environment with real
customer data. Integer overflow bugs like this are actually one of the
few bugs that have ever killed anyone.
I think you are overestimating the life of ranges on a container. They
will not survive that many mutations, and the worst that happens is the
range continues on fully allocated memory (i.e. your #1 above).
A year ago I would have agreed with you, but then I saw a couple of
articles about how this unlikely event does start to occur repeatably in
certain circumstances. This made a bunch of news since it threw a massive
spanner in lock-free containers, which relied on this never happening.
Post by Steven Schveighoffer
You can also force the container to invalidate itself once the first
wrap occurs. This at least will be a hard error.
So every container will have a finite number of operation and that's it?
Post by Steven Schveighoffer
Post by Robert Jacques
Post by Steven Schveighoffer
Post by Robert Jacques
No it's not. version tags + integer overflow = bug. Doug Lea knew about
the problem and but thought it would never happen in real code. And Bill
Gates thought no one will need more than 640k of ram. They both have been
proven wrong.
Overflow != bug. Wrapping completely to the same value == bug, but is
so unlikely, it's worth the possibility.
Statistics 101: do a test enough times and even the highly improbable
will happen.
In statistics we generally ignore the outliers. I normally am on your
side in these types of cases, but we are talking about a statistical
impossibility -- nobody will leave a range untouched for exactly 2^32
mutations, it simply won't happen. Your computer will probably be
obsolete before it does.
No, this is a statistical implausibility. One that has already happened to
some people. ulongs on the other hand...
Post by Steven Schveighoffer
BTW, I'm not advocating adding mutation counters, I don't plan on adding
them. It's just another alternative to "soft" mutations.
I understand.
Steven Schveighoffer
2010-03-08 11:43:11 UTC
Permalink
On Mon, 08 Mar 2010 00:20:31 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sun, 07 Mar 2010 22:07:14 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer
[snip]
Post by Steven Schveighoffer
Post by Robert Jacques
Please define for me an O(1) slice or index operation for a
linked-list.
One for which you have references to the slice end points. I think
this will work, and I was planning on providing it in the upcoming
dcollections port. The only thing you cannot guarantee is that the
order is correct.
The container would have to do an O(N) search to verify the ranges are
actually part of the collection. And using two ranges as iterators to
create a third range feels very wrong and very bug prone: see all the
issues raised during Andrei's iterators vs ranges presentations.
Similarly, it feels wrong for something to define slicing and not indexing.
Not two ranges, two references. That is what cursors are in
dcollections. They currently are akin to iterators, but are being changed
to immovable references.

BTW, I don't plan on adding this as a simple slice operation on a
container that cannot quickly verify that the two cursors are ordered, to
ensure it isn't accidentally used in functions that want safe slicing.

You don't need O(n) to verify the cursors are part of the collection if
the cursors identify the collection they are from.
Post by Robert Jacques
Post by Steven Schveighoffer
Hm... my hash outperforms builtin AAs by a wide margin. But this is
not technically because my implementation is better, it's because AA's
use "dumb" allocation methods. I don't know about false pointers, the
hash nodes in my implementation only contain pointers, so I'm not sure
there is any possibility for false ones.
The GC isn't precise, so if you have a non-pointer type in a structure
with a pointer or in a class, you'll get false pointers. (i.e. the hash
value at each node)
I don't store the hash in the node, it is recalculated when a re-hash is
done.
Post by Robert Jacques
Post by Steven Schveighoffer
I think you are overestimating the life of ranges on a container. They
will not survive that many mutations, and the worst that happens is the
range continues on fully allocated memory (i.e. your #1 above).
A year ago I would have agreed with you, but then I saw a couple of
articles about how this unlikely event does start to occur repeatably in
certain circumstances. This made a bunch of news since it threw a
massive spanner in lock-free containers, which relied on this never
happening.
references?
Post by Robert Jacques
Post by Steven Schveighoffer
You can also force the container to invalidate itself once the first
wrap occurs. This at least will be a hard error.
So every container will have a finite number of operation and that's it?
That's one solution, if you want to be sure this bug doesn't occur, and
you want to be sure your ranges aren't invalidated.

-Steve
Andrei Alexandrescu
2010-03-08 17:53:40 UTC
Permalink
Post by Steven Schveighoffer
On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
What is the advantage? Why would an algorithm require soft
functions? What is an example of such an algorithm?
Something that uses toUpperCase or toLowerCase, for example.
I guess I won't get a real example. I'm not sure it matters. When
Andrei starts implementing the soft methods, either they will be a huge
win or obviously useless. If I were a betting man, I'd bet on the
latter, but I'm not really good at betting, and Andrei's ideas are
usually good :)
The way people usually take advantage of non-invalidating container
primitives is altering the container during an iteration. The canonical
way to remove while iterating an erase-non-invalidating container (e.g.
map) is this:

typedef ... Container;
Container c;
for (Container::iterator i = c.begin(); i != c.end(); )
{
if (should_delete_this_guy)
c.remove(i++);
else
++i;
}

The canonical way to remove while iterating an erase-invalidating
container (e.g. vector) is:

typedef ... Container;
Container c;
for (Container::iterator i = c.begin(); i != c.end(); )
{
if (should_delete_this_guy)
i = c.remove(i);
else
++i;
}

I know, subtle :o/. Of course, things can get quickly more interesting
when calling functions from within loops that may affect the container
(think the Observer pattern).


Andrei
Steven Schveighoffer
2010-03-08 18:10:31 UTC
Permalink
On Mon, 08 Mar 2010 12:53:40 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
What is the advantage? Why would an algorithm require soft
functions? What is an example of such an algorithm?
Something that uses toUpperCase or toLowerCase, for example.
I guess I won't get a real example. I'm not sure it matters. When
Andrei starts implementing the soft methods, either they will be a huge
win or obviously useless. If I were a betting man, I'd bet on the
latter, but I'm not really good at betting, and Andrei's ideas are
usually good :)
The way people usually take advantage of non-invalidating container
primitives is altering the container during an iteration. The canonical
way to remove while iterating an erase-non-invalidating container (e.g.
typedef ... Container;
Container c;
for (Container::iterator i = c.begin(); i != c.end(); )
{
if (should_delete_this_guy)
c.remove(i++);
else
++i;
}
The canonical way to remove while iterating an erase-invalidating
typedef ... Container;
Container c;
for (Container::iterator i = c.begin(); i != c.end(); )
{
if (should_delete_this_guy)
i = c.remove(i);
else
++i;
}
Hm... this seems to be a different problem than I originally was thinking
about. So in essence, you want a set of "safe" functions that will not
adversely affect a range you are using to iterate with?

BTW, I solved this problem (removing while iterating) in a much
better/simpler way in dcollections.

-Steve
Andrei Alexandrescu
2010-03-08 18:21:14 UTC
Permalink
Post by Steven Schveighoffer
On Mon, 08 Mar 2010 12:53:40 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford at jhu.edu>
Post by Robert Jacques
On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer
Post by Steven Schveighoffer
What is the advantage? Why would an algorithm require soft
functions? What is an example of such an algorithm?
Something that uses toUpperCase or toLowerCase, for example.
I guess I won't get a real example. I'm not sure it matters. When
Andrei starts implementing the soft methods, either they will be a
huge win or obviously useless. If I were a betting man, I'd bet on
the latter, but I'm not really good at betting, and Andrei's ideas
are usually good :)
The way people usually take advantage of non-invalidating container
primitives is altering the container during an iteration. The
canonical way to remove while iterating an erase-non-invalidating
typedef ... Container;
Container c;
for (Container::iterator i = c.begin(); i != c.end(); )
{
if (should_delete_this_guy)
c.remove(i++);
else
++i;
}
The canonical way to remove while iterating an erase-invalidating
typedef ... Container;
Container c;
for (Container::iterator i = c.begin(); i != c.end(); )
{
if (should_delete_this_guy)
i = c.remove(i);
else
++i;
}
Hm... this seems to be a different problem than I originally was
thinking about. So in essence, you want a set of "safe" functions that
will not adversely affect a range you are using to iterate with?
That's the idea. Any documentation of STL containers specifies whether
iterators are invalidated or not by a primitive. The plan is to simply
define a nomenclature (i.e. "softXxx") for primitives that don't
invalidate iterators. That moves the documentation into the name of the
function.
Post by Steven Schveighoffer
BTW, I solved this problem (removing while iterating) in a much
better/simpler way in dcollections.
Do tell!


Andrei
Steven Schveighoffer
2010-03-08 18:45:04 UTC
Permalink
On Mon, 08 Mar 2010 13:21:14 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
Hm... this seems to be a different problem than I originally was
thinking about. So in essence, you want a set of "safe" functions that
will not adversely affect a range you are using to iterate with?
That's the idea. Any documentation of STL containers specifies whether
iterators are invalidated or not by a primitive. The plan is to simply
define a nomenclature (i.e. "softXxx") for primitives that don't
invalidate iterators. That moves the documentation into the name of the
function.
What I meant was, your example only focuses on the iterator being advanced
in the loop. Another iterator might be invalidated that is not used in
the remove function. Was that the intent?
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
BTW, I solved this problem (removing while iterating) in a much
better/simpler way in dcollections.
Do tell!
I created a method purge, which is used as an opApply in a foreach loop
like so:

foreach(ref erase, v; &container.purge)
{
erase = should_delete_this_guy;
}

One nifty thing about this, purging an entire collection is an O(n) up to
O(nlgn) operation, whereas purging a vector as you wrote is an O(n*n)
operation.

I found that one of the biggest reasons I liked about STL's containers
over others that I've used is the ability to remove elements from the
container while iterating. Java allows this also, but no foreach, and it
doesn't make optimizations, since the iterator is in charge of removal,
not the container itself.

-Steve
Andrei Alexandrescu
2010-03-08 19:23:22 UTC
Permalink
Post by Steven Schveighoffer
On Mon, 08 Mar 2010 13:21:14 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
Hm... this seems to be a different problem than I originally was
thinking about. So in essence, you want a set of "safe" functions
that will not adversely affect a range you are using to iterate with?
That's the idea. Any documentation of STL containers specifies whether
iterators are invalidated or not by a primitive. The plan is to simply
define a nomenclature (i.e. "softXxx") for primitives that don't
invalidate iterators. That moves the documentation into the name of
the function.
What I meant was, your example only focuses on the iterator being
advanced in the loop. Another iterator might be invalidated that is not
used in the remove function. Was that the intent?
The examples were given just to illustrate common cases when
invalidation is relevant. As I mentioned, interesting cases occur when
long-distance effects occur. The Observer pattern is a perfect example.
Post by Steven Schveighoffer
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
BTW, I solved this problem (removing while iterating) in a much
better/simpler way in dcollections.
Do tell!
I created a method purge, which is used as an opApply in a foreach loop
foreach(ref erase, v; &container.purge)
{
erase = should_delete_this_guy;
}
Very nifty indeed!
Post by Steven Schveighoffer
One nifty thing about this, purging an entire collection is an O(n) up
to O(nlgn) operation, whereas purging a vector as you wrote is an O(n*n)
operation.
Good point. The loop with erase is good if you have to occasionally
erase some element; for bulk conditional erasures the canonical STL
approach is to use remove() followed by erase():

http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Erase-Remove
Post by Steven Schveighoffer
I found that one of the biggest reasons I liked about STL's containers
over others that I've used is the ability to remove elements from the
container while iterating. Java allows this also, but no foreach, and
it doesn't make optimizations, since the iterator is in charge of
removal, not the container itself.
Yah, the tail that removes the dog. That's one of the reasons I think
Java's containers are missing more points than one. In fact the more I
think of Java containers, the more reasons I find to dislike them. To me
they're more of a point of negative potential in the design space that
needs to be carefully navigated away from.


Andrei

dsimcha
2010-03-08 03:22:04 UTC
Permalink
== Quote from Robert Jacques (sandford at jhu.edu)'s article
Post by Robert Jacques
Post by Steven Schveighoffer
The mutation index has been used in Tango forever, and I think was in
Doug Lea's original container implementations. I'm pretty sure it is
sound in single-threaded uses.
No it's not. version tags + integer overflow = bug. Doug Lea knew about
the problem and but thought it would never happen in real code. And Bill
Gates thought no one will need more than 640k of ram. They both have been
proven wrong.
Using a 32-bit version tag probably could lead to overflow in some corner cases,
but in the 64-bit case it can be shown that this is absurdly unlikely. Assume
that an increment operation takes about 1 clock cycle (in reality it's more) and
that most CPUs are 10 GHz (to make things future-proof in case people figure out
how to get the clock speed race going again).

This means that a version tag could, at most, increment at about 10^10 ticks per
second. A 64-bit unsigned int goes up to about 1.8x10^19. This means that, even
if our program does nothing but update the version counter in an empty for loop,
it will take 1.8x10^19 / 10^10 = 1.8x10^9 seconds to overflow. There are about
3.15x10^7 seconds in a year.

Therefore, even under very pessimistic assumptions the fastest you could overflow
a 64-bit unsigned int by incrementing it starting at zero is in about half a century.
Andrei Alexandrescu
2010-03-08 17:36:32 UTC
Permalink
Post by dsimcha
== Quote from Robert Jacques (sandford at jhu.edu)'s article
Post by Robert Jacques
Post by Steven Schveighoffer
The mutation index has been used in Tango forever, and I think was in
Doug Lea's original container implementations. I'm pretty sure it is
sound in single-threaded uses.
No it's not. version tags + integer overflow = bug. Doug Lea knew about
the problem and but thought it would never happen in real code. And Bill
Gates thought no one will need more than 640k of ram. They both have been
proven wrong.
Using a 32-bit version tag probably could lead to overflow in some corner cases,
but in the 64-bit case it can be shown that this is absurdly unlikely. Assume
that an increment operation takes about 1 clock cycle (in reality it's more) and
that most CPUs are 10 GHz (to make things future-proof in case people figure out
how to get the clock speed race going again).
This means that a version tag could, at most, increment at about 10^10 ticks per
second. A 64-bit unsigned int goes up to about 1.8x10^19. This means that, even
if our program does nothing but update the version counter in an empty for loop,
it will take 1.8x10^19 / 10^10 = 1.8x10^9 seconds to overflow. There are about
3.15x10^7 seconds in a year.
Therefore, even under very pessimistic assumptions the fastest you could overflow
a 64-bit unsigned int by incrementing it starting at zero is in about half a century.
Yah, there are many similar assessments that at least for a couple of
centuries going we can assume that 64-bit counters never overflow and
64-bit virtual address space is sparse. First time I saw this was when I
read about the Opal OS:
http://www.cs.washington.edu/homes/levy/opal/sigops92.ps

Andrei
Steve Teale
2010-03-06 17:03:20 UTC
Permalink
Post by Steven Schveighoffer
Post by Andrei Alexandrescu
Sounds good?
How can softRemove not affect iterating ranges? What if the range is
positioned on the element removed?
The only two containers that would support softInsert would be linked
list and sorted map/set. Anything else might completely screw up the
iteration. I don't see a lot of "generic" use for it.
Another option is to use a "mutation" field that is checked every chance
by the range. If it changes, then the range is invalidated.
-Steve
Steve,

Wouldn't the soft things throw (or maybe just nudge) an exception if the
container was in a state where the soft option was not appropriate.

That would be the other way round to what you said, The container would
note when some range operation had a conflicting currency.

Steve
Steven Schveighoffer
2010-03-07 02:38:43 UTC
Permalink
Post by Steve Teale
Post by Steven Schveighoffer
Post by Andrei Alexandrescu
Sounds good?
How can softRemove not affect iterating ranges? What if the range is
positioned on the element removed?
The only two containers that would support softInsert would be linked
list and sorted map/set. Anything else might completely screw up the
iteration. I don't see a lot of "generic" use for it.
Another option is to use a "mutation" field that is checked every chance
by the range. If it changes, then the range is invalidated.
-Steve
Steve,
Wouldn't the soft things throw (or maybe just nudge) an exception if the
container was in a state where the soft option was not appropriate.
That would be the other way round to what you said, The container would
note when some range operation had a conflicting currency.
I think the point of soft functions is to not compile if they are not supported by the container. Can you think of a way a container can have a soft option that is appropriate sometimes, but not others (thereby enabling a runtime option)? I can't really see it, but maybe I'm missing it.

-Steve
Andrei Alexandrescu
2010-03-06 19:55:49 UTC
Permalink
Post by Steven Schveighoffer
Post by Andrei Alexandrescu
In the STL world, writing container-independent code is generally
shunned (see e.g.
http://www.informit.com/content/images/0201749629/items/item2-2.pdf).
One problem is a very small intersection between the functionalities
offered by the various STL containers, and the conceptual organization
that is weaker than that of iterators.
A worse problem is iterator invalidation rules, something that we'll
need to address too. I'm thinking that the best defense is a strong
Methods such as insert, remove, pushFront, pushBack, removeFront,
removeBack, are assumed to affect the container's topology and must be
handled in user code as such.
In addition to those, a container may also define functions named after
the above by adding a "soft" prefix (e.g. softInsert, softRemove...)
that are guaranteed to not affect the ranges currently iterating the
container.
Generic code that needs specific iterator (non-)invalidation rules can
use softXxx methods, in confidence that containers not supporting it
will be ruled out during compilations.
Sounds good?
How can softRemove not affect iterating ranges? What if the range is positioned on the element removed?
With GC, you can softRemove things without invalidating iterators.
Post by Steven Schveighoffer
The only two containers that would support softInsert would be linked list and sorted map/set. Anything else might completely screw up the iteration. I don't see a lot of "generic" use for it.
Singly linked lists, doubly linked lists, various trees - three's a
crowd. Most node-based containers support softInsert.


Andrei
Michel Fortin
2010-03-06 21:27:04 UTC
Permalink
On 2010-03-06 14:55:49 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
How can softRemove not affect iterating ranges? What if the range is
positioned on the element removed?
With GC, you can softRemove things without invalidating iterators.
What exactly is an "invalidated" iterator? Is an iterator that no
longer points to the container still "valid"? Or is "valid" just
another word for "memory safe"?

Wouldn't the notion of "detached" iterators/ranges be more useful?
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/
Andrei Alexandrescu
2010-03-06 22:18:27 UTC
Permalink
Post by Michel Fortin
On 2010-03-06 14:55:49 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
How can softRemove not affect iterating ranges? What if the range is
positioned on the element removed?
With GC, you can softRemove things without invalidating iterators.
What exactly is an "invalidated" iterator? Is an iterator that no longer
points to the container still "valid"? Or is "valid" just another word
for "memory safe"?
Wouldn't the notion of "detached" iterators/ranges be more useful?
Good question. In STL, invalidation roughly means undefined behavior if
you use it. With GC in tow, the concept could be significantly milder.
For example, reallocating an array would leave the old contents of the
array sort of around, just obsoleted and also depleted of meaningful
content if the data type has a destructor.

Andrei
Joel C. Salomon
2010-03-08 17:44:29 UTC
Permalink
Post by Andrei Alexandrescu
Good question. In STL, invalidation roughly means undefined behavior if
you use it. With GC in tow, the concept could be significantly milder.
For example, reallocating an array would leave the old contents of the
array sort of around, just obsoleted and also depleted of meaningful
content if the data type has a destructor.
So it?s still a bug to use an invalidated iterator/range, and probably
still has unusual (if no longer undefined) effects if you do follow it,
but it (probably) won?t cause memory corruption. Is that correct?

?Joel Salomon
Andrei Alexandrescu
2010-03-08 18:10:24 UTC
Permalink
Post by Joel C. Salomon
Post by Andrei Alexandrescu
Good question. In STL, invalidation roughly means undefined behavior if
you use it. With GC in tow, the concept could be significantly milder.
For example, reallocating an array would leave the old contents of the
array sort of around, just obsoleted and also depleted of meaningful
content if the data type has a destructor.
So it?s still a bug to use an invalidated iterator/range, and probably
still has unusual (if no longer undefined) effects if you do follow it,
but it (probably) won?t cause memory corruption. Is that correct?
I'd think so. In SafeD, clearly there's no memory corruption so throwing
or iterating over removed elements would be valid choices. In non-safe
mode, the user should expect undefined behavior.

Andrei
BLS
2010-03-07 00:41:57 UTC
Permalink
Post by Andrei Alexandrescu
Singly linked lists, doubly linked lists, various trees - three's a
crowd. Most node-based containers support softInsert.
slightly OT
I am pretty sure that you have to implement IsRangeOfRange! and
SubRange! for non linear collections. (if not already done)

Guess what I want to say is that I doubt that std.algo will fit in every
case; and creating a collection lib reduced to what is possible under
the std.algo umbrella is (IMHO) a bad idea.

..and frankly said I wonder .. no offense.. why you are the (only) one
who is creating and designing the collection library ?
finally :
from a pragmatic view > why create from scratch instead of reusing
Stevens data-structures..
Bjoern
Steven Schveighoffer
2010-03-07 03:13:16 UTC
Permalink
Post by BLS
..and frankly said I wonder .. no offense.. why you are the (only) one
who is creating and designing the collection library ?
from a pragmatic view > why create from scratch instead of reusing
Stevens data-structures..
We have had private discussions, and we have fundamental differences of
opinion for what belongs in containers. It may be that through trial and
error and constructive discussion we come up with the same conclusions, in
which case I'm sure we will combine forces (recently, I think we both
moved our points of view closer, but not quite there yet). However, I
will not stop maintaining dcollections as long as Phobos' collection
solution does not implement the features I think are important. I am in
the process of porting/improving dcollections to D2, and when it is done,
I think it will have some really useful features in it.

-Steve
Andrei Alexandrescu
2010-03-07 11:26:22 UTC
Permalink
Post by Steven Schveighoffer
Post by BLS
..and frankly said I wonder .. no offense.. why you are the (only) one
who is creating and designing the collection library ?
from a pragmatic view > why create from scratch instead of reusing
Stevens data-structures..
We have had private discussions, and we have fundamental differences of
opinion for what belongs in containers. It may be that through trial
and error and constructive discussion we come up with the same
conclusions, in which case I'm sure we will combine forces (recently, I
think we both moved our points of view closer, but not quite there
yet).
I concur with Steve's characterization of the situation. My interface
definitions are very crisp and my cost function looks more like a step
function, so I'm difficult to negotiate with. I'm running a mile away
faster than you can say "Here, there's this Container.contains() method,
which..."

But Steve suggested that he could volunteer, with credit, some of his
implementation code to my interfaces, even though he has disagreements
regarding the exact definitions of those interfaces.
Post by Steven Schveighoffer
However, I will not stop maintaining dcollections as long as
Phobos' collection solution does not implement the features I think are
important. I am in the process of porting/improving dcollections to D2,
and when it is done, I think it will have some really useful features in
it.
That's great. This kind of competition can only be best for everyone.


Andrei
Steven Schveighoffer
2010-03-07 03:05:23 UTC
Permalink
On Sat, 06 Mar 2010 14:55:49 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
Post by Andrei Alexandrescu
In the STL world, writing container-independent code is generally
shunned (see e.g.
http://www.informit.com/content/images/0201749629/items/item2-2.pdf).
One problem is a very small intersection between the functionalities
offered by the various STL containers, and the conceptual organization
that is weaker than that of iterators.
A worse problem is iterator invalidation rules, something that we'll
need to address too. I'm thinking that the best defense is a strong
Methods such as insert, remove, pushFront, pushBack, removeFront,
removeBack, are assumed to affect the container's topology and must be
handled in user code as such.
In addition to those, a container may also define functions named
after the above by adding a "soft" prefix (e.g. softInsert,
softRemove...) that are guaranteed to not affect the ranges currently
iterating the container.
Generic code that needs specific iterator (non-)invalidation rules can
use softXxx methods, in confidence that containers not supporting it
will be ruled out during compilations.
Sounds good?
How can softRemove not affect iterating ranges? What if the range is
positioned on the element removed?
With GC, you can softRemove things without invalidating iterators.
If you define invalidation by "not pointing to unallocated memory,"
wouldn't normal remove (not soft remove) also result in valid memory being
pointed to? Why do we need a special soft version?

Even if this is what you mean, with allocators (which significantly speed
up container insert/removal), you may be pointing to a newly valid piece
of memory that may not be where you want.

I thought you meant by not invalidating that the range would iterate over
the elements it was originally scheduled to iterate over, sans the removed
element.
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
The only two containers that would support softInsert would be linked
list and sorted map/set. Anything else might completely screw up the
iteration. I don't see a lot of "generic" use for it.
Singly linked lists, doubly linked lists, various trees - three's a
crowd. Most node-based containers support softInsert.
Hashes may rehash on insert, invalidating any ranges (it's one of the
notes of dcollections' HashMap and HashSet). I guess it depends on if an
insert alters the ordering of the nodes. This is why I said sorted
containers which should be unaffected.

You have a point that this covers a lot of containers however, but I think
the usefulness of just having a softInsert (I think softRemove is not a
useful concept) is pretty limited. It limits you to

BTW, I think its a good idea to think about how to make this work, if we
can come up with something that covers a lot of ground, I think that's a
good thing. I thought of making ranges slower but more robust to
container changes in debug mode, does this sound useful? I.e. if you
compile in non-release mode, removing/adding an element may trigger the
range to do a O(n) check to validate itself.

-Steve
Andrei Alexandrescu
2010-03-07 11:42:10 UTC
Permalink
Post by Steven Schveighoffer
On Sat, 06 Mar 2010 14:55:49 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
Post by Andrei Alexandrescu
In the STL world, writing container-independent code is generally
shunned (see e.g.
http://www.informit.com/content/images/0201749629/items/item2-2.pdf).
One problem is a very small intersection between the functionalities
offered by the various STL containers, and the conceptual
organization that is weaker than that of iterators.
A worse problem is iterator invalidation rules, something that we'll
need to address too. I'm thinking that the best defense is a strong
Methods such as insert, remove, pushFront, pushBack, removeFront,
removeBack, are assumed to affect the container's topology and must
be handled in user code as such.
In addition to those, a container may also define functions named
after the above by adding a "soft" prefix (e.g. softInsert,
softRemove...) that are guaranteed to not affect the ranges
currently iterating the container.
Generic code that needs specific iterator (non-)invalidation rules
can use softXxx methods, in confidence that containers not
supporting it will be ruled out during compilations.
Sounds good?
How can softRemove not affect iterating ranges? What if the range
is positioned on the element removed?
With GC, you can softRemove things without invalidating iterators.
If you define invalidation by "not pointing to unallocated memory,"
wouldn't normal remove (not soft remove) also result in valid memory
being pointed to? Why do we need a special soft version?
Well consider an array of File objects, which are reference counted. I'd
want the resizing of an array to not increment the reference count of
the files. That means that resizing leaves the old array without
meaningful content by calling clear() against the remaining files.
Post by Steven Schveighoffer
Even if this is what you mean, with allocators (which significantly
speed up container insert/removal), you may be pointing to a newly valid
piece of memory that may not be where you want.
I thought you meant by not invalidating that the range would iterate
over the elements it was originally scheduled to iterate over, sans the
removed element.
For remove that should be the case, but probably not for insert.
Post by Steven Schveighoffer
Post by Andrei Alexandrescu
Post by Steven Schveighoffer
The only two containers that would support softInsert would be linked
list and sorted map/set. Anything else might completely screw up the
iteration. I don't see a lot of "generic" use for it.
Singly linked lists, doubly linked lists, various trees - three's a
crowd. Most node-based containers support softInsert.
Hashes may rehash on insert, invalidating any ranges (it's one of the
notes of dcollections' HashMap and HashSet). I guess it depends on if
an insert alters the ordering of the nodes. This is why I said sorted
containers which should be unaffected.
You have a point that this covers a lot of containers however, but I
think the usefulness of just having a softInsert (I think softRemove is
not a useful concept) is pretty limited. It limits you to
Iterator invalidation has been a notion thoroughly explored by the STL
and a commonly-mentioned liability of STL iterators. People find it very
jarring that syntactically identical interfaces have distinct effects on
iterators, it's a dependency very difficult to track. I'm glad that that
experience has already been accumulated and that we can build upon it.
Post by Steven Schveighoffer
BTW, I think its a good idea to think about how to make this work, if we
can come up with something that covers a lot of ground, I think that's a
good thing. I thought of making ranges slower but more robust to
container changes in debug mode, does this sound useful? I.e. if you
compile in non-release mode, removing/adding an element may trigger the
range to do a O(n) check to validate itself.
I think debug vs. release should not affect complexity.


Andrei
Steven Schveighoffer
2010-03-08 03:17:13 UTC
Permalink
On Sun, 07 Mar 2010 06:42:10 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
Iterator invalidation has been a notion thoroughly explored by the STL
and a commonly-mentioned liability of STL iterators. People find it very
jarring that syntactically identical interfaces have distinct effects on
iterators, it's a dependency very difficult to track. I'm glad that that
experience has already been accumulated and that we can build upon it.
Is there any experience that led to a solution, even in theory?

I'm going to stop arguing this point any further, because the quickest
path to resolution is probably for you to stop wasting time debating with
me and implement what you think will work. Then we can see how useful it
might be.

If you can solve this problem in a way that is useful, I think it might be
as revolutionary as ranges are.

-Steve
Andrei Alexandrescu
2010-03-08 17:40:30 UTC
Permalink
Post by Steven Schveighoffer
On Sun, 07 Mar 2010 06:42:10 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
Iterator invalidation has been a notion thoroughly explored by the STL
and a commonly-mentioned liability of STL iterators. People find it
very jarring that syntactically identical interfaces have distinct
effects on iterators, it's a dependency very difficult to track. I'm
glad that that experience has already been accumulated and that we can
build upon it.
Is there any experience that led to a solution, even in theory?
I don't know of one. There are of course the safe iterators that rely on
a variety of techniques (some, such as serial numbers, discussed in this
thread) to detect invalidation dynamically.

For the style of containers discussed herein, a static solution should
be a better fit than a dynamic one. I plan to enact a solution by means
of the softXxx primitives.
Post by Steven Schveighoffer
I'm going to stop arguing this point any further, because the quickest
path to resolution is probably for you to stop wasting time debating
with me and implement what you think will work. Then we can see how
useful it might be.
If you can solve this problem in a way that is useful, I think it might
be as revolutionary as ranges are.
Probably not. The way I see it, it would be simply the removal of a
large annoyance that stands in the way of writing container-independent
generic code.


Andrei
Fawzi Mohamed
2010-03-07 12:19:14 UTC
Permalink
I am coming late into this discussion, some comments:

I like a duck typing approach whenever possible, and don't need runtime
adaptability (which seems to be in line with others)

different container have different trade offs, and switching between
them should be a design choice

Generic algorithm in my opinion are useful to easily (and with little
well tested code) give a wise range interface to several containers.
They should not "forbid" special implementation of some algorithms in a
container.
To have a common interface one can always use a trampoline function,
that looks for the specialized implementation in various places.
I know that in theory one should be able to write specialized templates
instead, but in practice I have often encountered bugs and strange
behavior, in container implementation seems much cleaner to me

Different containers have different "threadsafeness". Normally some
subset of operations is threadsafe, but mixing in other might break it.
This in my opinion *must* be documented, for example one should not
have to look to the source to know if addition of an element to an AA
breaks a foreach loop or not.
Threadsafeness has a cost, in general non threadsafe structures are
faster, so it makes sense to have them (especially thinking that often
even in the non threadsafe structures one can use a subset of
operations concurrently without problems.
There is a large class of structures that *are* threadsafe, these have
some beautiful properties, and are very useful to know. A very good
review of them (I don't say introduction because the book is quite
advanced) is "Purely Functional Data Structures" by Chris Okasaki. In
these structures the values read are always the same, but lazy
evaluation might be used to transform amortized cost in non amortized
cost.

I had already spoken about these back during the const discussion,
because I would have liked that these structures were "immutable" and
usable by pure functions, which meant a slighlty different thing from
immutable. Some casting might be used to implement them, if one assumes
that casting a mutable structure to immutable and modifying it later is
doable (no compiler optimization on immutable disallow it, this could
be done even just for a special "thunk" that contain a lazy function
that will produce the value). Anyway that is another discussion.

For the current discussion the important thing is that yes, there are
fully threadsafe structures, and it would be useful to have them, but
in general they have a slightly larger cost.
Not safe structures are also useful as they are faster, in general in
the documentation one expects to find which operations are safe and
which not.
The minima baseline typically is that just reading is threadsafe,
modifications in general aren't and might invalidate all pointers to
the structure.
Exceptions to this should be clearly stated, for example appending to
an array can be made threadsafe without a large cost, whereas insertion
(if the update of an element is not atomic) cannot.

If D really wants to go after multithreading it makes sense to offer as
many safe structures as possible, and in any case clearly document the
non threadsafeness of the others.

it might make sense to give "safe" interfaces (I would say basically
just reading without updates), but one cannot expect to really express
all threadsafeness of a container though interfaces, because that would
be too complicated and then one would end up with one container per
interface without much gain.

Fawzi
Michel Fortin
2010-03-06 14:46:13 UTC
Permalink
On 2010-03-06 07:49:18 -0500, Andrei Alexandrescu
Post by Andrei Alexandrescu
Methods such as insert, remove, pushFront, pushBack, removeFront,
removeBack, are assumed to affect the container's topology and must be
handled in user code as such.
I think it'd be better to say "they are assumed to invalidate linked
ranges". I know it's the same thing, but I think it's preferable to
state the effect on the container's interface than the effects on the
container's internals.

Side note: me thinks it'd be better to rename pushFront and pushBack to
insertFront and insertBack. One reason for this is that "push" is often
seen as the opposite of "pop" in container parlance while in our case
it's not. Also, "insert" is a more natural opposite for "remove". And
you're already using the "insert" term standalone anyway when it's not
accompanied by "Front" or "Back".
Post by Andrei Alexandrescu
In addition to those, a container may also define functions named after
the above by adding a "soft" prefix (e.g. softInsert, softRemove...)
that are guaranteed to not affect the ranges currently iterating the
container.
I think that's a good idea. The problem is that you don't necessarily
know when writing an algorithm whether you are authorized or not to
invalidate iterators. Wouldn't it be more useful if it worked with a
wrapper instead?

auto range = container[];
playWithContainer(container.softwrap); // playWithContainer can't
invalidate range
playWithRange(range); // range is still valid here

The soft wrapper could be build automatically from compile time
reflection by looking at which soft methods are available.
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/
Continue reading on narkive:
Loading...