Even worse, the responsibilities of different parts of the program with regard to pointer correctness are not always obvious. This often becomes a point of contention when those different parts were developed by different individuals. This paper attempts to define a safe and general pointer discipline that can be used to resolve such issues.
In C++, a null pointer is obtained by casting 0 into the pointer's type. In Perl, a null pointer is any false scalar (i.e. "", 0, or undef).
For our purposes, a C++ reference is just a dumb pointer that is constant and non-null.
A scoped pointer is just a clone pointer that does not provide copy or assignment services. An auto pointer is just a clone pointer whose copy and assignment services have ownership-transferring semantics.
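As a minimal sketch of the scoped pointer idea, the class below owns its referent and refuses copy and assignment. The names scoped_ptr, Tracked, and live_inside_scope, and the use of C++11 "= delete", are assumptions made for illustration; this is not a definitive implementation.

```cpp
#include <cassert>

// Hypothetical helper that counts live instances, for illustration only.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Hypothetical scoped pointer: sole owner, no copy, no assignment.
template <typename T>
class scoped_ptr {
    T *ref_;
public:
    explicit scoped_ptr(T *p) : ref_(p) {}   // p must come directly from new
    ~scoped_ptr() { delete ref_; }
    scoped_ptr(const scoped_ptr &) = delete;
    scoped_ptr &operator=(const scoped_ptr &) = delete;
    T &operator*() const { return *ref_; }
    T *operator->() const { return ref_; }
};

// Returns the number of live Tracked objects seen inside the scope.
int live_inside_scope() {
    scoped_ptr<Tracked> p(new Tracked);
    // scoped_ptr<Tracked> q(p);   // would not compile: copying is disabled
    return Tracked::live;
}
```

The referent is released when p leaves its scope, so no Tracked object remains live after live_inside_scope() returns.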
For our purposes, a Perl reference is just a shared pointer.
Similarly, don't dereference an iterator that has already iterated through its entire container. (Such an iterator is not necessarily null. For example, an exhausted C++ STL iterator is given by coll.end(), but it can still access coll.)
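The rule can be sketched as follows, assuming a std::vector and a hypothetical helper named find_or. The iterator is compared against coll.end() before it is dereferenced.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the first element equal to `wanted`, or `fallback` when the search
// exhausts the container. The iterator is compared with end() before use.
int find_or(const std::vector<int> &coll, int wanted, int fallback) {
    std::vector<int>::const_iterator it =
        std::find(coll.begin(), coll.end(), wanted);
    if (it == coll.end()) {
        return fallback;   // exhausted: dereferencing `it` would be an error
    }
    return *it;
}
```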
For our purposes, an object contained within another object on the heap is not considered on the heap itself.
int main() {
shared_ptr<Foo> foop(new Foo);
shared_ptr<Foo> barp(foop); // OK fine
shared_ptr<Foo> fubarp(&*foop); // Legal, but WRONG! Starts a second count, so the Foo is deleted twice
}
This also makes it difficult to obtain a shared pointer when all you have is a dumb back pointer. This can be solved with the FACTORY METHOD pattern, provided that the shared_ptr class has a member base class that you can use for that purpose. However, be careful never to generate shared pointers to a pre-existing object that isn't already a shared pointer referent! For example:
template<typename T>
class shared_ptr {
private:
T *ref_;
long *count_;
bool generated_;
public:
// NOTE: This ctor's argument must come directly from new
explicit shared_ptr(T*) : generated_(false) {
// ...
}
~shared_ptr() {
if(--*count_ == 0) {
if(!generated_) { delete count_; }
delete ref_;
}
}
// And other stuff...
class Factory {
friend class shared_ptr;
mutable long count_;
public:
Factory() : count_(0) {}
Factory(const Factory&) : count_(0) {}
Factory &operator=(const Factory&) { return *this; }
// Implicit destructor is fine.
// The converting ctor is explicit, so the conversion must be spelled out
shared_ptr pointer() const { return shared_ptr(*this); }
bool shared() const { return count_ != 0; }
};
private:
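explicit shared_ptr(const Factory &obj) :
// T derives from Factory, so the downcast recovers the full referent
ref_(static_cast<T*>(const_cast<Factory*>(&obj))),
count_(&obj.count_), generated_(true)
{ ++*count_; } // increment the count itself, not the pointer to it
};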
class Foo : public shared_ptr<Foo>::Factory {
// ...
};
int main() {
shared_ptr<Foo> foop=(new Foo)->pointer();
clone_ptr<Foo> barp(new Foo); // Don't call barp->pointer() !
}
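Standard C++ (since C++11) offers a comparable facility in std::enable_shared_from_this. The sketch below uses it in place of the Factory base class above; it is an analogue from the standard library rather than this paper's shared_ptr, and Node, self, and count_after_recovery are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical referent type; the base class lets a member function recover
// a shared pointer from `this`.
struct Node : std::enable_shared_from_this<Node> {
    std::shared_ptr<Node> self() { return shared_from_this(); }
};

// Returns the reference count after recovering a second shared pointer
// through a dumb pointer to the referent.
long count_after_recovery() {
    std::shared_ptr<Node> first = std::make_shared<Node>();
    Node *dumb = first.get();                  // dumb pointer to a shared referent
    std::shared_ptr<Node> second = dumb->self();
    return first.use_count();                  // both pointers share one count
}
```

The same caveat applies here: self() must only be called on an object that is already a shared pointer referent.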
You might think that you can make this safe by breaking the loop in the destructor. However, if that's true, then you could and should be using a dumb back pointer instead.
You can avoid this problem by requiring all smart pointer loops to be owned by some active scope that is responsible for breaking the loop before it expires (even if it expires due to an exception). The owning scope may be an object, provided that there is no path of smart pointers to it from any of the smart pointer loops that it is responsible for breaking.
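One way to sketch such an owning scope, assuming std::shared_ptr as the smart pointer and hypothetical names RingNode, RingOwner, and ring_nodes_destroyed, is an object whose destructor cuts the loop. Because destructors also run during exception unwinding, the loop is broken on both the normal and the exceptional path.

```cpp
#include <cassert>
#include <memory>

static int nodes_destroyed = 0;   // counts destructor calls, for illustration

struct RingNode {
    std::shared_ptr<RingNode> next;
    ~RingNode() { ++nodes_destroyed; }
};

// Hypothetical owning scope: holds one node of the loop and cuts the loop
// in its destructor. No smart pointer in the loop leads back to the owner.
class RingOwner {
    std::shared_ptr<RingNode> head_;
public:
    RingOwner() : head_(new RingNode) {
        std::shared_ptr<RingNode> second(new RingNode);
        head_->next = second;
        second->next = head_;   // closes the loop
    }
    ~RingOwner() { head_->next.reset(); }   // break the loop before expiring
    RingOwner(const RingOwner &) = delete;
    RingOwner &operator=(const RingOwner &) = delete;
};

// Builds a two-node loop, optionally leaves the scope by exception, and
// returns how many nodes were destroyed.
int ring_nodes_destroyed(bool throw_midway) {
    nodes_destroyed = 0;
    try {
        RingOwner owner;
        if (throw_midway) throw 1;
    } catch (int) {
        // the owner's destructor has already run by the time we get here
    }
    return nodes_destroyed;
}
```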
If the language of choice has dumb pointers, then you could instead solve the problem by replacing one smart pointer in the loop with a dumb back pointer, provided that its referent is otherwise still reachable through smart pointers.
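A minimal sketch of the back pointer approach, assuming std::shared_ptr as the smart pointer and hypothetical types Parent and Child: the owner holds its child through a smart pointer, while the child refers back to its owner through a dumb pointer, so no loop of smart pointers exists.

```cpp
#include <cassert>
#include <memory>

static int destroyed = 0;   // counts destructor calls, for illustration only

struct Parent;

struct Child {
    Parent *back;            // dumb back pointer: does not keep Parent alive
    explicit Child(Parent *owner) : back(owner) {}   // stores, never dereferences
    ~Child() { ++destroyed; }
};

struct Parent {
    std::shared_ptr<Child> child;   // smart pointer: Parent owns Child
    Parent() : child(new Child(this)) {}
    ~Parent() { ++destroyed; }
};

// Creates and releases a Parent/Child pair; returns how many were destroyed.
int destroyed_after_scope() {
    destroyed = 0;
    {
        std::shared_ptr<Parent> p(new Parent);
        assert(p->child->back == p.get());
    }
    return destroyed;
}
```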
However, the copy semantics of auto pointers break the notion of a well-formed class, since you can't assume in general that an object will go out of scope just after being copied. You can fix this by defining your own copy constructor and assignment operator, but then it's safer to use a scoped pointer instead. Therefore, don't include non-static auto pointer members in a class.
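The sketch below illustrates the recommendation, using std::unique_ptr as a stand-in for a scoped pointer (an assumption; it cannot be copied, although unlike a strict scoped pointer it can be moved). Counter and original_after_copy_changes are hypothetical names. The class spells out its own copy operations, so copying never disturbs the source object.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class whose heap member is held by a non-copyable pointer.
class Counter {
    std::unique_ptr<int> value_;
public:
    explicit Counter(int v) : value_(new int(v)) {}
    // Copying is spelled out: the new object gets its own referent.
    Counter(const Counter &other) : value_(new int(*other.value_)) {}
    Counter &operator=(const Counter &other) {
        *value_ = *other.value_;   // also safe under self-assignment
        return *this;
    }
    void increment() { ++*value_; }
    int get() const { return *value_; }
};

// Copies a Counter, changes the copy, and reports the original's value.
int original_after_copy_changes() {
    Counter a(1);
    Counter b(a);
    b.increment();
    return a.get();
}
```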
If you follow the preceding rules, then most of your dumb pointers will live inside smart pointers, and the smart pointer classes will do most of the work for you. However, if the immediate scope of the pointer is an object, and the object's class does not fully accept responsibility for its dumb pointers (as is the case with STL iterator objects in C++), then the scope(s) enclosing the object (or the object's smart referrers, if it's on the heap) inherit any remaining responsibilities. A class's comments should always point out any such unusual responsibilities delegated to its clients.
There may be other ways to invalidate an iterator in addition to destroying its immediate referent, depending on the requirements of the specific kind of iterator that you're using. Enclosing scopes are responsible for ensuring that iterators are not invalidated (or at least that invalid iterators aren't dereferenced, if they are legal).
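As one illustration, appending to a std::vector may reallocate its storage and invalidate every iterator into it. The sketch below, with the hypothetical function append_doubles, avoids holding an iterator across push_back by tracking positions by index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends a doubled copy of every element that was present on entry.
// push_back may reallocate and invalidate iterators into the vector, so the
// loop tracks positions by index rather than holding an iterator.
void append_doubles(std::vector<int> &coll) {
    const std::size_t original_size = coll.size();
    for (std::size_t i = 0; i < original_size; ++i) {
        const int doubled = coll[i] * 2;   // read before the vector changes
        coll.push_back(doubled);
    }
}
```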
In general, you'll want to localize these responsibilities as much as possible without substantially impacting efficiency. In particular, the entire program must be analyzed in order to determine whether a non-const global dumb pointer can be compromised.
Note that having the pointer valid as a precondition is not necessarily sufficient. For example:
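void func(Foo &dumb, auto_ptr<Foo> smart); // defined below
int main() {
Foo *referent = new Foo; // an auto_ptr referent must come from new
func(*referent, auto_ptr<Foo>(referent));
}
void func(Foo &dumb, auto_ptr<Foo> smart) {
{
auto_ptr<Foo> temp=smart; // temp owns the referent now; smart is null
// referent deallocated here, when temp goes out of scope
}
dumb.method(); // Boom! Can't use a deallocated object!
}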
The client is normally required to understand the semantics of the function call well enough to guarantee that actions taken by the function on the client's behalf do not destroy passed dumb pointer referents. If the function is guaranteed to be robust against such a deallocation, that should be documented in the function's comments, such that modifications to the function won't violate the guarantee.
Invalid pointer parameter problems can sometimes be addressed by passing the referent by value instead of by pointer. In C++, changing a parameter between reference and value semantics is syntactically transparent, which makes the change inexpensive from a programming standpoint.
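A sketch of the by-value alternative follows, using std::unique_ptr as a stand-in for the auto pointer (an assumption, since std::auto_ptr is no longer part of standard C++). Reading, read_after_release, and demo_by_value are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Reading {
    int value;
    explicit Reading(int v) : value(v) {}
};

// `copy` is passed by value, so it stays usable after the function releases
// the original referent through `smart`.
int read_after_release(Reading copy, std::unique_ptr<Reading> smart) {
    smart.reset();        // the original referent is deallocated here
    return copy.value;    // fine: `copy` is an independent object
}

int demo_by_value() {
    std::unique_ptr<Reading> owner(new Reading(42));
    Reading snapshot = *owner;   // copy taken while the referent is valid
    return read_after_release(snapshot, std::move(owner));
}
```

The copy is taken before the call rather than in the argument list, because the order in which function arguments are evaluated is not specified.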
On the other hand, whether a function guarantees valid pointer results when there is an invalid pointer parameter, assuming that the function specifically allows that at all, is entirely up to the function.
In a typical case, a back pointer is initialized in the object's constructor, which is called by the owner's constructor. Since the owner/referent's constructor hasn't yet returned, the back pointer is invalid inside its object's constructor. The same issue crops up during destruction.
Dereferencing a referrer of a partially constructed object from outside of the object's class is actually legal, but it's not a good idea because it implicitly relies on the order of the referent's initialization, which (like any other aspect of the class's implementation) is generally subject to change without notice. Furthermore, the fact that the order of initialization even matters is obscure. If a client really does need to rely on initialization order properties, then those properties should at least be spelled out in the class's comments.
The recommended approach for avoiding this problem is to explicitly mark every method that might dereference the back pointer (for example, with a /*back*/ comment), and to avoid calling those methods from any of the object's constructors or its destructor. The owner's constructor or destructor may call such methods, provided that the owner is in a fully coherent state when that happens. You can compensate for the fact that the owner is invisible during construction and destruction by passing additional arguments to the constructor, and by having the owner take responsibility for initiating any actions that it must undertake on behalf of the object's constructor or destructor.
It is good practice to verify that the back pointer actually points to the object's owner before dereferencing it, at least when assertions are enabled. It is also worth mentioning that the copy constructor of a class with a back pointer generally requires an additional argument to indicate the new owner, and therefore the owner's implicit copy constructor won't work.
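Both points can be sketched together, assuming a hypothetical owner Car that contains a hypothetical Engine with a back pointer. The engine's copy constructor takes the new owner as an extra argument, the owner defines its own copy constructor to supply it, and the /*back*/ method asserts that the back pointer really refers to the owner holding this engine.

```cpp
#include <cassert>

class Car;

class Engine {
    Car *owner_;   // dumb back pointer to the owner
public:
    explicit Engine(Car *owner) : owner_(owner) {}
    // Copying needs the new owner, so there is no ordinary copy constructor.
    Engine(const Engine &, Car *new_owner) : owner_(new_owner) {}
    Engine(const Engine &) = delete;
    Engine &operator=(const Engine &) = delete;
    Car &owner() const;   /*back*/
};

class Car {
    Engine engine_;
public:
    Car() : engine_(this) {}   // the engine stores `this` but does not use it yet
    // An implicit copy would carry over the old back pointer, so the new
    // owner passes itself explicitly.
    Car(const Car &other) : engine_(other.engine_, this) {}
    const Engine &engine() const { return engine_; }
};

Car &Engine::owner() const {
    assert(&owner_->engine() == this);   // verify the owner really holds us
    return *owner_;
}

// Copies a Car and checks that each engine points back at its own owner.
bool copy_points_at_new_owner() {
    Car a;
    Car b(a);
    return &b.engine().owner() == &b && &a.engine().owner() == &a;
}
```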
Anders Johnson, last modified $Date: 2003/01/08 $