
A Pointer Discipline

Pointers are a very useful programming construct. However, they are also error-prone. For example, the referent of a pointer may be deallocated without the pointer's knowledge, or the last pointer to a referent may be erased without deallocating the referent. Such problems are especially likely to arise after an exception is thrown, because the programmer cannot be expected to consider all the possible control paths through the program under exceptional conditions.

Even worse, the responsibilities of different parts of the program with regard to pointer correctness are not always obvious. This often becomes a point of contention when those different parts were developed by different individuals. This paper attempts to define a safe and general pointer discipline that can be used to resolve such issues.

Pointer Terminology

Objects

For our purposes, an object is defined as anything that consumes program memory, whether or not its type is a class. A first object is said to contain a second object if and only if the first object's scope encloses the second object, and therefore the memory associated with the second object is a (possibly improper) subset of the memory associated with the first object. A function is considered an object in the global/static scope.

Referrer vs. Referent

An object's referrer is a pointer that points to the object. A pointer's referent is not (necessarily) a pointer, but rather the object to which a pointer points. To dereference a pointer means to access its referent. A pointer that has no referent is said to be null.

In C++, a null pointer is obtained by casting 0 into the pointer's type. In Perl, a null pointer is any false scalar (i.e. "", 0, or undef).

Dumb Pointers

A dumb pointer is just that -- dumb. Its referent is not necessarily guaranteed to be alive, and it never deletes its referent. The client of the pointer is generally responsible for nullifying the pointer when its referent dies, and ensuring that memory is not leaked.

For our purposes, a C++ reference is just a dumb pointer that is constant and non-null.

Clone Pointers

A clone pointer is a smart pointer that has strict ownership of its referent. When the clone pointer is destroyed, the referent is destroyed and deallocated. When a clone pointer is assigned or copied, the referent is cloned. The name comes from the fact that clone pointer referents are generally required to have a Clone method that returns a copy of its exact type, as opposed to a copy of its abstract type.

A scoped pointer is just a clone pointer that does not provide copy or assignment services. An auto pointer is just a clone pointer whose copy and assignment services have ownership-transferring semantics.
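As a rough illustration of the clone pointer described above, the following sketch assumes a hypothetical clone_ptr template and a hypothetical Shape hierarchy with a virtual Clone method. Neither name comes from a real library, and C++11 is assumed.

```cpp
#include <cassert>
#include <string>

// Hypothetical abstract type whose concrete subclasses can copy themselves.
class Shape {
 public:
	virtual ~Shape() {}
	virtual Shape *Clone() const = 0; // returns a copy of the exact type
	virtual std::string name() const = 0;
};

class Circle : public Shape {
 public:
	Circle *Clone() const override { return new Circle(*this); }
	std::string name() const override { return "circle"; }
};

// Hypothetical clone pointer: strict ownership, referent cloned on copy.
template<typename T>
class clone_ptr {
	T *ref_;
 public:
	explicit clone_ptr(T *p = nullptr) : ref_(p) {}
	clone_ptr(const clone_ptr &other)
	  : ref_(other.ref_ ? other.ref_->Clone() : nullptr) {}
	clone_ptr &operator=(const clone_ptr &other) {
		if(this != &other) {
			T *copy = other.ref_ ? other.ref_->Clone() : nullptr;
			delete ref_;
			ref_ = copy;
		}
		return *this;
	}
	~clone_ptr() { delete ref_; } // referent dies with its pointer
	T *get() const { return ref_; }
	T &operator*() const { return *ref_; }
	T *operator->() const { return ref_; }
};

// Copying the pointer copies the referent, so the two pointers never alias.
inline bool copy_is_independent() {
	clone_ptr<Shape> a(new Circle);
	clone_ptr<Shape> b(a);
	return a.get() != b.get() && b->name() == "circle";
}
```

Deleting the copy constructor and assignment operator from this sketch would give a scoped pointer in the sense used here.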

Shared Pointers

A shared pointer is a smart pointer that has shared ownership of its referent. When a shared pointer is destroyed, the copy count of the referent is decremented. When the referent's last shared pointer is destroyed, the referent is destroyed and deleted. When a shared pointer is assigned or copied, the copy count of the referent is incremented.

For our purposes, a Perl reference is just a shared pointer.

Weak Pointers

A weak pointer is similar to a shared pointer, except that it does not affect the referent's copy count. That is, the copy count of the referent can become zero while there are outstanding weak pointer referrers. When that happens, those weak pointers are automatically set to null. For our purposes, a weak pointer is considered dumb, even though it does have enough intelligence not to point to a nonexistent referent.
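The behavior described above can be observed with the C++11 standard library, taking std::shared_ptr and std::weak_ptr as stand-ins for the shared and weak pointers of this paper, and use_count() as the copy count. The function name is hypothetical.

```cpp
#include <cassert>
#include <memory>

// Returns true if a weak pointer can reach its referent while a shared
// pointer is alive, and reads as null once the last shared pointer is gone.
inline bool weak_pointer_goes_null() {
	std::weak_ptr<int> weak;
	{
		std::shared_ptr<int> strong(new int(42));
		weak = strong; // does not affect the copy count
		if(strong.use_count() != 1) return false;
		std::shared_ptr<int> locked = weak.lock(); // temporary shared ownership
		if(!locked || *locked != 42) return false;
	} // last shared pointer destroyed here, so the referent is deleted
	return weak.expired() && weak.lock() == nullptr;
}
```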

Back Pointers

An object's owner is defined as a second object having exclusive or shared responsibility for deallocating the first object when it is no longer needed. A back pointer is a pointer from an object to any one of its owners, or any one of its owner's owners, etc. A back pointer may be either smart or dumb, but as we'll see, it is an error for it to be a clone pointer.

Iterators

An iterator is a pointer to an object in a collection (such as an array or a linked list) that can be advanced (i.e. made to refer to another object in the collection). For our purposes, the "referent" of an iterator comprises the set of elements that are reachable by the iterator (usually the entire container). An iterator is usually a dumb pointer, but in principle it might be a shared pointer.

Don't Dereference a Null Pointer

Hopefully, this goes without saying. It is perfectly legal for pointers to be null in general, so be prepared for that, and do something that makes sense when it happens. If a pointer is required to be non-null at a certain point in the code, then assert that requirement as early as possible.

Similarly, don't dereference an iterator that has already iterated through its entire container. (Such an iterator is not necessarily null. For example, an exhausted C++ STL iterator is given by coll.end(), but it can still access coll.)
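A minimal sketch of both checks, using hypothetical helper names: one function treats a null pointer as a legal input, and the other compares an iterator against end() before dereferencing it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// A null pointer is legal here, and is treated as an empty string.
inline std::size_t safe_length(const std::string *s) {
	return s ? s->size() : 0;
}

// Compares the iterator against end() before dereferencing it.
inline int lookup_or(const std::map<std::string, int> &m,
                     const std::string &key, int fallback) {
	std::map<std::string, int>::const_iterator it = m.find(key);
	return it != m.end() ? it->second : fallback;
}
```

Where null is not acceptable, an assert on the pointer at the top of the function states that requirement early.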

Distinguish Iterators

Don't advance a pointer that isn't an iterator. (In particular, that's legal in C and C++, but don't do it.) Don't pass a non-iterator to a service that expects an iterator. Don't advance a null iterator. Don't pass a non-null exhausted iterator to a service that expects a pointer.

Objects on the Heap Should Have at Least One Smart Referrer

This ensures that you never have an object without any referrers. However, as we'll see later, this alone does not preclude memory leaks.

For our purposes, an object contained within another object on the heap is not considered on the heap itself.

Smart Pointers Should Refer Only to Objects on the Heap

Objects that don't live on the heap are either static/global or part of a stack frame. Because such objects are deleted automatically when they go out of scope, you can't delete them through pointers. Therefore, it doesn't make any sense for a smart pointer to refer to them.

A Clone Pointer Referent Should Have No Other Smart Referrer

A clone pointer is solely responsible for deleting its referent, so it doesn't make any sense for the referent also to have shared pointers. For example, if the clone pointer is destroyed before the last shared pointer, then the shared pointers will be corrupted, and vice versa.

An Object's Shared Referrers Should be Coherent

In order for shared pointers and weak pointers to work, a referent must have exactly one copy count. Unfortunately, the copy count belongs to the shared pointers, and not the referent. Therefore, only the initial shared pointer that is created after the referent is created should be constructed from a dumb pointer. In Perl, Java and Lisp, this doesn't matter, because you can't obtain access to an object without a shared pointer. However, in C++ you need to be more careful. For example:
	int main() {
		shared_ptr<Foo> foop(new Foo);
		shared_ptr<Foo> barp(foop); // OK fine
		shared_ptr<Foo> fubarp(&*foop); // Legal, but WRONG!
	}

This also makes it difficult to obtain a shared pointer when all you have is a dumb back pointer. This can be solved with the FACTORY METHOD pattern, provided that the shared_ptr class has a member base class that you can use for that purpose. However, be careful never to generate shared pointers to a pre-existing object that isn't already a shared pointer referent! For example:

	template<typename T>
	class shared_ptr {
	 private:
		T *ref_;
		long *count_;
		bool generated_; // true if the count lives inside the referent
	 public:
		// NOTE: This ctor's argument must come directly from new
		explicit shared_ptr(T*) : generated_(false) {
			// ...
		}
		~shared_ptr() {
			if(--*count_ == 0) {
				if(!generated_) { delete count_; }
				delete ref_;
			}
		}
		// And other stuff...
		class Factory {
			friend class shared_ptr;
			mutable long count_;
		 public:
			Factory() : count_(0) {}
			Factory(const Factory&) : count_(0) {}
			Factory &operator=(const Factory&) { return *this; }
			// Implicit destructor is fine.
			shared_ptr pointer() const { return shared_ptr(*this); }
			bool shared() const { return count_ != 0; }
		};
	 private:
		friend class Factory; // pointer() needs the private ctor below
		// NOTE: T must derive from Factory
		explicit shared_ptr(const Factory &obj) :
		  ref_(const_cast<T*>(static_cast<const T*>(&obj))),
		  count_(&obj.count_), generated_(true)
		{ ++*count_; }
	};
	class Foo : public shared_ptr<Foo>::Factory {
		// ...
	};
	int main() {
		shared_ptr<Foo> foop=(new Foo)->pointer();
		clone_ptr<Foo> barp(new Foo); // Don't call barp->pointer() !
	}
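The C++11 standard library offers a comparable facility, std::enable_shared_from_this, whose shared_from_this() member plays roughly the role of pointer() above. The sketch below assumes that facility; Node, share and counts_stay_coherent are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical referent that can hand out shared pointers to itself.
class Node : public std::enable_shared_from_this<Node> {
 public:
	int value = 0;
};

// Given only a dumb pointer to an object that is ALREADY a shared pointer
// referent, obtain a shared pointer that uses the existing copy count.
inline std::shared_ptr<Node> share(Node *dumb) {
	return dumb->shared_from_this();
}

inline bool counts_stay_coherent() {
	std::shared_ptr<Node> first(new Node); // the only ctor from a dumb pointer
	std::shared_ptr<Node> second = share(first.get());
	return first.use_count() == 2 && second.get() == first.get();
}
```

The same caveat applies as with the Factory class: share() must not be called on an object that is not already a shared pointer referent.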

A Loop of Clone Pointers is Prohibited

A loop of clone pointers is very evil, because it leads to infinite recursion. For example, if A owns B and B owns A, then when you destroy A you must first destroy B, which must first destroy A, and so on.

You might think that you can make this safe by breaking the loop in the destructor. However, if that's true, then you could and should be using a dumb back pointer instead.

A Loop of Smart Pointers Should Have an Owner

When there is a loop of smart pointers, you can still leak memory, because each object in the loop will still have a referrer when the last path of access to the loop is destroyed.

You can avoid this problem by requiring all smart pointer loops to be owned by some active scope that is responsible for breaking the loop before it expires (even if it expires due to an exception). The owning scope may be an object, provided that there is no path of smart pointers to it from any of the smart pointer loops that it is responsible for breaking.

If the language of choice has dumb pointers, then you could instead solve the problem by replacing a smart pointer in the loop whose referent is otherwise still reachable through smart pointers with a dumb back pointer.
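Both remedies can be sketched with std::shared_ptr from C++11. StrongNode, BackNode and the live_nodes counter are hypothetical names used only to make the leak observable, and the sketch does not attempt to break the loop when an exception is thrown.

```cpp
#include <cassert>
#include <memory>

static int live_nodes = 0; // constructed-but-not-yet-destroyed nodes

struct StrongNode {
	std::shared_ptr<StrongNode> next; // smart pointer all the way around
	StrongNode() { ++live_nodes; }
	~StrongNode() { --live_nodes; }
};

struct BackNode {
	std::shared_ptr<BackNode> child; // smart pointer toward the owned object
	BackNode *owner = nullptr;       // dumb back pointer toward the owner
	BackNode() { ++live_nodes; }
	~BackNode() { --live_nodes; }
};

// Builds a two-node loop of shared pointers. The owning scope optionally
// breaks the loop before it expires. Returns the number of nodes leaked.
inline int leaked_after_loop(bool break_loop) {
	int before = live_nodes;
	std::weak_ptr<StrongNode> observer;
	{
		std::shared_ptr<StrongNode> a(new StrongNode);
		std::shared_ptr<StrongNode> b(new StrongNode);
		a->next = b;
		b->next = a;
		observer = a;
		if(break_loop) a->next.reset(); // the owning scope breaks the loop
	}
	int leaked = live_nodes - before;
	// Clean up afterward so that the demonstration itself does not leak.
	if(std::shared_ptr<StrongNode> a = observer.lock()) a->next.reset();
	return leaked;
}

// Same shape, but the pointer back to the owner is dumb, so no loop forms.
inline int leaked_with_back_pointer() {
	int before = live_nodes;
	{
		std::shared_ptr<BackNode> parent(new BackNode);
		parent->child.reset(new BackNode);
		parent->child->owner = parent.get();
	}
	return live_nodes - before;
}
```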

Objects Should Not Contain Auto Pointers

Auto pointers are useful for local scopes because, for example, an assignment of auto pointers saves a copy construction and a destruction when the right-hand side auto pointer is going to go out of scope before its next use anyway.

However, the copy semantics of auto pointers break the notion of a well-formed class, since you can't assume in general that an object will go out of scope just after being copied. You can fix this by defining your own copy constructor and assignment operator, but then it's safer to use a scoped pointer instead. Therefore, don't include non-static auto pointer members in a class.

Don't Deallocate Objects Explicitly

That's what scopes and smart pointers are for, so let them do their job. Otherwise, you'll probably wind up destroying the same object multiple times, which is a big no-no.

Enclosing Scopes are Responsible for Dumb Pointers

In general, the responsibility for ensuring that a non-null dumb pointer refers to the same live object as when it was last assigned lies with the enclosing scope(s). For example, this is guaranteed whenever the referent of the dumb pointer belongs to an enclosing scope, either as a local datum or through smart pointers.

If you follow the preceding rules, then most of your dumb pointers will live inside smart pointers, and the smart pointer classes will do most of the work for you. However, if the immediate scope of the pointer is an object, and the object's class does not fully accept responsibility for its dumb pointers (as is the case with STL iterator objects in C++), then the scope(s) enclosing the object (or the object's smart referrers, if it's on the heap) inherit any remaining responsibilities. A class's comments should always point out any such unusual responsibilities delegated to its clients.

There may be other ways to invalidate an iterator in addition to destroying its immediate referent, depending on the requirements of the specific kind of iterator that you're using. Enclosing scopes are responsible for ensuring that iterators are not invalidated (or at least that invalid iterators aren't dereferenced, if they are legal).
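For instance, appending to a std::vector may reallocate its storage, which invalidates outstanding iterators and references even though no element was destroyed by the client. In the following sketch (repeat_first is a hypothetical name), the enclosing scope meets its responsibility by holding an index rather than an iterator across the modification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends 'times' copies of the first element. An iterator to v[0] held
// across push_back could be invalidated by reallocation, so this scope
// keeps an index and copies the element before each append.
inline void repeat_first(std::vector<int> &v, std::size_t times) {
	if(v.empty()) return;
	const std::size_t first = 0; // an index survives reallocation
	for(std::size_t i = 0; i < times; ++i) {
		const int copy = v[first]; // take the value before modifying v
		v.push_back(copy);
	}
}
```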

In general, you'll want to localize these responsibilities as much as possible without substantially impacting efficiency. In particular, the entire program must be analyzed in order to determine whether a non-const global dumb pointer can be compromised.

A Dumb Pointer Passed to a Function Should Survive the Call

By default, a function may assume that any dumb pointers (e.g. C++ references) passed to it as arguments remain valid until after the function call returns.

Note that having the pointer valid as a precondition is not necessarily sufficient. For example:

	void func(Foo &dumb, auto_ptr<Foo> smart) {
		{
			auto_ptr<Foo> temp=smart; // temp owns the referent now; smart is null
			// referent deallocated here, when temp goes out of scope
		}
		dumb.method(); // Boom! Can't use a deallocated object!
	}
	int main() {
		Foo *referent=new Foo; // on the heap, as smart pointers require
		func(*referent, auto_ptr<Foo>(referent));
	}

The client is normally required to understand the semantics of the function call well enough to guarantee that actions taken by the function on the client's behalf do not destroy passed dumb pointer referents. If the function is guaranteed to be robust against such a deallocation, that should be documented in the function's comments, such that modifications to the function won't violate the guarantee.

Invalid pointer parameter problems can sometimes be addressed by passing the referent by value instead of pointer. In C++, changing a parameter between reference and value semantics is syntactically transparent, which makes the change inexpensive from a programming standpoint.
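A sketch of the by-value remedy follows. It uses std::unique_ptr from C++11 in place of an auto pointer (std::auto_ptr was removed in C++17), and Counter, func_by_value and call_by_value are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Counter { // hypothetical referent type
	int n;
	int method() const { return n; }
};

// Takes the referent BY VALUE, so the copy stays valid even though the
// function destroys the original through the smart pointer it was handed.
inline int func_by_value(Counter copy, std::unique_ptr<Counter> smart) {
	smart.reset();        // original referent deallocated here
	return copy.method(); // fine: 'copy' belongs to this call
}

inline int call_by_value() {
	std::unique_ptr<Counter> owner(new Counter{7});
	Counter &referent = *owner;
	// Whichever argument is evaluated first, the referent is still alive
	// when it is copied, because ownership only moves, it is not released.
	return func_by_value(referent, std::move(owner));
}
```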

A Dumb Pointer Returned from a Function Should be Valid

Given that a function's pointer parameters remain valid, a dumb pointer that is returned from or otherwise stored by a function is normally required to be valid after the function returns. In particular, a function should never leave behind a pointer to an object owned exclusively by its own scope, because such an object is always deallocated when the function returns.
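The following sketch contrasts the prohibited case, shown only as a comment, with two acceptable alternatives. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// WRONG (kept as a comment so that the sketch compiles cleanly):
//   const std::string &bad() { std::string local("x"); return local; }
// The referent belongs to the function's own scope and dies on return.

// OK: the referent is owned by the caller's container, which outlives
// the call. Returns null if the container is empty.
inline const std::string *longest(const std::vector<std::string> &words) {
	const std::string *best = nullptr;
	for(std::vector<std::string>::const_iterator it = words.begin();
	    it != words.end(); ++it) {
		if(!best || it->size() > best->size()) best = &*it;
	}
	return best;
}

// OK: return by value when the result is created inside the function.
inline std::string joined(const std::vector<std::string> &words) {
	std::string result;
	for(std::size_t i = 0; i < words.size(); ++i) result += words[i];
	return result;
}
```

The pointer returned by longest() is only as durable as the container: it remains the caller's responsibility not to modify or destroy words while the result is in use.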

On the other hand, whether a function guarantees valid pointer results when there is an invalid pointer parameter, assuming that the function specifically allows that at all, is entirely up to the function.

Point Out Methods that Dereference a Back Pointer

Back pointers have the fortunate property that their referent is always valid, provided that the chain of ownership from the referent returning to the back pointer is strictly exclusive. However, they also have the unfortunate property that their referent isn't always fully constructed.

In a typical case, a back pointer is initialized in the object's constructor, which is called by the owner's constructor. Since the owner/referent's constructor hasn't yet returned, the back pointer is invalid inside its object's constructor. The same issue crops up during destruction.

Dereferencing a referrer of a partially constructed object from outside of the object's class is actually legal, but it's not a good idea because it implicitly relies on the order of the referent's initialization, which (like any other aspect of the class's implementation) is generally subject to change without notice. Furthermore, the fact that the order of initialization even matters is obscure. If a client really does need to rely on initialization order properties, then those properties should at least be spelled out in the class's comments.

The recommended approach for avoiding this problem is to explicitly call out any methods that might dereference the back pointer (for example, with a /*back*/ comment), and to avoid calling those methods from any of the object's constructors or its destructor. The owner's constructor or destructor may call such methods, provided that the owner is in a fully coherent state when that happens. You can compensate for the fact that the owner is invisible during construction and destruction by passing additional arguments to the constructor, and having the owner take responsibility for initiating actions that it is required to undertake on behalf of the object's constructor or destructor.

It is good practice to verify that the back pointer actually points to the object's owner before dereferencing it, at least when assertions are enabled. It is also worth mentioning that the copy constructor of a class with a back pointer generally requires an additional argument to indicate the new owner, and therefore the owner's implicit copy constructor won't work.
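These recommendations might be combined as in the following sketch, in which Car (the owner) and Engine (the owned object) are hypothetical classes. The Engine constructors store the back pointer without dereferencing it, the one method that does dereference it is marked /*back*/ and asserts that the pointer refers to the actual owner, and copying requires the new owner as an extra argument.

```cpp
#include <cassert>

class Car; // hypothetical owner, defined below

// Hypothetical owned object holding a dumb back pointer to its owner.
class Engine {
	Car *owner_; // back pointer; the owner may not be fully constructed yet
	int power_;
 public:
	// The constructors store the back pointer but never dereference it.
	Engine(Car *owner, int power) : owner_(owner), power_(power) {}
	// Copying requires an extra argument that names the new owner.
	Engine(const Engine &other, Car *new_owner)
	  : owner_(new_owner), power_(other.power_) {}
	Engine(const Engine &) = delete;
	Engine &operator=(const Engine &) = delete;
	int power() const { return power_; }
	/*back*/ int owner_limit() const; // dereferences the back pointer
};

class Car {
	int limit_;
	Engine engine_;
 public:
	explicit Car(int limit) : limit_(limit), engine_(this, 100) {}
	// The implicit copy constructor would not work here, so the owner
	// passes itself to the Engine copy as the new owner.
	Car(const Car &other)
	  : limit_(other.limit_), engine_(other.engine_, this) {}
	Car &operator=(const Car &) = delete;
	int limit() const { return limit_; }
	const Engine &engine() const { return engine_; }
};

inline int Engine::owner_limit() const {
	// Verify that the back pointer really refers to this object's owner.
	assert(owner_ && &owner_->engine() == this);
	return owner_->limit();
}
```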

Anders Johnson, last modified $Date: 2003/01/08 $
