[ Home | Resume | Programming | Engineering Philosophy | Family ]

On Concurrent Development

Much attention has been given to the importance of leveraging one's technology base to achieve short product cycles through concurrent development. However, practical advice on this matter is hard to come by, and in my experience few organizations have accomplished it successfully. I believe that the failure of most such development efforts is rooted in underestimating the subtle complexity of the problem, and accordingly addressing it with oversimplified methodologies that are not equipped to deal with the issues that arise.

On the bright side, though, the problem is not so complex as to be hopeless. In fact, there are existing methodologies and technologies that can achieve efficient and successful concurrent development when used judiciously and appropriately.

Concurrent Contributors

The problem of dealing with multiple concurrent contributors to a single project with well-defined goals and a single release event is actually a relatively simple one, and yet many organizations still get it wrong.

Shared File Tree

The simplest approach is to have multiple writers to files in shared locations. With this approach, there is no way to track changes. Reverting changes is either nontrivial or impossible, depending on how rigorously the system is archived (e.g. backed up to tape). Worse yet, if two writers edit the same file simultaneously, then the changes of the first to write are lost.
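The lost update can be reproduced in a few lines. This is only a sketch: the dictionary `shared_tree`, the helpers `read_file` and `write_file`, the file name and the two writers are all hypothetical stand-ins for a real shared directory and its users.

```python
# Hypothetical stand-in for a shared file tree: path -> contents.
shared_tree = {"parser.c": "line A\nline B\n"}


def read_file(path):
    """Each writer starts from a private copy of the current contents."""
    return shared_tree[path]


def write_file(path, contents):
    """Writing back replaces the whole file; no history is kept."""
    shared_tree[path] = contents


# Two writers open the same file at the same time.
alice_copy = read_file("parser.c")
bob_copy = read_file("parser.c")

# Each makes an independent edit to a private copy.
alice_copy = alice_copy.replace("line A", "line A (fixed by Alice)")
bob_copy = bob_copy.replace("line B", "line B (fixed by Bob)")

# Alice saves first, Bob saves second.
write_file("parser.c", alice_copy)
write_file("parser.c", bob_copy)

# Only the second write survives; the first writer's fix is gone
# and nothing in the tree records that it ever existed.
final_contents = shared_tree["parser.c"]
```

Neither writer did anything wrong in isolation, which is why the loss tends to go unnoticed until much later.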

These problems are alleviated by using a locking methodology, such as RCS. Still, the possibility of deadlock and the problem of checking in incompatible changes remain. Furthermore, the fact that files are still updated asynchronously makes it essentially impossible to achieve a self-consistent state without temporarily forbidding concurrent changes. Thus, incremental progress cannot be verified without impacting the rate of development significantly. For these reasons, this approach is still unsuitable for a project with many contributors and nontrivial interfaces.

Static Isolation

A better approach is to use static isolation with merging, as with CVS or P4. Each contributor checks out a snapshot of the file tree and makes and verifies his changes without affecting any other contributor. When changes are completed, they are checked in. If another contributor has checked in changes since the snapshot was checked out, then those changes are merged in and re-verified in coexistence with the pending changes before they are checked in.
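The merge step can be sketched as a three-way comparison against the snapshot that was originally checked out. The sketch below is a simplification and not how CVS or P4 are implemented: it merges at the granularity of whole files rather than lines, and the function name `three_way_merge` and the dictionary representation of a tree are assumptions made for illustration.

```python
def three_way_merge(base, mine, theirs):
    """Merge two sets of changes made against the same snapshot.

    Each argument maps a path to file contents. Returns (merged, conflicts),
    where conflicts lists the paths that need manual intervention and are
    therefore left out of merged.
    """
    merged = {}
    conflicts = []
    for path in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(path), mine.get(path), theirs.get(path)
        if m == t:
            result = m              # both sides agree, or neither changed it
        elif m == b:
            result = t              # only the other contributor changed it
        elif t == b:
            result = m              # only I changed it
        else:
            conflicts.append(path)  # both changed it, and differently
            continue
        if result is not None:      # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.c": "1", "b.c": "1", "c.c": "1"}      # snapshot I checked out
mine = {"a.c": "2", "b.c": "1", "c.c": "2"}      # my pending changes
theirs = {"a.c": "1", "b.c": "2", "c.c": "3"}    # checked in since my snapshot

merged, conflicts = three_way_merge(base, mine, theirs)
```

Here `a.c` and `b.c` merge automatically because the changes are independent, while `c.c` is reported as a conflict. The merged tree must still be re-verified before it is checked in.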

Merging may require manual intervention, but when changes are independent merging is almost always automatic. Planning the work to be done such that most interdependent changes are not concurrent is recommended. (Interdependent changes are not to be confused with distinct components of a single change that may be submitted by multiple contributors, which should normally occur concurrently.) It is also recommended that each contributor have a separate tree for every unrelated change, such that each change can be submitted separately when it reaches maturity.

The fact that checking in before others means having to do less merging provides an incentive to complete changes rapidly, which tends to boost productivity. However, it can also lead to low quality submissions unless there are agreed-upon submission criteria.

Having a uniform minimum check-in regression for all submitters is a good idea, because it minimizes the probability that incorrect changes are ever checked in. (Being forced to incorporate somebody else's bugs into your changes is generally counterproductive.) The regression itself should be under source control, as it normally becomes more thorough as release approaches. Triggering the regression automatically may be advisable if contributors lack the discipline to run it voluntarily.
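A check-in gate of this kind might look like the following sketch. Everything here is hypothetical: the regression is modeled as a dictionary of named test functions, the repository as a list of accepted trees, and `run_regression` and `try_check_in` are illustrative names rather than features of any particular source control system.

```python
def run_regression(tree, regression):
    """Return the names of the tests that fail against the given tree."""
    return [name for name, test in regression.items() if not test(tree)]


def try_check_in(repository, candidate_tree, regression):
    """Append candidate_tree to the repository only if every test passes."""
    failures = run_regression(candidate_tree, regression)
    if failures:
        return False, failures
    repository.append(dict(candidate_tree))
    return True, []


# The same minimum regression applies to every submitter.
regression = {
    "has_main": lambda tree: "main.c" in tree,
    "no_todo": lambda tree: all("TODO" not in text for text in tree.values()),
}

repository = []
good = {"main.c": "int main(void) { return 0; }"}
bad = {"main.c": "int main(void) { /* TODO */ }"}

accepted_good, good_failures = try_check_in(repository, good, regression)
accepted_bad, bad_failures = try_check_in(repository, bad, regression)
```

Because the regression is plain data passed to the gate, it can itself live under source control and grow more thorough as release approaches.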

While static isolation solves many problems associated with coordinating development, it should not supplant out-of-band communication (i.e. talking and documenting). In particular, conflicting assumptions about project goals or conventions are still guaranteed to cause trouble.

Concurrent Projects

Having multiple concurrent projects with different release dates and potentially conflicting goals poses a much more difficult challenge. Almost nobody gets this right on the first try. Static isolation among contributors is still recommended, but it is no longer sufficient to avoid problems.

Copy and Run

The simplest approach to this problem is to base development of a new project on the snapshot of another project that is the closest to the goal of the new project. The new project then forks off and proceeds independently. This approach always falls victim to at least one of the following problems:

The ultimate result is that most of the projects fail, and the projects that do succeed proceed almost completely independently, which defeats the whole purpose of technology reuse, and probably nullifies the advantage of having those projects consolidated into a single organization.

Unified Code Base

A more effective approach to managing multiple concurrent projects is to share a unified code base. Differences among projects are accounted for by using a preprocessor, inheritance, templates, superset code or a combination of such mechanisms, and are triggered by some trivial mechanism such as a command line option or an environment variable. Files that are profoundly dissimilar are simply not shared, for example, by using a project-dependent search path. The uniform minimum regression contains tests for every project under development.
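As one possible illustration of a project-dependent search path selected by an environment variable, consider the sketch below. The variable name `PROJECT`, the `projects/` and `shared/` directory layout, the project names and all function names are assumptions invented for the example.

```python
import os
import tempfile
from pathlib import Path


def current_project(environ=os.environ, default="alpha"):
    """Select the project from a hypothetical PROJECT environment variable."""
    return environ.get("PROJECT", default)


def search_path(root, project):
    """Project-specific directory first, then the shared directory."""
    return [Path(root) / "projects" / project, Path(root) / "shared"]


def resolve(root, project, name):
    """Return the first file called `name` found on the project's search path."""
    for directory in search_path(root, project):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(name)


def demo():
    """Build a small tree in a temporary directory and resolve timer.c per project."""
    with tempfile.TemporaryDirectory() as root:
        for relative in ("shared/io.c", "shared/timer.c", "projects/beta/timer.c"):
            path = Path(root) / relative
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("/* " + relative + " */\n")
        picks = {}
        for project in ("alpha", "beta"):
            found = resolve(root, project, "timer.c")
            picks[project] = found.relative_to(root).as_posix()
        return picks
```

In this layout project `alpha` picks up the shared `timer.c`, while project `beta` overrides it with its own profoundly dissimilar version. Both projects share `io.c` without any special handling.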

Rather than maintaining previous releases as constant entities (which is supported by the source control system anyway), the shared portions of the tree are allowed to evolve over time, provided that a configuration satisfying the goals of each previous release is maintained. This permits global improvement without violating local requirements. (Obsoleting previous releases outright is to be avoided, because they tend to remain relevant in the marketplace long after developers have lost interest in them.)

One clear advantage of a unified code base is that if maintaining unification turns out to be unmanageable for some reason, it is still easy to fork the code base at any time. On the other hand, once you take the copy and run approach, it rapidly becomes prohibitively difficult to remerge the code base.

While the unified code base approach has been observed to succeed, it still has a number of issues:

Branching

The drawbacks to using a unified code base for concurrent projects are largely addressed by the judicious use of branches. The main purpose of branching is to provide temporary isolation among projects to permit local progress on individual projects while global issues are being detected and resolved. A secondary purpose is to permanently isolate project-dependent release patches from the unified code base.

Project Branches

Every project lives on its own branch off the trunk. Every branch has its own regression, which normally covers only the configuration of the current project. The trunk regression includes tests for every project. Changes propagate from a project branch into the trunk, and subsequently back down into each of the other project branches. Propagated changes that cause any regression to fail are flagged as issues to be resolved.
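The propagation flow can be modeled roughly as follows. This is a toy model under loud assumptions: a branch is just a set of change identifiers, each project's regression is a function over such a set, the trunk regression is taken to be the union of all project regressions, and the names `propagate`, `alpha`, `beta` and `drop-legacy-api` are hypothetical.

```python
def propagate(change, origin, branches, trunk, regressions):
    """Propagate a change from its project branch to the trunk and back down.

    branches maps project name -> set of change ids on that project branch.
    regressions maps project name -> function(set of change ids) -> bool.
    Returns a list of (location, failing_project) issues to be resolved.
    """
    issues = []

    # Up: the trunk regression includes tests for every project.
    trunk.add(change)
    for project, passes in regressions.items():
        if not passes(trunk):
            issues.append(("trunk", project))

    # Down: each other project branch runs only its own regression.
    for project, contents in branches.items():
        if project == origin:
            continue
        contents.add(change)
        if not regressions[project](contents):
            issues.append((project, project))
    return issues


regressions = {
    "alpha": lambda changes: True,
    # Hypothetical rule: project beta still depends on the legacy API.
    "beta": lambda changes: "drop-legacy-api" not in changes,
}
branches = {"alpha": {"drop-legacy-api"}, "beta": set()}
trunk = set()

issues = propagate("drop-legacy-api", "alpha", branches, trunk, regressions)
```

The change passed the regression on the `alpha` branch, where only the `alpha` configuration is tested, but it is flagged as soon as it reaches the trunk and the `beta` branch.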

Contributors must still remain mindful of the impact of changes on all projects, but partitioning the regressions minimizes the risk of undetected errors without imposing enormous check-in latencies. To further minimize this risk, an intermediate branch between the trunk and the project branch with a very thorough regression may be added. However, this increases the latency of integration.

When project branches are first instituted, the task of change propagation and merging is likely to be neglected. However, provided that it is impressed upon contributors that merging is not optional, the fact that the first to propagate changes has the advantage of easier merging will provide ample incentive to propagate early and often.

Development Branches

When multiple contributors perform mutually required components of a given change, a development branch is created off the project branch. This branch is used as a place-holder to integrate those components until the result is expected to pass the project regression. Development branches typically lack a regression of their own.

A development branch can also be used to track the progress of a nontrivial change performed by a single contributor.

Release Branches

When release is too imminent to justify the risk of incorporating changes from another project, but changes to the current project are ongoing, a release branch is used. Delaying the propagation of changes is a bad idea, because it adds risk to the contributing project. However, the release branch can isolate those changes from the impending release of the recipient project. The subset of the changes in the release branch that are applicable to the unified code base must be manually propagated to the project branch.

Release branches should be created only on the basis of risk. Changes that are rejected on the basis of suitability should instead be altered such that the recipient project is unaffected.

If the "right" way and the "safe" way to address a defect conflict, then the safe way belongs in the release branch, and the right way belongs in the trunk.

Pitfalls

The most common pitfalls with branching are failing to account for what has already been merged, and making changes to a project branch that are obviously unsuitable for the unified code base. Such mistakes tend to preclude merging, which defeats the advantages of a unified code base, so don't do that. If you're not up to the challenges of branching, then you're not up to the challenges of concurrent project development.
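Accounting for what has already been merged amounts to keeping a record per source and target pair. A minimal sketch, assuming changes carry unique identifiers and that `pending_changes` and `merge_from` are hypothetical helpers rather than commands of any real tool:

```python
def pending_changes(source_log, merged_ids):
    """Changes on the source branch that have not yet been merged, in order."""
    return [change for change in source_log if change not in merged_ids]


def merge_from(source_log, target_log, merged_ids):
    """Apply only the not-yet-merged changes and record them as merged."""
    applied = pending_changes(source_log, merged_ids)
    target_log.extend(applied)
    merged_ids.update(applied)
    return applied


trunk_log = ["c1", "c2"]
project_log = []
merged_into_project = set()

first = merge_from(trunk_log, project_log, merged_into_project)

trunk_log.append("c3")
second = merge_from(trunk_log, project_log, merged_into_project)
third = merge_from(trunk_log, project_log, merged_into_project)
```

Without the `merged_into_project` record, the second merge would reapply `c1` and `c2`, which is exactly the kind of mistake that makes later merges intractable.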

Anders Johnson, last modified $Date: 2002/02/05 $
