
The Top 5 Mistakes in VLSI

1. Forking
2. Poor Specification
3. Improper Modeling
4. Accelerated Milestones
5. Irreproducibility


I've been working in the VLSI (Very Large Scale Integration semiconductor) business for over a decade now, and I see company after company make the same critical mistakes over and over. Although it is possible to make the same mistakes in the software field, they are more insidious in VLSI because the delay between a decision and its ramifications is significantly longer. Mistakes therefore have longer to snowball, and it is harder to identify their true root causes so that they are not repeated.

1. Forking

The most common serious mistake that VLSI companies make is to begin the next project or projects before the previous project goes into production. This is problematic because there are usually a large number of unpredictable issues that arise late in the design cycle (especially when you cut corners early in the design cycle), and it is foolhardy not to budget for those issues just because you can't predict what they will be. If most of the team members have moved on to other projects, resolving those issues will take too long, and you'll miss your market window. By induction, the same thing happens to the following projects as well.

As if that were not bad enough, most VLSI organizations attempt to manage concurrent development by copying the project database of a previous chip and proceeding independently. This is a huge mistake, because instead of leveraging efforts by sharing technology, each project assumes separate ownership of subtly different technologies that achieve essentially the same goal. Furthermore, the overhead associated with tracking and managing defects and other issues across projects makes this approach even less workable.

Everything that is complex and error-prone about managing multiple simultaneous projects is even more complex and error-prone when the projects don't share a common code base. Distinguish between copying and reuse.

2. Poor Specification

Since most development activities flow from specification, it is of paramount importance that specification is performed well. Most VLSI organizations do not devote enough attention to specification, and the result is usually products that are late to market, difficult to use, ineffective, and perpetually difficult to manage.

Ambiguity

One way that a poor specification can cause trouble is by being ambiguous. (A specification is said to be ambiguous if it is not possible to determine definitively whether a given behavior satisfies it.) Ambiguity is deceptively insidious because it is common for each reader of the specification to believe that it is unambiguous, and yet for different readers to hold conflicting interpretations of the specified behavior.

A good way to make a specification ambiguous is not to write it down. Another way to have an ambiguous specification is to make it much longer than necessary, because then it is likely to be self-contradictory, and nobody will be able to understand it anyway. A good way to write unambiguous specifications is to use a standardized programming language to express the specified behavior, relying on natural language only to translate the program domain to the product domain.
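As a minimal sketch of what such an executable specification might look like, consider a hypothetical 8-bit saturating adder. The block, its bit width, and every name below are assumptions chosen for illustration, not taken from any real product. The reference function states the required behavior, and a candidate implementation either matches it on every input or it does not.

```python
# Executable specification for a hypothetical 8-bit saturating adder.
# The block, the bit width, and all names are illustrative assumptions.

WIDTH = 8
MAX_VALUE = (1 << WIDTH) - 1  # 255 for an 8-bit datapath


def spec_saturating_add(a: int, b: int) -> int:
    """Specified behavior: the sum of a and b, clamped to MAX_VALUE."""
    if not (0 <= a <= MAX_VALUE and 0 <= b <= MAX_VALUE):
        raise ValueError("operands must fit in WIDTH bits")
    return min(a + b, MAX_VALUE)


def conforms(implementation) -> bool:
    """Exhaustively compare a candidate implementation with the specification."""
    return all(
        implementation(a, b) == spec_saturating_add(a, b)
        for a in range(MAX_VALUE + 1)
        for b in range(MAX_VALUE + 1)
    )


def wrapping_add(a: int, b: int) -> int:
    """A candidate that wraps on overflow, so it does not satisfy the specification."""
    return (a + b) & MAX_VALUE
```

Here `conforms(wrapping_add)` is false, which settles the question of whether wrapping on overflow is acceptable without any appeal to how a reader interprets the prose.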

Misunderstanding Customer Needs

A specification that is not written with an understanding of the customer's actual need is unlikely to add value in the marketplace. Since your customers are most likely inexpert in the VLSI field, the key to specifying a useful product is not necessarily to build what your customers ask for, but rather to interpret the market demands and extrapolate technology trends.

Inflexibility

Making late, unanticipated specification changes and failing to adjust to changing goals are equally deadly problems in VLSI. A poor specification can cause trouble by failing to distinguish between things that may change and things that are unlikely to change.

Blaming the downstream processes for being inflexible in the face of change will not help you achieve flexibility, because saying that anything might change is no more helpful than promising that nothing will change. Instead, the specification should be organized such that the kinds of potential changes are anticipated, and such that the constant elements are tolerant of potential changes. Expressing the specification using an object-oriented programming language is especially useful for achieving this goal.

To achieve flexibility, be selectively and explicitly inflexible.
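One way to picture this, as a sketch only, is a base class that fixes the elements declared unlikely to change and leaves the changeable ones to subclasses. The FIFO, its rules, and the class names below are hypothetical and serve only to illustrate the separation.

```python
from abc import ABC, abstractmethod


class FifoSpec(ABC):
    """Fixed part of a hypothetical FIFO specification.

    The ordering and full/empty rules are declared unlikely to change.
    The depth is declared changeable, so nothing here depends on its value.
    """

    @property
    @abstractmethod
    def depth(self) -> int:
        """Changeable: number of entries, chosen per product."""

    def __init__(self) -> None:
        self._entries = []

    def push(self, value: int) -> bool:
        """Fixed: a push is accepted only if the FIFO is not full."""
        if len(self._entries) >= self.depth:
            return False
        self._entries.append(value)
        return True

    def pop(self) -> int:
        """Fixed: entries leave in the order they arrived."""
        if not self._entries:
            raise IndexError("pop from empty FIFO")
        return self._entries.pop(0)


class SmallFifoSpec(FifoSpec):
    depth = 2  # one product's choice for the changeable element


class LargeFifoSpec(FifoSpec):
    depth = 16  # another product's choice; the fixed rules are inherited unchanged
```

A change of depth touches one line in one subclass, while any attempt to change the ordering rule is visibly a change to the fixed part of the specification.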

3. Improper Modeling

Improper modeling results in chips that don't work even when simulation says that they should. The amount of time that it takes to isolate such failures is unbounded, so it's really a lot better to avoid them. Improper modeling usually comes in one of the following flavors:

Negligence

Models that neglect potentially important effects are always wrong. Unquantified effects should be estimated to an acceptable statistical confidence, rather than neglected. Some of the most often neglected effects include:

Conflicting Assumptions

Models mustn't rely on unverified assumptions. For example, if the calibrated IR drop verification analysis allows a 1V drop, then the delay models ought not to assume a maximum drop of ½V.
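A check of this kind can be automated. The sketch below assumes a hypothetical registry in which each model declares the limits it relies on and each verification step declares the limits it actually enforces. The dictionary layout and names are invented for illustration, and the values mirror the IR drop example above.

```python
# Hypothetical registry of assumptions; values are in volts.
VERIFIED_LIMITS = {"max_ir_drop_v": 1.0}  # what the IR drop analysis allows
MODEL_ASSUMPTIONS = {"delay_model": {"max_ir_drop_v": 0.5}}


def find_conflicts(verified, assumptions):
    """Return (model, quantity, assumed, verified) for each unsupported assumption.

    An assumption is unsupported if the verified limit is looser than the
    value the model assumes, or if the quantity is not verified at all.
    """
    conflicts = []
    for model, assumed in assumptions.items():
        for quantity, assumed_max in assumed.items():
            verified_max = verified.get(quantity)
            if verified_max is None or verified_max > assumed_max:
                conflicts.append((model, quantity, assumed_max, verified_max))
    return conflicts
```

With the values shown, `find_conflicts(VERIFIED_LIMITS, MODEL_ASSUMPTIONS)` reports the delay model, because it assumes a tighter drop than the analysis guarantees.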

Insufficient Conservativism

In general, models ought not to attempt to predict exact behavior, because exact behavior depends on things that are not exactly knowable. Instead, models ought to guarantee (to a vanishing probability of failure) that the actual behavior will fall within the range of possible behaviors predicted by the model. To the extent that any actual behavior lies outside this range, the model is wrong.
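A minimal sketch of a range-based model follows, assuming a simple path delay expressed as a guaranteed minimum and maximum. The class name, the units, and the additive composition rule for delays in series are illustrative assumptions rather than a description of any particular timing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayBounds:
    """Guaranteed range for a delay, in nanoseconds (illustrative units)."""

    min_ns: float
    max_ns: float

    def contains(self, observed_ns: float) -> bool:
        """True if an observed delay falls within the guaranteed range."""
        return self.min_ns <= observed_ns <= self.max_ns

    def __add__(self, other: "DelayBounds") -> "DelayBounds":
        """Delays in series: adding both bounds preserves the guarantee."""
        return DelayBounds(self.min_ns + other.min_ns, self.max_ns + other.max_ns)


def model_is_wrong(bounds: DelayBounds, observations) -> bool:
    """The model is wrong if any observed delay lies outside its bounds."""
    return any(not bounds.contains(delay) for delay in observations)
```

The width of the range, `max_ns - min_ns`, is also a rough measure of how conservative the model is: a correct model with a very wide range guarantees little that is useful.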

Excessive Conservativism

Even if a model is correct, it might make predictions that are so weak as not even to surpass human intuition. Relying on such models results in products that are too egregiously overdesigned to be viable in the marketplace. Even worse, developers may therefore be forced to rely on unguaranteed properties, which defeats the purpose of having models.

4. Accelerated Milestones

Management will often accelerate milestones by declaring victory sooner than appropriate, rather than accelerating the actual arrival of the appropriate time to do so. For example, logic freeze might be declared before all the features are verified. This is counterproductive, because announcing a milestone limits the ability to attack problems at their root cause, which is by far the most efficient means of addressing them. As a rule of thumb, every day too soon that you declare a project phase complete adds two or three days to the next phase of the project.
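The rule of thumb can be turned into quick arithmetic. The function below is only a sketch of that estimate; its name and the default multipliers of 2 and 3 are taken from the rule above, not from measured data.

```python
def net_schedule_change(days_early: int, penalty_low: int = 2, penalty_high: int = 3):
    """Return (best, worst) net change in days; positive means the project ends later.

    Each day declared too soon is assumed to add between penalty_low and
    penalty_high days to the next phase, while removing one day from this one.
    """
    best = days_early * penalty_low - days_early
    worst = days_early * penalty_high - days_early
    return (best, worst)
```

For example, declaring a phase complete 10 days too soon gives `net_schedule_change(10) == (10, 20)`: the project finishes 10 to 20 days later than it would have otherwise.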

Milestones should have well-defined quality standards built in.
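As a sketch of a quality standard built into a milestone, the check below allows logic freeze to be declared only when every feature is marked verified. The feature names and the dictionary format are hypothetical.

```python
def may_declare_logic_freeze(feature_status: dict) -> tuple:
    """Return (allowed, unverified), where unverified lists the blocking features.

    feature_status maps a feature name to True once that feature is verified.
    """
    unverified = sorted(name for name, verified in feature_status.items() if not verified)
    return (not unverified, unverified)
```

The point of returning the blocking list is that the milestone report then names what remains to be done, instead of recording a date.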

It also helps if freezes can be deferred until shortly before release. This requires a sincere commitment to flexible specification, robust design, and correct-by-construction methodologies.

5. Irreproducibility

I am constantly amazed by products that die avoidably because they cannot be reproduced, either at all or in a viable fashion, as technology advances. Here are some common causes:

Discarding Upstream Data

Discarding upstream data is like destroying the source code to a program just because it's not needed for shipping the product. It makes the product prohibitively difficult to maintain.

In one extreme case, the fracture data tape was lost, so the product died when a mask was accidentally destroyed. More commonly, an organization might lose its hand-drawn block diagrams and timing charts, leaving it with only incomprehensible transistor-level schematics. Even more common is losing track of the different simulations that need to be run in order to verify that the product is still functional in the face of change.

Editing Generated Data

Editing generated data is usually a mistake, because it is very easy to lose track of the manual changes (assuming that you can even track the fact that such changes exist). Even if the manually modified component of the data can be recovered, it may not be clear how such changes ought to apply to newly-generated data.
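Tracking at least the existence of manual changes can be made mechanical. The sketch below assumes the generator appends a checksum trailer to its output, so that a later edit to the body is detectable. The trailer format and function names are invented for illustration, and the approach only detects edits; it does not recover them or say how to reapply them.

```python
import hashlib

MARKER = "# generated-sha256: "  # hypothetical trailer format


def stamp(generated_text: str) -> str:
    """Append a checksum trailer to freshly generated text."""
    digest = hashlib.sha256(generated_text.encode("utf-8")).hexdigest()
    return f"{generated_text}\n{MARKER}{digest}\n"


def was_edited(stamped_text: str) -> bool:
    """Return True if the body no longer matches its checksum trailer."""
    body, separator, trailer = stamped_text.rpartition("\n" + MARKER)
    if not separator:
        return True  # no trailer at all: treat the data as modified
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return trailer.strip() != digest
```

A build flow could refuse to overwrite or consume any generated file for which `was_edited` returns true, which forces the manual change back into the generator or its inputs.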

Manual Processes

When the development of a product involves a great deal of manual effort, the technology for reproducing it lies to a great extent in the minds of the original developers. This becomes problematic when they move on to greener pastures. Prefer correct-by-construction methodologies.

Insufficient Documentation

To some extent, mistakes that lead to irreproducibility can be offset by thoroughly documenting the internal properties to be maintained, as well as any manual steps taken in developing the original version. Few organizations have the discipline to carry this out.

Anders Johnson, last modified $Date: 2002/05/19 $
