The Top 5 Mistakes in VLSI
1. Forking
2. Poor Specification
3. Improper Modeling
4. Accelerated Milestones
5. Irreproducibility
I've been working in the VLSI (Very Large Scale Integration semiconductor)
business for over a decade now, and I see company after company make the
same critical mistakes over and over.
Although it is possible to make the same mistakes in the software field,
they are more insidious in VLSI because the delay between any decision
and experiencing its ramifications is significantly longer.
Therefore, mistakes have longer to snowball, and it is harder to identify
their true root causes so that repeating them can be avoided.
1. Forking
The most common serious mistake that VLSI companies make is to begin
the next project (or projects) too soon, before the previous project goes
into production.
This is problematic because a large number of unpredictable issues
usually arise late in the design cycle (especially when you cut
corners early in the design cycle), and it is foolhardy not to budget
for those issues just because you can't predict what they will be.
If most of the team members have moved on to other projects, resolving
those issues will take too long, and you'll miss your market window.
By induction, the same thing happens to the following projects as well.
As if that were not bad enough, most VLSI organizations attempt to manage
concurrent development by copying the project
database of a previous chip and proceeding independently.
This is a huge mistake, because instead of leveraging efforts by sharing
technology, each project assumes separate ownership of subtly different
technologies that achieve essentially the same goal.
Furthermore, the overhead associated with tracking and managing defects
and other issues across projects makes this approach even less workable.
|
Everything that is complex and error-prone about managing multiple
simultaneous projects is even more complex and error-prone when the
projects don't share a common code base.
Distinguish between copying and reuse.
|
2. Poor Specification
Since most development activities flow from specification, it is of
paramount importance that specification is performed well.
Most VLSI organizations do not devote enough attention to specification,
and the result is usually products that are late to market, difficult to
use, ineffective, and perpetually difficult to manage.
Ambiguity
One way that a poor specification can cause trouble is by being ambiguous.
(A specification is said to be ambiguous if it is not possible to determine
definitively whether a given behavior satisfies it.)
Ambiguity is insidious because it is common for different readers
of the specification each to believe that the specification is unambiguous,
and yet to hold conflicting interpretations of the specified behavior.
A good way to make a specification ambiguous is not to write it down.
Another way to have an ambiguous specification is to make it much longer
than necessary, because then it is likely to be self-contradictory, and
nobody will be able to understand it anyway.
A good way to write unambiguous specifications is to use a standardized
programming language to express the specified behavior, relying on natural
language only to translate the program domain to the product domain.
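As a minimal sketch of what this might look like, consider a hypothetical 8-bit saturating adder (the block, its width, and every name below are assumptions chosen for illustration, not taken from any real product). The program states the required behavior, and the comments carry the only natural language needed to tie it to the product:

```python
# Hypothetical executable specification for an 8-bit saturating adder.
# The block, its width, and all names here are illustrative assumptions.
WIDTH = 8
MAX_VALUE = (1 << WIDTH) - 1  # largest value the result bus can carry


def spec_saturating_add(a: int, b: int) -> int:
    """Specified behavior: the sum of the operands, clamped to MAX_VALUE."""
    if not (0 <= a <= MAX_VALUE and 0 <= b <= MAX_VALUE):
        raise ValueError("operands must fit in WIDTH bits")
    return min(a + b, MAX_VALUE)


def conforms(implementation) -> bool:
    """Check a candidate implementation against the spec for every input pair."""
    return all(
        implementation(a, b) == spec_saturating_add(a, b)
        for a in range(MAX_VALUE + 1)
        for b in range(MAX_VALUE + 1)
    )
```

With a specification in this form, whether a given behavior satisfies it is no longer a matter of interpretation: an adder that wraps around on overflow fails `conforms`, and one that clamps passes.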
Misunderstanding Customer Needs
A specification that is not written with an understanding of the customer's
actual need is unlikely to add value in the marketplace.
Since your customers are most likely inexpert in the VLSI field, the key
to specifying a useful product is not necessarily to build what your
customers ask for, but rather to interpret the market demands and extrapolate
technology trends.
Inflexibility
Making late, unanticipated specification changes and failing to
adjust to changing goals are equally deadly problems in VLSI.
A poor specification can cause trouble by failing to distinguish
between things that may change and things that are unlikely to change.
Blaming the downstream processes for being inflexible in the face of
change will not help you achieve flexibility, because saying that anything
might change is no more helpful than promising that nothing will change.
Instead, the specification should be organized such that the kinds
of potential changes are anticipated, and such that the constant elements
are tolerant of potential changes.
Expressing the specification using an object-oriented programming
language is especially useful for achieving this goal.
|
To achieve flexibility, be selectively and explicitly inflexible.
|
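One way to picture this, as a sketch only: a base class holds the behavior that is promised not to change, and the one property that may change is declared abstract so that each variant must state it explicitly. The FIFO, its depth parameter, and the class names are hypothetical examples, not part of any real specification.

```python
from abc import ABC, abstractmethod
from collections import deque


class FifoSpec(ABC):
    """Fixed part of a hypothetical FIFO spec: ordering and overflow behavior."""

    @property
    @abstractmethod
    def depth(self) -> int:
        """Variable part: every product variant must declare its own depth."""

    def __init__(self):
        self._items = deque()

    def push(self, item) -> bool:
        # Fixed: a full FIFO rejects the write instead of silently dropping data.
        if len(self._items) >= self.depth:
            return False
        self._items.append(item)
        return True

    def pop(self):
        # Fixed: items leave in arrival order; an empty FIFO yields None.
        return self._items.popleft() if self._items else None


class SmallFifoSpec(FifoSpec):
    depth = 2  # hypothetical variant; changing this touches nothing else
```

The fixed behavior is written once and never mentions a particular depth, so it tolerates the anticipated change. A variant that forgets to declare its depth cannot even be instantiated.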
3. Improper Modeling
Improper modeling results in chips that don't work even when simulation
says that they should.
The amount of time that it takes to isolate such failures is unbounded,
so it's really a lot better to avoid them.
Improper modeling usually comes in one of the following flavors:
Negligence
Models that neglect potentially important effects are always wrong.
Unquantified effects should be estimated to an acceptable statistical
confidence, rather than neglected.
Some of the most often neglected effects include:
- IR drop in the power supply lines
- Capacitive coupling to adjacent, possibly dynamic, interconnect
- Thermal drop between the junction and the case
- Noise in the voltage supply, from both on- and off-chip sources
- Offset between matched pairs of transistors or delays
Conflicting Assumptions
Models mustn't rely on unverified assumptions.
For example, if the calibrated IR drop verification analysis allows a
1V drop, then the delay models ought not to assume a maximum
drop of ½V.
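A conflict of this kind can be caught mechanically if every model declares what it assumes. The following is a rough sketch under that assumption; the dictionary layout and the names `VERIFIED_LIMITS`, `MODEL_ASSUMPTIONS`, and `find_conflicts` are hypothetical, and the numbers are the ones from the example above.

```python
# Hypothetical sketch: each model declares the worst case it assumes, and a
# check flags any model that assumes less than what verification allows.
VERIFIED_LIMITS = {"max_ir_drop_v": 1.0}  # what the IR drop analysis allows

MODEL_ASSUMPTIONS = {
    "delay_model": {"max_ir_drop_v": 0.5},  # what the delay model assumes
}


def find_conflicts(verified, assumed_by_model):
    """Return (model, quantity, assumed, verified) for each unsupported assumption."""
    conflicts = []
    for model, assumptions in assumed_by_model.items():
        for name, assumed in assumptions.items():
            limit = verified.get(name)
            # An assumption is unsupported if nothing verifies it at all, or
            # if verification permits a worse case than the model assumes.
            if limit is None or assumed < limit:
                conflicts.append((model, name, assumed, limit))
    return conflicts
```

Run on the values above, `find_conflicts(VERIFIED_LIMITS, MODEL_ASSUMPTIONS)` reports the delay model, because it assumes ½V while the verification permits 1V.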
Insufficient Conservatism
In general, models ought not to attempt to predict exact behavior, because
exact behavior depends on things that are not exactly knowable.
Instead, models ought to guarantee (to a vanishing probability of failure)
that the actual behavior will fall within the range of possible behaviors
predicted by the model.
To the extent that any actual behavior falls outside this range, the model is wrong.
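To make the idea concrete, here is a small sketch in which a model predicts a range rather than a single number, and any measurement outside that range counts as evidence against the model. The delay figures and the names `DelayBound` and `outliers` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayBound:
    """A model's prediction for one path: a guaranteed range, not a point."""
    min_ps: float
    max_ps: float

    def contains(self, measured_ps: float) -> bool:
        return self.min_ps <= measured_ps <= self.max_ps


def outliers(bound: DelayBound, measurements):
    """Return the measurements that fall outside the predicted range."""
    return [m for m in measurements if not bound.contains(m)]
```

For a hypothetical bound of 90 ps to 130 ps, measurements of 95, 110, and 131 ps would yield one outlier (131 ps), and that single outlier is enough to call the model wrong.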
Excessive Conservatism
Even if a model is correct, it might make predictions that are so weak as not
even to surpass human intuition.
Relying on such models results in products that are too egregiously
overdesigned to be viable in the marketplace.
Even worse, developers may therefore be forced to rely on unguaranteed
properties, which defeats the purpose of having models.
4. Accelerated Milestones
Management will often accelerate milestones by declaring victory sooner
than appropriate, rather than by doing what it takes to make the
declaration appropriate sooner.
For example, logic freeze might be declared before all the features are
verified.
This is counterproductive, because announcing a milestone limits
the ability to attack problems at their root cause, which is by far the
most efficient means of addressing them.
As a rule of thumb, every day too soon that you declare a project phase
complete adds two or three days to the next phase of the project.
|
Milestones should have well-defined quality standards built in.
|
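As one possible sketch of a built-in quality standard, a milestone can carry explicit criteria, and it can be declared only when every criterion is met. The criteria names and thresholds below are hypothetical; a real project would choose its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One measurable condition attached to a milestone (names are hypothetical)."""
    name: str
    measured: float
    required: float

    def met(self) -> bool:
        return self.measured >= self.required


def unmet_criteria(criteria):
    """List the names of the criteria that block the milestone."""
    return [c.name for c in criteria if not c.met()]


def may_declare(criteria) -> bool:
    """A milestone may be declared only if nothing blocks it."""
    return not unmet_criteria(criteria)


# Example: a logic freeze with hypothetical thresholds.
logic_freeze = [
    Criterion("features_verified_fraction", measured=0.85, required=1.0),
    Criterion("regression_pass_fraction", measured=1.0, required=1.0),
]
```

Here `may_declare(logic_freeze)` is false, and `unmet_criteria(logic_freeze)` names the unverified features as the reason, which corresponds to the logic freeze example above.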
It also helps if freezes can be deferred until shortly before release.
This requires a sincere commitment to flexible specification,
robust design, and correct-by-construction methodologies.
5. Irreproducibility
I am constantly amazed by products that die avoidably because they
cannot be reproduced, either at all or in a viable fashion, as technology
advances.
Here are some common causes:
Discarding Upstream Data
Discarding upstream data is like destroying the source code to a program
just because it's not needed for shipping the product.
It makes the product prohibitively difficult to maintain.
In one extreme case, the fracture data tape was lost, so the product died
when a mask was accidentally destroyed.
More commonly, an organization might lose its hand-drawn block diagrams
and timing charts, leaving it with only incomprehensible transistor-level
schematics.
Even more common is losing track of the different simulations that need
to be run in order to verify that the product is still functional in the
face of change.
Editing Generated Data
Editing generated data is usually a mistake, because it is very easy to
lose track of the manual changes (assuming that you can even track the
fact that such changes exist).
Even if the manually modified component of the data can be recovered, it
may not be clear how such changes ought to apply to newly-generated data.
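If editing generated data cannot be prevented, it can at least be made visible. The sketch below records a digest when the data is generated and compares it later. The `generate` function is a stand-in for whatever tool actually produces the data, and all names here are assumptions for illustration.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 digest of the generated text, as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def generate(source: str) -> dict:
    """Stand-in generator: keeps the output together with its digest."""
    output = source.upper()  # placeholder for a real generation step
    return {"output": output, "digest": digest(output)}


def was_hand_edited(artifact: dict) -> bool:
    """True if the output no longer matches the digest recorded at generation."""
    return digest(artifact["output"]) != artifact["digest"]
```

This only detects that a manual change exists. It does not say how the change ought to apply to newly generated data, which is the harder problem, so the better fix remains to change the generator or its input.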
Manual Processes
When the development of a product involves a great deal of manual effort,
the technology for reproducing it lies to a great extent in the minds of the
original developers.
This becomes problematic when they move on to greener pastures.
Prefer correct-by-construction methodologies.
Insufficient Documentation
To some extent, mistakes that lead to irreproducibility can be offset by
thoroughly documenting the internal properties to be maintained, as well
as any manual steps taken in developing the original version.
Few organizations have the discipline to carry this out.
Anders Johnson, last modified 2002/05/19