Re: [ng-spice-devel] ACS


To <ng-spice-devel@ieee.ing.uniroma1.it>
From Al Davis <aldavis@ieee.org>
Date Mon, 30 Oct 2000 10:42:44 -0800
Delivered-To mailing list ng-spice-devel@ieee.ing.uniroma1.it
In-Reply-To <39FD4EC9.C2E69668@analog.com>
Mailing-List contact ng-spice-devel-help@ieee.ing.uniroma1.it; run by ezmlm
References <39FD4EC9.C2E69668@analog.com>
Reply-To ng-spice-devel@ieee.ing.uniroma1.it

On Mon, 30 Oct 2000, Alan Gillespie wrote:
> What kind of matrix code do you use ? 

For more info, see the papers:

A. T. Davis, "Acceleration of Analog Simulation by Partial LU
Decomposition", IEEE International Symposium on Circuits and Systems,
Atlanta, Georgia, May 1996.

A. T. Davis, "A Vector Approach to Sparse Nodal Admittance Matrices",
Proceedings of the 30th Midwest Symposium on Circuits and Systems,
pp. 1391-1394, Syracuse, New York, August 1987.

It uses a dense inner core, and as far as I know is optimal for this
type of matrix.  It assumes a combination of bordered-block-diagonal
and bump-and-spike forms, which matches well what you get in circuit
simulation.  For busy circuits, it is comparable in speed to the
Kundert package, and a lot faster than the Quarles package.
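
To illustrate why bordered-block-diagonal form suits LU decomposition,
here is a small sketch.  It is not the ACS matrix code; bbd_matrix,
lu_in_place and fill_in are hypothetical names made up for this example,
and the test matrix is assumed to be diagonally dominant so that no
pivoting is needed.  With the border ordered last, the factors have no
nonzeros outside the original blocks and border.  Reversing the order
(border first) creates fill.

```python
def bbd_matrix(block_sizes, border):
    """Hypothetical test matrix: dense diagonal blocks, dense border last."""
    core = sum(block_sizes)
    n = core + border
    a = [[0.0] * n for _ in range(n)]
    start = 0
    for size in block_sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                a[i][j] = 1.0
        start += size
    for i in range(n):
        for j in range(n):
            if i >= core or j >= core:
                a[i][j] = 1.0      # border rows and columns
        a[i][i] = n + 1.0          # diagonally dominant: no pivoting needed
    return a


def lu_in_place(a):
    """LU without pivoting; L (unit diagonal) and U overwrite a."""
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a


def fill_in(a):
    """Count entries that are zero in a but nonzero in its LU factors."""
    n = len(a)
    was_nonzero = [[x != 0.0 for x in row] for row in a]
    f = lu_in_place([row[:] for row in a])
    return sum(1 for i in range(n) for j in range(n)
               if f[i][j] != 0.0 and not was_nonzero[i][j])


border_last = bbd_matrix([3, 3, 2], border=2)
# Same matrix with rows and columns reversed, so the border comes first.
border_first = [row[::-1] for row in border_last[::-1]]

print("fill with border last: ", fill_in(border_last))
print("fill with border first:", fill_in(border_first))
```

A package that knows the matrix has this shape can allocate storage once,
up front, and treat each block as a small dense problem.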

Decomposition time by itself is not a major factor in simulation
speed; it matters only when you look deeper, at how the matrix package
interacts with the rest of the simulator.  I believe switching ACS to
something like SuperLU would slow down the overall simulation
significantly, because of those other issues.


ACS is designed as a multi-rate simulator, with the idea that circuit
blocks can be implicitly solved independently, managed by the sparse
matrix package.  The "bypass" in Spice doesn't accomplish much
because it only works on a single step.  In ACS, when something is
"bypassed", it really is.  Not only does it skip model evaluation, it
skips loading the matrix, matrix decomposition, and so on.  It is based on
queues, so I guess bypass is not the correct term.  There is more work
to be done in this area, but it already shows benefit.  It doesn't
really do multi-rate, yet, but most of the data structures and code
are there.
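
A toy sketch of the queue idea follows.  It is not how ACS is actually
coded; Device, needs_eval and iterate are names invented for the
example, and the change test (a fixed voltage tolerance) is an
assumption.  The point is only that a device whose inputs have not
changed never enters the evaluate queue, so it is never loaded either.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    nodes: tuple
    last_seen: dict = field(default_factory=dict)  # voltages at last evaluation
    evaluations: int = 0

    def needs_eval(self, voltages, tol):
        # A node never seen before always counts as changed.
        return any(abs(voltages[n] - self.last_seen.get(n, float("inf"))) > tol
                   for n in self.nodes)

    def evaluate(self, voltages):
        # A real model evaluation would go here; this only records the inputs.
        self.last_seen = {n: voltages[n] for n in self.nodes}
        self.evaluations += 1


def iterate(devices, voltages, tol=1e-6):
    """Evaluate only devices whose inputs moved; return the names to load."""
    eval_queue = deque(d for d in devices if d.needs_eval(voltages, tol))
    load_queue = deque()
    while eval_queue:
        d = eval_queue.popleft()
        d.evaluate(voltages)
        load_queue.append(d)   # only evaluated devices get loaded
    return [d.name for d in load_queue]


devices = [Device("m1", ("a", "b")), Device("m2", ("c",))]
voltages = {"a": 0.0, "b": 0.0, "c": 0.0}
first = iterate(devices, voltages)    # everything is new, so all are queued
voltages["c"] = 1.0
second = iterate(devices, voltages)   # only the device touching node c
print(first, second)
```

Carrying the same idea through to the matrix means that only the rows
and columns touched by the load queue need to be decomposed again.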

The matrix package expects ordering to be done externally.  As of
now, it isn't, but I have an external play version that works, based
on depth first search of the connection list.  It is simple, but has
not been a priority to properly install.  For some of my circuits,
the Markowitz algorithm actually makes ordering worse than completely
manual ordering.
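
As a rough illustration of ordering by depth first search of the
connection list, here is a minimal sketch.  It is not the play version
mentioned above.  The function name dfs_order, the dictionary form of
the connection list, and the choice of starting node are all
assumptions made for the example.

```python
def dfs_order(connections, start):
    """Return nodes in the order a depth-first search first visits them."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbours reversed so they are visited in listed order.
        stack.extend(reversed(connections[node]))
    return order


# Hypothetical connection list: a chain in-n1-n2-n3 with a branch n1-n4.
connections = {
    "in": ["n1"],
    "n1": ["in", "n2", "n4"],
    "n2": ["n1", "n3"],
    "n3": ["n2"],
    "n4": ["n1"],
}
order = dfs_order(connections, "in")
matrix_index = {node: i for i, node in enumerate(order)}
print(order)
```

Nodes that are connected tend to get neighbouring indices this way,
which keeps their matrix entries close to the diagonal.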



The state of the art is such that trying to optimize one step of the
process without considering everything else will probably not be
worthwhile.  It is better to put the effort in where the various
steps interact.  Choosing the right combination has big benefits.

Here is another interesting paper:

Dongarra, Gustavson, and Karp, "Implementing linear algebra
algorithms for dense matrices on a vector pipeline machine", SIAM
Review, 26 (1984), pp. 91-112.

It is all about the subtleties of LU decomposition and their effect
on speed, even though they are all the same in theory.  Doing the
test on a different machine could produce results completely
different from what they found, but that is the point.
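
To make "the same in theory" concrete, here are two of the possible
loop orderings of LU decomposition without pivoting, as a sketch.  The
names lu_kij and lu_jki are just labels for this example, and the 3x3
matrix is made up.  Both produce the same factors; they differ only in
the order in which memory is touched, which is what decides the speed
on a given machine.

```python
def lu_kij(a):
    """Outer loop over pivots: update the whole trailing submatrix each step."""
    a = [row[:] for row in a]
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a


def lu_jki(a):
    """Outer loop over columns: finish one column completely before the next."""
    a = [row[:] for row in a]
    n = len(a)
    for j in range(n):
        for k in range(j):
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
        for i in range(j + 1, n):
            a[i][j] /= a[j][j]
    return a


# Made-up, diagonally dominant test matrix, so no pivoting is needed.
a = [[4.0, 3.0, 2.0],
     [2.0, 4.0, 1.0],
     [2.0, 1.0, 5.0]]
f1, f2 = lu_kij(a), lu_jki(a)
max_diff = max(abs(x - y) for r1, r2 in zip(f1, f2) for x, y in zip(r1, r2))
print(max_diff)
```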

Here is some run-time info for an operating point analysis of a large
circuit:

Time spent in the various simulation steps:
   advance     0.05 
  evaluate     3.61 <-- biggest time consumer
      load     1.25 <-- second biggest
        lu     0.12 <-- is it worth any effort to reduce this more?
      back     0.12 
    review     0.00 
    accept     0.04
    output     0.06 
  overhead     0.41
     total     5.66 
iterations: op=24, dc=0, tran=0, fourier=0, total=24
nodes: user=10001, subckt=0, model=0, total=10001
devices: diodes=15000, mosfets=10000, gates=0, subckts=0
models:  diodes=2, mosfets=2, gates=0, subckts=0
density=0.0%                                                         

This is a large digital NMOS circuit.  NMOS is more prone to
convergence problems than CMOS.  There are 10000 transistors.  The
15000 diodes are the ones inside the MOS models.  Spice gave nonsense
results due to numeric overflow on an early iteration that it never
recovered from.

The point of this data is to show how small the "LU" time is compared
to the total, and how model evaluation dominates.  Future effort
should be aimed at reducing model evaluation time.  But, with the
models getting more complicated, this is getting worse.  The emphasis
needs to be on reducing the number of model evaluations, and reducing
storage requirements, by clever algorithms, and someone else can work
on improving the models.
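
The arithmetic behind that, as a small sketch using the times from the
table above.  The helper names share and best_case_speedup are made up
for the example.

```python
# Seconds per step, copied from the run-time table above.
times = {
    "advance": 0.05, "evaluate": 3.61, "load": 1.25, "lu": 0.12,
    "back": 0.12, "review": 0.00, "accept": 0.04, "output": 0.06,
    "overhead": 0.41,
}
total = 5.66


def share(step):
    """Percentage of the total run time spent in one step."""
    return 100.0 * times[step] / total


def best_case_speedup(step):
    """Overall speedup if this step took no time at all."""
    return total / (total - times[step])


for step in ("evaluate", "load", "lu"):
    print(f"{step:>8}: {share(step):5.1f}% of total, "
          f"best-case speedup {best_case_speedup(step):.2f}x")
```

LU is about 2% of the total, so even a decomposition that took no time
at all would speed up this run by only about 2%.  Model evaluation is
about 64%.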


When I did the original sparse matrix package (see the date: 1987) I
had a little demo comparing ACS to Spice (Spice2; Spice3 wasn't out
yet).  I ran ACS with a large linear circuit on the slowest machine I
could find (a PC with a 5 MHz 8088), and Spice on the fastest (a big
Sun.  I forget the model number, but it had a Motorola CPU).  ACS
would complete while we were watching.  With Spice, we could go out
to lunch and it was still running when we got back.  Fortunately, the
current Spice doesn't use that sparse matrix package, so the
comparison doesn't apply to what is in common use today.  It was fun
while it lasted.

al.
