Re: [ng-spice] Support for SMP ??
On Tue, 30 Jan 2001, James Swonger wrote:
> If you're doing looped analyses (like parametric or Monte Carlo
> analyses) then parallelism at the "job" level will help you out and
> this could be script-automated or -assisted.
Given the architecture of Spice, this is the only way that makes
sense. However, there is plenty of room to improve the algorithms.
You should be able to get significant speedup on a single CPU.
By contrast, if you use 2 CPUs in parallel, the most speedup you can
possibly get is 2x.
Job-level parallelism also won't help within a single transient
analysis, because each step is used as the initial condition for the
next.
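To make that dependence concrete, here is a minimal sketch (made-up
names like State and solve_step, not Spice or ACS code) of a transient
loop. Every pass starts from the previous solution, so there is
nothing to hand to a second CPU:

    // Illustration only, with made-up names: why the steps of a single
    // transient run cannot be farmed out to separate CPUs.
    struct State { double v; };   // stand-in for all the node voltages

    // Stand-in for one timestep's solve (evaluate, load, lu, back).
    State solve_step(const State& x0, double /*t*/, double /*dt*/) { return x0; }

    int main() {
      State x{0.0};
      for (double t = 0.0; t < 1.0; t += 1e-3) {
        x = solve_step(x, t, 1e-3);   // step n+1 needs step n's result
      }
    }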
> If you're looking to
> speed up individual, linear runs, then I think that you're probably
> out of luck and only faster hardware / more memory will help. The
> analog simulations are inherently serial solutions of large
> matrices and I'm not optimistic that you can get the matrix
> solution to be shared. Even if you could, you would need a lot of
> brain-to-brain bandwidth; throwing the whole matrix back & forth
> over a network would be even nastier.
It's even worse than that. Even if you could, the matrix solution is
only a small piece of the total.
I am not sure what the distribution is in Spice, but here are the
times, broken down by step, for one large run in ACS:
advance     0.05
evaluate    3.61   <-- biggest time consumer.
load        1.25   <-- second biggest
lu          0.12   <-- is it worth any effort to reduce this more?
back        0.12
review      0.00
accept      0.04
output      0.06
overhead    0.41
total       5.66
iterations: op=24, dc=0, tran=0, fourier=0, total=24
nodes: user=10001, subckt=0, model=0, total=10001
devices: diodes=15000, mosfets=10000, gates=0, subckts=0
models: diodes=2, mosfets=2, gates=0, subckts=0
density=0.0%
OK.... 0.12 seconds out of 5.66, or about 2% of the total. If you
cut the matrix (LU) time in half, you cut less than a tenth of a
second off the run time. Such a minor improvement is not worth the
effort.
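To put a number on it, here is a back-of-the-envelope sketch (plain
C++, using the times from the table above) of the best case where the
LU step splits perfectly across 2 CPUs:

    #include <cstdio>

    int main() {
      const double total = 5.66;        // total run time, from the table
      const double lu    = 0.12;        // LU time, from the table
      const double rest  = total - lu;  // everything that stays serial

      // Best case: the LU time is cut exactly in half by a second CPU.
      const double new_total = rest + lu / 2.0;
      std::printf("speedup %.3fx, %.2f seconds saved\n",
                  total / new_total, total - new_total);
      // prints roughly: speedup 1.011x, 0.06 seconds saved
    }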
This time distribution tells me that the most significant speedup is
probably in the "evaluate" step. This is where models are evaluated.
I think this test used the level 2 model. One possibility is to use
a faster model, such as level 3, but usually the user wants a
particular model, so not much can be done there. Another possibility
is to do fewer evaluations. ACS tries to optimize this, but the
optimization only helps for certain types of circuits (not this one).
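For reference, the usual "fewer evaluations" trick is a bypass check,
roughly like this (a generic sketch with hypothetical names, Device
and can_bypass, not the actual ACS code): skip a device whose terminal
voltages have not moved since its last evaluation.

    #include <cmath>
    #include <vector>

    // Hypothetical per-device record, for illustration only.
    struct Device {
      std::vector<int>    nodes;    // indices of the terminal nodes
      std::vector<double> last_v;   // voltages at the last full evaluation
    };

    // True if the device's old stamps can be reused this iteration.
    bool can_bypass(const Device& d, const std::vector<double>& v, double tol) {
      for (size_t i = 0; i < d.nodes.size(); ++i) {
        if (std::fabs(v[d.nodes[i]] - d.last_v[i]) > tol) return false;
      }
      return true;
    }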
You probably could parallelize this step. In Spice, you would need
to get into the model code and chop it up. To make it effective, you
would need to do it for all the models. Another possibility is to do
some types of devices on one CPU and other types on the other. In
ACS, you could process the queue from both ends.
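A rough sketch of what that could look like, using modern C++ threads
and a hypothetical Device::eval() (not how Spice or ACS is actually
structured): give each CPU one half of the device list and join
before the load step.

    #include <functional>
    #include <thread>
    #include <vector>

    struct Device {
      void eval() { /* model equations would go here */ }   // hypothetical
    };

    // Evaluate one contiguous half of the device list.
    void eval_range(std::vector<Device>& devs, size_t lo, size_t hi) {
      for (size_t i = lo; i < hi; ++i) devs[i].eval();
    }

    void eval_parallel(std::vector<Device>& devs) {
      const size_t mid = devs.size() / 2;
      std::thread t(eval_range, std::ref(devs), size_t(0), mid);  // first half
      eval_range(devs, mid, devs.size());                         // second half, this CPU
      t.join();   // both halves finished before loading the matrix
    }

Processing the queue from both ends is the same idea, just without
pre-splitting the list.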
I believe that for this type of circuit, the time distribution would
be about the same in Spice.
It is a DC operating point analysis of a string of cascaded N-MOS
inverters, biased at the midpoint (where you would never bias a real
circuit). There are 10000 transistors and 15000 parasitic diodes.
Spice fails to converge on it (numeric overflow).
The sparse package in Spice (Ken Kundert's) is pretty good. The only
benefit I see in swapping it for something else is for certain
special properties. This is why I use a custom sparse package in
ACS. It will do partial solutions, which enables true bypass; that in
turn makes it possible to use queues or "selective trace" rather than
blanket bypass, which eventually should make true multi-rate possible.
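"Selective trace" here means roughly this (a hypothetical sketch, with
made-up names, not the ACS implementation): when a node changes, only
the devices touching that node get queued for re-evaluation, and
everything else is bypassed.

    #include <queue>
    #include <vector>

    // devices_on_node[n] lists the devices connected to node n
    // (hypothetical connectivity table, for illustration only).
    std::queue<int> schedule(const std::vector<std::vector<int>>& devices_on_node,
                             const std::vector<int>& changed_nodes,
                             int n_devices) {
      std::vector<char> queued(n_devices, 0);
      std::queue<int> work;
      for (int n : changed_nodes) {
        for (int d : devices_on_node[n]) {
          if (!queued[d]) { queued[d] = 1; work.push(d); }   // queue each device once
        }
      }
      return work;   // only these devices get evaluated this iteration
    }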
As to parceling out nodes, then reassembling.... I think the
overhead of doing this will be more than the time saved by parallel
solution.
The ACS sparse package finds hinge points where you could break the
matrix into pieces that are processed in parallel. This is part of
the partial LU algorithm. It is simple to scan for them, and it sort
of does that anyway. Still, I think the effort is best applied
elsewhere.
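Just to show what such a scan could look like, here is a rough sketch
of my own (find_hinges is made up, not the ACS code): with the nodes
numbered 1..n, an index k is a hinge if no matrix entry couples
anything below k with anything above k, so the two blocks share only
row/column k and could be factored independently.

    #include <algorithm>
    #include <utility>
    #include <vector>

    // nonzeros holds the (row, col) coordinates of the matrix's nonzero entries.
    std::vector<int> find_hinges(const std::vector<std::pair<int,int>>& nonzeros, int n) {
      std::vector<int> reach(n + 1, 0);   // farthest index reached from each index
      for (const auto& e : nonzeros) {
        const int lo = std::min(e.first, e.second);
        const int hi = std::max(e.first, e.second);
        reach[lo] = std::max(reach[lo], hi);
      }
      std::vector<int> hinges;
      int farthest = 0;
      for (int k = 1; k <= n; ++k) {
        if (farthest <= k && k > 1 && k < n) hinges.push_back(k);  // nothing below k reaches past k
        farthest = std::max(farthest, reach[k]);                   // now include entries starting at k
      }
      return hinges;   // each hinge splits the matrix into independent pieces
    }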
In conclusion, I think supporting parallel processing is not worth
the effort, except for its educational value. The time is better
applied to improving the algorithms.
al.