Re: [ng-spice] Support for SMP ??
Well, I'm sitting here watching the marching window crawl on 3
jobs right now. The circuit is a new PWM design: a few op amps, a few
comparators, maybe a thousand MOS devices in digital gates (AHDL
modeled), an oscillator, etc. That's about 2500 elements in the
netlist using AHDL, over 4000 if the logic is netlisted as transistors.
This is spectreS, from Cadence. With basically 100% of an UltraSPARC
(50% of a two-processor machine), evaluation takes about 3 minutes,
but the jobs run for anywhere between 30 minutes and 12 hours.
With a lot of chop and a lot of linear operation, the transient runs
take a while. In fact, I routinely overrun Cadence's elapsed-time
counter (32767 seconds, after which the reported time wraps negative).
Big transient runs with "Save all" (all port currents, all node
voltages) are a good test of both the memory manager and the
transient solution speed.
Suppose you ran this big inverter string with a pulse source for,
say, 100-1000 clock periods. How is the run time partitioned then?
Also, put in enough nodal capacitance for the clock frequency that
the transient solution never "settles" and the timestep never gets
a chance to range up.
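For concreteness, here is a rough sketch of the kind of deck I have in
mind. The level-3 model card, device sizes, load resistors, and node
caps are all placeholders; the point is just to keep the waveform
moving so the step size can never grow:

  * pulse-driven NMOS inverter string -- sketch only
  .model nch nmos level=3
  vsup vdd 0 dc 5
  vin  in  0 pulse(0 5 0 1n 1n 49n 100n)
  * stage 1
  m1   n1  in 0 0 nch w=10u l=1u
  rl1  vdd n1 10k
  * enough node capacitance that the solution never settles
  c1   n1  0 1p
  * stage 2
  m2   n2  n1 0 0 nch w=10u l=1u
  rl2  vdd n2 10k
  c2   n2  0 1p
  * ...repeat the stage for as many devices as you care to stress...
  * roughly 1000 periods of the 100 ns clock
  .tran 1n 100u
  .print tran v(n1) v(n2)
  .end

Scale the stage count up, and the per-step breakdown (evaluate, load,
lu, and so on) for that run is what I would like to see.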
Just curious about how ACS (presuming it had bipolars) and/or
ngspice would do on my kind of designs. As it stands now, I have
SPICE decks that contain too many idiomatic/unsupported model
params to run on other engines. But I'd like to see how far you
guys can push the nodes*models*timesteps envelope and come out
with usable data.
>From: Al Davis <aldavis@ieee.org>
>It's even worse than that. Even if you could, the matrix solution is
>only a small piece of the total.
>
>I am not sure what the distribution is in Spice, but here are the
>times, broken down by step, for one large run in ACS:
>
>  advance    0.05
>  evaluate   3.61  <-- biggest time consumer.
>  load       1.25  <-- second biggest
>  lu         0.12  <-- is it worth any effort to reduce this more?
>  back       0.12
>  review     0.00
>  accept     0.04
>  output     0.06
>  overhead   0.41
>  total      5.66
>iterations: op=24, dc=0, tran=0, fourier=0, total=24
>nodes: user=10001, subckt=0, model=0, total=10001
>devices: diodes=15000, mosfets=10000, gates=0, subckts=0
>models: diodes=2, mosfets=2, gates=0, subckts=0
>density=0.0%
>
>OK.... 0.12 seconds out of 5.66, or about 2% of the total. If you
>cut the matrix (LU) time in half, you cut less than a tenth of a
>second off the run time. Such a minor improvement is not worth the
>effort.
>
>This time distribution tells me that the most significant speedup is
>probably in the "evaluate" step. This is where models are evaluated.
> I think this test used the level 2 model. One possibility is to use
>a faster model, such as level 3. Usually the user wants a
>particular model, so not much can be done here. Another possibility
>is to do fewer evaluations. ACS tries to optimize this, but the
>optimization only helps for certain types of circuit. (not this one)
> You probably could parallelize this step. In Spice, you need to get
>into the model code and chop it. To make it effective, you would
>need to do all the models. Another possibility is to do some types
>of device on one CPU, and other types on the other. In ACS, you
>could process the queue from both ends.
>
>I believe that for this type of circuit, the time distribution would
>be about the same in Spice.
>
>It is a DC operating point analysis of a string of cascaded N-MOS
>inverters, biased at the midpoint. (where you never would bias a
>real circuit) There are 10000 transistors, 15000 parasitic diodes.
>Spice fails to converge on it. (Numeric overflow.)
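[An aside from me, to be sure I'm picturing the test right: a couple
of stages of it might look something like the sketch below. The model
cards, sizes, loads, and diode placement are my guesses, not the
actual ACS deck.

  * cascaded NMOS inverter string, DC op at the midpoint (guessed shape)
  .model nch nmos level=2
  .model pj  d
  vsup vdd 0 dc 5
  vin  1   0 dc 2.5
  * stage 1: driver plus an explicit parasitic junction diode
  m1   2 1 0 0 nch w=10u l=1u
  rl1  vdd 2 20k
  d1   0 2 pj
  * stage 2, and so on out to 10000 transistors
  m2   3 2 0 0 nch w=10u l=1u
  rl2  vdd 3 20k
  d2   0 3 pj
  .op
  .end
]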
>
>The sparse package in Spice (Ken Kundert's) is pretty good. The only
>benefit I see in swapping it for something else is for certain
>special properties. This is why I use a custom sparse package in
>ACS. It will do partial solutions, which enables true bypass, which
>makes it possible to use queues or "selective trace" rather than
>bypass, which eventually should make true multi-rate possible.
>
>As to parceling out nodes, then reassembling.... I think the
>overhead of doing this will be more than the time saved by parallel
>solution.
>
>The ACS sparse package finds hinge points where you could break it
>into pieces that are processed in parallel. It is part of the
>partial LU algorithm. It is simple to scan for them, and it sort of
>does so anyway. Still, I think the effort is best applied elsewhere.
>
>In conclusion, I think supporting parallel processing is not worth
>the effort, except for its educational value. The time is better
>applied to improving the algorithms.
>
>al.
>
>