New data file format


To "'ng-spice-devel@ieee.ing.uniroma1.it'" <ng-spice-devel@ieee.ing.uniroma1.it>
From "Gillespie, Alan" <Alan.Gillespie@analog.com>
Date Thu, 7 Dec 2000 14:47:29 -0000
Delivered-To mailing list ng-spice-devel@ieee.ing.uniroma1.it
Mailing-List contact ng-spice-devel-help@ieee.ing.uniroma1.it; run by ezmlm
Reply-To ng-spice-devel@ieee.ing.uniroma1.it


As I mentioned before, there's a change I'd like to make
to the ng-spice rawfile format. I think that the basic
binary rawfile format is as good a place to start as any,
but we should include a bit more data about the simulation.
For instance, I don't think Spice3f4 marks the data file with
a temperature. Other things, like the run statistics, might
be useful too.

I think the standard binary format outputs double precision
data, which is overkill in my opinion, although I know some
people disagree. But single precision halves the file size,
and therefore doubles the access speed. Single precision has
plenty of resolution, way more than any reasonable reltol
setting. At the very least, the precision could be made optional.
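Something like this would be enough on the writer side (just a
sketch; the "use_single" flag is my own invention, not an existing
ngspice option):

    #include <stdio.h>

    /* Sketch: optional single-precision output. Writing floats instead
     * of doubles halves the per-point storage (4 bytes vs 8). */
    static void write_value(FILE *fp, double v, int use_single)
    {
        if (use_single) {
            float f = (float) v;
            fwrite(&f, sizeof f, 1, fp);
        } else {
            fwrite(&v, sizeof v, 1, fp);
        }
    }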

But it's access speed that can be dramatically improved with
a wee trick which I'll try to explain :-

The usual way to output spice results is simply to write the
data for each timepoint immediately after it's been calculated.
So, for example, if the circuit consists of 1024 nodes, then
the data for each timepoint will be 1024 floating point values.
If the data is saved in single precision, then each time point
will need 4K of data. If the simulation is a long transient,
then you could easily have hundreds of thousands of timepoints.
So 204800 * 4K = an 800MByte file, where each "curve" consists of
204800 separate 4-byte values spaced at 4K intervals through the whole
file. Now, to read one "curve" back from the file, the computer
must read the whole file. This is because, firstly, the disk is
broken into 512-byte sectors, and the minimum that a read
operation can read is one sector. Further, in win32 anyway, I
think the disk cache reads data in 4K chunks (with up to 64K of
read-ahead), and so for every data point that the viewer program
requests, the computer will read in at least a 4K chunk, enough
data for all the curves at each timepoint. Another way of looking
at it is that the machine must fulfil 204800 disk reads to
retrieve the complete data for one curve.
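To make the layout concrete, here's roughly what the two halves look
like (my own illustrative code, not the actual Spice3 routines; 1024
single-precision variables per timepoint assumed):

    #include <stdio.h>

    #define NVARS 1024   /* values per timepoint, e.g. 1024 nodes */

    /* "Old" layout: write each timepoint's vector as soon as it's
     * computed, so one timepoint = 1024 floats = 4K of data. */
    static void write_timepoint(FILE *fp, const float vals[NVARS])
    {
        fwrite(vals, sizeof(float), NVARS, fp);
    }

    /* Reading one curve back means one tiny read per timepoint,
     * strided 4K apart, so the OS ends up pulling in essentially
     * the whole file. */
    static void read_curve(FILE *fp, int var, long ntimes, float *out)
    {
        for (long t = 0; t < ntimes; t++) {
            fseek(fp, (t * NVARS + var) * (long) sizeof(float), SEEK_SET);
            fread(&out[t], sizeof(float), 1, fp);
        }
    }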

If we rearrange the data into chunks of, say, 128 timepoints,
so that the data is written in batches of 128 timepoints, with
each "curve" written as 128*4 = 512-byte blocks of data, then
the previous example would result in a file of curves, each of
which is stored in 1600 512-byte blocks spread through the file.
Since each disk read will still collect 4K of data, we're still
wasting most of the data we read in (more on that later), but
now we're only reading in 8 "curves-worth" of data for each
curve, rather than the whole 1024 curves. This can lead to a
dramatic speed improvement, since the disk only has to fulfil
1600 disk reads, instead of 204800 the "old" way.
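In code, the batched write might look something like this (again
just a sketch with made-up names):

    #include <stdio.h>

    #define NVARS        1024   /* values per timepoint */
    #define BLOCK_POINTS  128   /* timepoints per block */

    static float buf[NVARS][BLOCK_POINTS];  /* curve-major staging buffer */
    static int   npts;                      /* timepoints currently buffered */

    /* Buffer 128 timepoints, then flush curve by curve, so each curve's
     * 128 points (128*4 = 512 bytes) land contiguously within the block. */
    static void add_timepoint(FILE *fp, const float vals[NVARS])
    {
        for (int v = 0; v < NVARS; v++)
            buf[v][npts] = vals[v];
        if (++npts == BLOCK_POINTS) {
            for (int v = 0; v < NVARS; v++)
                fwrite(buf[v], sizeof(float), BLOCK_POINTS, fp);
            npts = 0;
        }
    }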

The extra data may not be totally wasted, though. Since both
Linux and win32 use all free physical memory as disk cache,
if the whole 800MByte file is read for every curve requested,
then at the end of each curve read the cache will contain the
last N megabytes of the file, which is of absolutely no use if
you want to read in another curve (unless you've got more memory
than simulation data, but we'll assume that you haven't).

With the "new" data structure, each curve read would read in
6.4MBytes of data (in this example). This will be less than most
disk caches, and so if you're lucky enough to next choose a
curve from one of the other 7 curves which were read in on the
last pass, then the computer doesn't have to go back to disk at
all. If you've got, say, 64MBytes of disk cache, then you could
end up with 80 full "curves-worth" of data stored in the disk
cache.
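The matching read for one curve (illustrative only) then does one
512-byte read per block, 1600 reads in total instead of 204800:

    #include <stdio.h>

    #define NVARS        1024
    #define BLOCK_POINTS  128

    /* "New" layout reader: one 512-byte read per block per curve.
     * Each block holds NVARS * BLOCK_POINTS floats = 512K of data. */
    static void read_curve_blocked(FILE *fp, int var, long nblocks, float *out)
    {
        long block_bytes = (long) NVARS * BLOCK_POINTS * sizeof(float);
        long curve_off   = (long) var * BLOCK_POINTS * sizeof(float);

        for (long b = 0; b < nblocks; b++) {
            fseek(fp, b * block_bytes + curve_off, SEEK_SET);
            fread(&out[b * BLOCK_POINTS], sizeof(float), BLOCK_POINTS, fp);
        }
    }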

I did a wee test with this structure, and was able to scan a
500MByte file for 1 "curves-worth" of data in under 3 seconds
on my laptop. I can't remember the exact timings, but it was
more than a 10x speed improvement. Actually, I do know
that I get just over 10MBytes/second disk transfer rate under
windows, and about 8MBytes/second under Linux, so we'd be
talking nearly a 20x speedup.

Obviously, the exact format needs to cater for the last
batch of timepoints from each simulation, since that's unlikely
to be exactly 128 points. Each block would need a header saying
how many timepoints it contains.
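Something as simple as this (purely hypothetical layout, just to
show the idea) would do:

    /* Hypothetical per-block header; the final block of a run simply
     * records fewer than 128 points. */
    struct block_header {
        unsigned int npoints;  /* timepoints actually stored in this block */
        unsigned int nvars;    /* curves per timepoint, for sanity checking */
    };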

Does that make sense?

Cheers,

Alan
