CellML Discussion List



[cellml-discussion] Auto-generate HDF5 from CellML?


  • From: jonovik at gmail.com (Jon Olav Vik)
  • Subject: [cellml-discussion] Auto-generate HDF5 from CellML?
  • Date: Wed, 12 Nov 2008 13:38:46 +0000 (UTC)

David Nickerson <david.nickerson at ...> writes:

> > I'm considering HDF5 for my storage needs in simulating a CellML model under
> > multiple parameter scenarios. HDF5 is designed for efficient storage,
> > retrieval, navigation and subsetting of huge data sets [1], with annotation
> > [2]. I plan on storing both raw and post-processed data, so that if I detect
> > problems at a higher level, I can go back and look at details and possibly
> > re-run those simulations. David Nickerson described a similar approach in an
> > earlier post [3].
> >
> > However, setting up the data structure with annotations for physical units
> > and such is quite time-consuming. On the other hand, the CellML representation
> > holds all the required information. It would be very helpful to auto-generate
> > an HDF5 data structure to hold output from simulations of CellML models.
> >
> > Such a tool should be fairly easy to write for someone familiar with both
> > HDF5 and CellML, and would apply to all possible CellML models. I guess it
> > would be overly restrictive to make an output format part of the CellML
> > metadata specification. However, offering a standard output format would save
> > duplication of effort and make it easier to share simulation results for
> > further visualization and analysis.
> >
> > I'd like the opinions of the CellML regulars, in particular whether anything
> > similar has been discussed previously.
>
> I'm not aware of this coming up for discussion in the past.
>
> I certainly agree that there is little point duplicating data from the
> CellML model, although when using unversioned model documents the link
> between simulation outputs and input CellML models can become quite
> tenuous. If you are building up a large collection of simulation data
> for which you need to maintain a strong link to the input models
> (which I think you do) you'll probably want to look into such issues
> quite a bit. This is something PMR2 will address (I hope), although,
> for use now, revision numbers in a subversion repository would
> probably be sufficient.

Yes.

> As a side note, I am envisioning that in the long term such simulation
> data would ideally be stored using FieldML (http://www.fieldml.org)
> which underneath is likely to provide several options for the high
> performance persistent storage (with HDF5 being one of the options
> that crops up quite frequently). Unfortunately, I'm unsure what sort
> of time frame a fieldML based solution might become available...

To me, HDF5 looks like a fairly round wheel, which I'd be happy to use while
FieldML decides whether to invent its own.

One feature that would often be useful is parallel writing, for instance when
running trivially parallel simulations of multiple parameter scenarios, so that
results for one scenario can be written without blocking output from the
others. HDF5 has this (http://www.hdfgroup.org/HDF5/PHDF5/), but the Python
interface PyTables (www.pytables.org), for example, does not yet support it.

> As for generic generation of HDF5 data structures from CellML models,
> I think this would need some thinking :) Is there a generic way to
> define a useful HDF5 data structure for any given CellML model? I'm
> not sure...

To my layman's eyes, most CellML models seem to be sets of coupled ordinary
differential equations, for which it should suffice to save the state variables
at each time point, a model identifier and the parameter values. (Parameter
values must be easily searchable, allowing retrieval of results for a given
parameter set.) Adding results from post-processing should be left to the end
user.
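A minimal sketch of that layout, in Python with only the standard library: the
hypothetical helper `template_layout` describes the groups and datasets as a
plain dict (nothing is written to disk), and the hypothetical `find_scenarios`
shows one way to look up results by parameter values. The paths, variable
names, units and the revision-style model identifier are illustrative
assumptions, not a CellML or HDF5 convention. Each dataset entry could then be
passed to h5py's `create_dataset`, with `maxshape` set so the scenario axis can
grow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    units: str


def template_layout(model_id, states, params, n_times, time_units="second"):
    """Describe a hypothetical HDF5 layout as a plain dict (nothing is written).

    Keys are HDF5 paths; values give the dataset shape (0 scenarios to start,
    meant to grow along the first axis) plus attributes holding the names and
    units taken from the model.
    """
    return {
        "/": {"attrs": {"model_id": model_id}},
        "/time": {"shape": (n_times,), "attrs": {"units": time_units}},
        # One row per parameter scenario, one column per parameter.
        "/parameters": {
            "shape": (0, len(params)),
            "attrs": {"names": [p.name for p in params],
                      "units": [p.units for p in params]},
        },
        # Scenario x time point x state variable; row i matches /parameters row i.
        "/states": {
            "shape": (0, n_times, len(states)),
            "attrs": {"names": [s.name for s in states],
                      "units": [s.units for s in states]},
        },
    }


def find_scenarios(param_rows, names, **query):
    """Return indices of rows whose named parameters equal the query values."""
    cols = {name: i for i, name in enumerate(names)}
    return [i for i, row in enumerate(param_rows)
            if all(row[cols[k]] == v for k, v in query.items())]


# Hypothetical model variables, for illustration only.
states = [Variable("V", "millivolt"), Variable("Cai", "millimolar")]
params = [Variable("g_Na", "milliS_per_cm2"), Variable("g_K", "milliS_per_cm2")]
layout = template_layout("example_model@r1234", states, params, n_times=1000)
print(layout["/states"]["shape"])  # (0, 1000, 2)

rows = [(120.0, 36.0), (100.0, 36.0), (120.0, 30.0)]
print(find_scenarios(rows, ["g_Na", "g_K"], g_Na=120.0))  # [0, 2]
```

Keeping all scenarios in one growable array, indexed in step with the
parameter table, is one choice among several; one group per scenario would be
an equally plausible alternative.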

Actually, I think writing an HDF5 structure generator could be almost like just
adding another output format to the code generator.

> Do you imagine a tool which for a given CellML model (or perhaps more
> realistically for a given chunk of CellML simulation metadata) will
> produce essentially a template HDF5 data group with standardized
> structure. Then your simulation tool would grab that data group and
> populate the simulation results.
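As a rough sketch of the first step such a template generator might take,
assuming plain CellML 1.0 documents and using Python's standard-library XML
parser rather than the CellML API: the hypothetical function
`classify_variables` separates ODE state variables from parameters and keeps
each variable's declared units, which could then become attributes on the
template group. It ignores imports, connections between components and units
definitions, so real models would need proper handling of those.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for CellML 1.0 documents and embedded MathML.
CELLML = "{http://www.cellml.org/cellml/1.0#}"
MATHML = "{http://www.w3.org/1998/Math/MathML}"


def classify_variables(cellml_text):
    """Split component variables into ODE states and constant parameters.

    Simplified: ignores imports, connections and units definitions.
    States are variables that appear as the differentiated <ci> of a
    MathML <diff/>; parameters are other variables with an initial_value.
    Keys are "component.variable"; values are the declared units.
    """
    root = ET.fromstring(cellml_text)
    states, params = {}, {}
    for comp in root.iter(CELLML + "component"):
        differentiated = set()
        for apply in comp.iter(MATHML + "apply"):
            children = list(apply)
            if children and children[0].tag == MATHML + "diff":
                # The bound variable sits inside <bvar>, so the only direct
                # <ci> child is the variable being differentiated.
                cis = [c for c in children if c.tag == MATHML + "ci"]
                if cis:
                    differentiated.add(cis[-1].text.strip())
        for var in comp.findall(CELLML + "variable"):
            name = var.get("name")
            key = f"{comp.get('name')}.{name}"
            if name in differentiated:
                states[key] = var.get("units")
            elif var.get("initial_value") is not None:
                params[key] = var.get("units")
    return states, params


# A toy, hand-written CellML 1.0 model: dV/dt = -V / tau.
TOY = """<model xmlns="http://www.cellml.org/cellml/1.0#" name="toy">
  <component name="membrane">
    <variable name="time" units="second"/>
    <variable name="V" units="volt" initial_value="-0.085"/>
    <variable name="tau" units="second" initial_value="0.01"/>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply><eq/>
        <apply><diff/><bvar><ci>time</ci></bvar><ci>V</ci></apply>
        <apply><divide/><apply><minus/><ci>V</ci></apply><ci>tau</ci></apply>
      </apply>
    </math>
  </component>
</model>"""

states, params = classify_variables(TOY)
print(states)  # {'membrane.V': 'volt'}
print(params)  # {'membrane.tau': 'second'}
```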

Yes!

> Or would some kind of simulation data storage and retrieval service
> sitting on top of the CellML API be more what you are after? Then I
> guess that would allow for different underlying persistent data stores
> to be utilised...

Yes, though I personally don't feel the need for this at the moment.

Best regards,
Jon Olav





