CellML Discussion List



[cellml-discussion] Auto-generate HDF5 from CellML?


  • From: david.nickerson at gmail.com (David Nickerson)
  • Subject: [cellml-discussion] Auto-generate HDF5 from CellML?
  • Date: Sat, 15 Nov 2008 16:49:07 +0800

On Sat, Nov 15, 2008 at 3:55 PM, Jon Olav Vik <jonovik at gmail.com> wrote:
> Alan Garny <alan.garny at ...> writes:
>> > I think the searching should be performed with the CellML models
>> > rather than the HDF5 data files. The CellML models containing the
>> > various parametrizations of full models will contain a lot more useful
>> > data to query. You can then search your model archive for simulation
>> > descriptions (model instantiations) that meet certain criteria and can
>> > then retrieve the corresponding simulation data from the HDF5 data
>> > file. I'm currently not in favour of duplicating parameter values in
>> > the HDF5 data groups...but not sure about this yet :)
>>
>> Duplication of information in any form or shape should be prohibited in my
>> view. That's a recipe for disaster at some point down the line.
>
> Yes, but there needs to be *some* way of identifying the information in the
> HDF5 file, like "using parameter values as indexes". A purist solution might
> be to have each simulation result annotated with the URI for that particular
> parameter set and model. However, any analysis would then require running
> back and forth between the CellML model (DOM API, metadata, ...) and the
> huge output files (e.g. HDF5). Until the CellML tools (DOM, code generation,
> ...) fit seamlessly into more mainstream tools, I'd prefer not to lug around
> the CellML DOM API everywhere I take my data. (No offense. 8-)

But doesn't this get back to the issue of needing all the information,
like units, in the HDF5 data file as well? If you'd prefer to have an HDF5
data file that can be unambiguously interpreted without reference to
the source CellML models and/or simulations, then that is a whole lot
more data that would need to be in the data file.

If you want to interpret the data in the file without needing to go
back and forth with the CellML models, then I'd guess you probably want
to add some tool-specific data to the HDF5 group that gets generated
by the proposed tool/service...or not. Maybe the below has convinced
me that this could be done in a nice way...

> I was thinking of this extra annotation as "write once, read many", just
> labelling the boxes. There exist external tools for exploring HDF5 files,
> http://www.hdfgroup.org/hdf-java-html/hdfview/
> and these will be a lot less useful if the data structure doesn't indicate
> which parameters a result is for. (That said, it might be useful to verify
> the integrity of the link between model, parameters and output e.g. by some
> kind of hashing.)
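
The hashing idea in the quoted text could be sketched roughly as below.
This is only an illustration, not something any existing CellML tool does:
the function name link_digest, the example.org model URI and the parameter
names are all made up, and the choice of SHA-256 over a sorted JSON
serialisation is an assumption. The digest would be stored next to the
results and recomputed later to check that the model and parameter set
still match.

```python
import hashlib
import json


def link_digest(model_uri, parameters):
    """Return a SHA-256 hex digest tying a model URI to one parameter set."""
    # Sort the keys so that the same parameter set always serialises
    # identically, whatever order the parameters were supplied in.
    canonical = json.dumps(
        {"model": model_uri, "parameters": parameters},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical model URI and parameter names, for illustration only.
MODEL = "http://example.org/models/demo.cellml"
digest_a = link_digest(MODEL, {"g_Na": 120.0, "g_K": 36.0})
digest_b = link_digest(MODEL, {"g_K": 36.0, "g_Na": 120.0})  # same set, reordered
digest_c = link_digest(MODEL, {"g_Na": 121.0, "g_K": 36.0})  # one value changed
```

Reordering the parameters leaves the digest unchanged, while changing any
value or the model URI produces a different one.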

This sounds more like you are after a complete translation of the
source models and simulations into HDF5. For a given model you'd have
a list of all the "unique" variables in the model annotated with a
string containing the full expansion of the variable's units into the
set of base units, and the variable's value field - which would be a
scalar for constant parameters and an array for dynamic variables. I
guess you'd also want some kind of reference to the index field (i.e.,
time). Not sure if you'd also want to keep track of all the actual
variables in the model that are used for each of the unique variables
in the simulation instantiation, but that could be done.
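
To make the layout described above a bit more concrete, here is a minimal
sketch using plain Python data structures rather than a real HDF5 library.
The class names, field names, example URIs and the format of the base-units
string are all assumptions for illustration; in an actual HDF5 file each
variable would presumably become a dataset with the units and URIs attached
as attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class UniqueVariable:
    uri: str         # link back to the variable in the source CellML model
    base_units: str  # units fully expanded into base units (format is made up)
    value: Union[float, List[float]]  # scalar if constant, array if dynamic
    # Other model variables that map onto this unique variable.
    aliases: List[str] = field(default_factory=list)

    @property
    def is_dynamic(self):
        return isinstance(self.value, list)


@dataclass
class SimulationRecord:
    index_name: str  # name of the index variable, e.g. time
    variables: Dict[str, UniqueVariable]

    def is_consistent(self):
        """Check that every dynamic variable is as long as the index."""
        n = len(self.variables[self.index_name].value)
        return all(
            len(v.value) == n for v in self.variables.values() if v.is_dynamic
        )


# Hypothetical URIs and values, for illustration only.
BASE = "http://example.org/models/demo.cellml#"
record = SimulationRecord(
    index_name="time",
    variables={
        "time": UniqueVariable(BASE + "time", "second", [0.0, 0.1, 0.2]),
        "V": UniqueVariable(
            BASE + "V",
            "kilogram.metre^2.second^-3.ampere^-1",  # volt in base units
            [-0.075, -0.0742, -0.0701],
            aliases=[BASE + "membrane_V"],
        ),
        "C_m": UniqueVariable(
            BASE + "C_m",
            "kilogram^-1.metre^-2.second^4.ampere^2",  # farad in base units
            1.0e-6,
        ),
    },
)
```

The aliases list is one way to keep track of the actual model variables
behind each unique variable, if that turns out to be wanted.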

In such a tool you'd still lose a lot of the annotation in the source
CellML models. But I guess if you simply want an optimised data store,
the above should give you everything you need, and if required in
special cases you can also link back to the CellML models, as there
should still be some URIs stored somewhere in the HDF5 data file. Of
course, if you want to do all this nice and quickly, you'd likely
ignore the units anyway if you know that all your simulations are in
compatible or identical units. They can then be left back in the CellML
model and looked up if needed.

One consideration with such a solution is that I have found the HDF5
packet table interface to be about the most efficient way to stream
simulation data to a persistent store. I have one packet table per
simulation and use the model variable URIs to set up a mapping into
that packet table for each dynamic variable. So rather than using the
value field of dynamic variables for an array, it is probably more
efficient to set it up as an index or something into the packet
table. Sounds like it should be workable :)
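
As a rough sketch of that mapping idea (this does not use the HDF5 packet
table API; an in-memory buffer of fixed-size records stands in for one
packet table, and the class name, method names and URIs are all made up):
each dynamic variable's URI maps to a column index, every time step is
appended as one record, and a variable's series is read back through its
index.

```python
import io
import struct


class PacketStream:
    """Append-only store of fixed-size records, standing in for one packet table."""

    def __init__(self, variable_uris):
        # Mapping from model variable URI to its column in each record.
        self.column = {uri: i for i, uri in enumerate(variable_uris)}
        self._fmt = "<%dd" % len(variable_uris)  # little-endian doubles
        self._size = struct.calcsize(self._fmt)
        self._buf = io.BytesIO()
        self.count = 0

    def append(self, values_by_uri):
        """Append one record (one time step) to the end of the stream."""
        row = [0.0] * len(self.column)
        for uri, value in values_by_uri.items():
            row[self.column[uri]] = value
        self._buf.seek(0, io.SEEK_END)
        self._buf.write(struct.pack(self._fmt, *row))
        self.count += 1

    def series(self, uri):
        """Read back the full series for one variable via its column index."""
        i = self.column[uri]
        data = self._buf.getvalue()
        return [
            struct.unpack_from(self._fmt, data, k * self._size)[i]
            for k in range(self.count)
        ]


# Hypothetical variable URIs, for illustration only.
T = "http://example.org/models/demo.cellml#time"
V = "http://example.org/models/demo.cellml#V"
stream = PacketStream([T, V])
stream.append({T: 0.0, V: -75.0})
stream.append({T: 0.1, V: -74.2})
v_series = stream.series(V)
```

With this arrangement the per-variable entry in the data file only needs to
hold the column index (plus units and URIs), not a copy of the array.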


Andre.



