CellML Discussion List



[cellml-discussion] New draft of Simulation Metadata Specification available


  • From: ak.miller at auckland.ac.nz (Andrew Miller)
  • Subject: [cellml-discussion] New draft of Simulation Metadata Specification available
  • Date: Fri, 18 Aug 2006 15:52:39 +1200

David Nickerson wrote:
>>> I guess it just makes sense to
>>> leave it as it is and assume the values are specified in the same units
>>> as the bound variable itself. But that assumption should be added to the
>>> specification.
>>>
>>>
>> Okay, I will take this approach in the specification for now, and we can
>> update it later if the consensus is that we need something better.
>>
>
> yeah, thinking about it some more that seems the best approach for now.
>
>
>>> For the specification of the numerical algorithms used in a simulation,
>>> I think we need to be clearer on the implications of each of the
>>> allowable options. i.e., you need to be sure that in any given
>>> implementation you will get comparable results for the same choices,
>>> that my implementation of gear-1 uses the same method as your
>>> implementation of gear-1.
>>>
>> It is very hard to get the exact same results from any two non-identical
>> solver implementations, even if they try to implement the same
>> algorithm, due to numerical instability issues and so on (e.g.
>> differences in the precision of intermediate results and constants built
>> into the program, different solver libraries giving slightly different
>> results, and so on). In some cases, the results may be comparable, and
>> in other cases they may not be, so it would be very hard to guarantee
>> this. I agree that gear-1 (I meant using Gear's BDF method, with M=1,
>> rather than a particular implementation of Gear's algorithm) is a bit
>> ambiguous and we need some more prose describing what is meant by each
>> method.
>>
>> I guess the best we could do is give a reference to papers (we could do
>> this in every RDF document, but that would probably result in excessive
>> space use, since most simulations would use the same limited set of solvers).
>>
>
> I would hope that if a model author were to be specific enough to say
> use the runge-kutta-prince-dormand-8-estimate-9 multistep method with
> this maximum bound variable step size then you should be getting the
> exact same answer out of any implementation that supports that very
> specific algorithm (to within some machine precision error). Otherwise
> there would seem little point in being so specific about the method to
> use. Maybe you could expect differences if a given implementation used
> single instead of double (or higher) precision, so maybe we need to
> include that in the specification? We're already saying that all the
> typed literals are generally of type xsd:double so it doesn't seem too
> bad to be adding that runge-kutta-prince-dormand-8-estimate-9 is
> expected to be an implementation of that method using at least double
> precision floating point arithmetic.
>
It's not clear that any implementation would support several floating
point representations at the same time, so users might not get anything
out of this. However, providing such information could allow tools to
warn the user if the representation differs (and also if the tool has to
fall back on another integrator). If we did choose to do this, just
saying "64 bit" isn't enough, however. We should probably explicitly
state "IEEE 754-1985 double precision". The XML Schema specification
already does this for xsd:double and xsd:float, so we could re-use XML
Schema datatypes, and allow users to define their own if new data types
arise.
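
For example, something along the following lines could work. This is
purely a sketch: the cs:arithmeticPrecision property name and the cs:
namespace URI are not in the draft, I am just illustrating the idea of
reusing the XML Schema datatype URIs.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cs="http://www.example.org/cs-placeholder#">
  <rdf:Description rdf:about="#simulation">
    <!-- The arithmetic was performed in IEEE 754-1985 double
         precision, identified by reusing the XML Schema datatype
         URI for xsd:double. -->
    <cs:arithmeticPrecision
        rdf:resource="http://www.w3.org/2001/XMLSchema#double"/>
  </rdf:Description>
</rdf:RDF>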

However, just the way you write your code makes a big difference. On x86
platforms:
double x = (y * z) * a;
can produce different results from
double x = y * z;
// Some other intervening calculations...
x *= a;

because the x87 FPU registers used for intermediate results hold
80-bit extended precision values, while a stored double is only 64 bits
wide. While porting RADAU from FORTRAN to C for
mozCellML, I found that some minor changes like this produced huge
differences in final results for some types of calculations.
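
As a separate, self-contained illustration of the same point (this is
just a sketch of mine, not something from the specification work), even
the grouping of a simple sum changes the stored double result, and the
x87 extended registers can change it again:

#include <stdio.h>

int main(void)
{
    /* volatile stops the compiler folding these at compile time, so
       the arithmetic really happens on the FPU at run time. */
    volatile double a = 1.0e16, b = -1.0e16, c = 1.0;

    double large_terms_first = (a + b) + c;
    double small_term_first  = a + (b + c);

    /* With strict 64-bit double evaluation (e.g. SSE2) this prints
       "1 0": grouping the large terms first cancels them exactly,
       while grouping the small term in first rounds it away. With
       80-bit x87 intermediates both groupings can come out as 1,
       which is exactly the sort of platform-dependent difference
       described above. */
    printf("%g %g\n", large_terms_first, small_term_first);
    return 0;
}
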
> Especially once we start getting into model curation it is going to be
> very important to know exactly what methods were used to validate the
> model. And again, I would hope that such a detailed specification of the
> method should be enough, without also having to specify the exact
> implementation used as well.
>
The problem is that even with quite specific methods, there are still
often some implementation decisions left to the developer of the
integrator (and quoting a paper is still not necessarily sufficient,
because authors may leave out details which they believe are obvious,
leaving scope for different implementations).

That said, the differences between the implementations are often
minor, and it is likely that they will only show up in certain
pathological cases.
> Probably the thing to do is to provide the appropriate references for
> each of the methods in the specification (or maybe an appendix?) as well
> as test models and results that people can use to test/validate their
> implementation. Maybe it's not such a bad thing to have in the
> specification that runge-kutta-prince-dormand-8-estimate-9 refers to the
> implementation of this method in the GNU scientific library and the BDF
> method refers to the implementation in the SUNDIALS CVODE
>
I was planning on letting gear-1 and gear-2 be implemented using GSL
too. It is not clear if one of the Gear methods in GSL produces
equivalent results to the BDF from SUNDIALS CVODE (I guess they may have
made their own improvements to the method, and I believe the GSL Gear
methods attempt to mimic the original GEAR FORTRAN code more closely).
Does this mean we need to provide a cvode-bdf in addition to gear-1 and
gear-2?

To allow progressive fallback, we would probably encourage multiple
levels of integrator information to be provided:
Level 1: References the exact integrator code that was used, including
bugfixes that occur between versions. In addition to meeting the
requirements for the lower levels, definitions for this level must
provide a reference to the exact source code used.

Level 2: References the integrator code that was used. Different
versions (differing only due to bugfixes) will still match at level 2.
In addition to meeting the requirements for the lower levels,
definitions for this level must provide a reference to the integrator
package used.

Level 3: References the integration algorithm used. This must refer to
a specific algorithm, although minor deviations between implementations
(for example, differences in the exact way matrix algebra is performed,
or in the details of the FPU used) are still permitted. In addition to
meeting the requirements for the lower levels, definitions for this
level must provide a reference to a paper which describes a specific
algorithm. An example might be Runge-Kutta Prince Dormand (8th order
with a 9th order error estimate).

Level 4: References the general algorithmic framework used. In addition
to meeting the requirements for the lower levels, definitions for this
level must describe a set of requirements which are common within the
framework. An example might be "Variable Order Embedded Runge Kutta".

Level 5: References the type of problem which the solver should be able
to solve. Examples: stiff and nonstiff.

The higher the level of match, the stronger the guarantee that the
results will be the same.
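
To make that slightly more concrete, a level 3 definition might be
published as an RDF resource along the lines of the sketch below. This
is only my sketch: the cs:definitionLevel and cs:describedBy property
names, the cs: namespace URI and the placeholder paper URI are not in
the draft, and the resource URL style follows the suggestion further
below.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cs="http://www.example.org/cs-placeholder#">
  <!-- A level 3 (algorithm-level) integrator definition. -->
  <rdf:Description rdf:about="http://www.cellml.org/specifications/metadata/simulation/standardvalues.rdf#Runge_Kutta_Prince_Dormand_8_9">
    <!-- Placeholder property: the level this definition satisfies. -->
    <cs:definitionLevel rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">3</cs:definitionLevel>
    <!-- Placeholder property: a reference to the paper describing the
         specific algorithm, as required at level 3. -->
    <cs:describedBy rdf:resource="urn:example:prince-dormand-8-9-paper"/>
  </rdf:Description>
</rdf:RDF>

A level 2 definition for a particular package's implementation could
then point back at this level 3 resource, and so on down to the exact
source code at level 1.
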
> implementation, if we need to be that precise to guarantee
> quantitatively similar results for different applications with the same
> solver options ???
>
>
>>> And maybe there also needs to be some generic
>>> options that allow a model author to hint at the requirements of their
>>> model/simulation. For example, cs:linearSolver has an option for direct
>>> but not a generic iterative (for various reasons I have found that some
>>> models under some circumstances simply run faster with an iterative
>>> solver than with a direct solver).
>>>
>> Does it make sense to say "give me an iterative solver" without saying
>> which one, though? This is the reason I intentionally left that off. If
>> there is consensus that it is useful, it could be put back.
>>
>
> It may be a useful option to be able to include as a fallback solver
> rather than letting an implementation fall back on the default direct
> solver. More as a method for saying don't even try to use a direct
> solver for my model. (for example, you might know that under certain
> conditions your model is going to result in a very nearly singular
> matrix that will never be feasible for a direct solver.)
>
Okay. If we adopt the multi-level framework above, this could be a level
5 description of the integrator.

The current approach of putting text strings in the RDF is actually
quite ugly, and it doesn't leverage the RDF very effectively. Perhaps we
should be referencing resources (which in turn reference papers), and
providing a standard set of resources predefined at a http://cellml.org/
URL? We could then add in new modules without changing the original
specification.

e.g. we could reference
http://www.cellml.org/specifications/metadata/simulation/standardvalues_gsl.rdf#Runge_Kutta_Prince_Dormand_8_9

as our level 2 integrator, and
http://www.cellml.org/specifications/metadata/simulation/standardvalues.rdf#Runge_Kutta_Prince_Dormand_8_9

as our level 3 integrator. Those RDF files would provide metadata
describing the integrators (although we would encourage packages to
recognise the resource rather than continuously load the RDF and read
the metadata).
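
So a simulation's own metadata might then contain something like the
sketch below. Again, the cs:levelTwoIntegrator and
cs:levelThreeIntegrator property names and the cs: namespace URI are
made up for illustration; only the resource URLs are the ones above.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cs="http://www.example.org/cs-placeholder#">
  <rdf:Description rdf:about="#simulation">
    <!-- Level 2: a particular package's implementation of the method. -->
    <cs:levelTwoIntegrator rdf:resource="http://www.cellml.org/specifications/metadata/simulation/standardvalues_gsl.rdf#Runge_Kutta_Prince_Dormand_8_9"/>
    <!-- Level 3: the algorithm itself, independent of any package. -->
    <cs:levelThreeIntegrator rdf:resource="http://www.cellml.org/specifications/metadata/simulation/standardvalues.rdf#Runge_Kutta_Prince_Dormand_8_9"/>
  </rdf:Description>
</rdf:RDF>

A tool that recognises the level 2 resource could use that exact
implementation; one that only recognises the level 3 resource would
fall back to its own implementation of the algorithm and could warn the
user that only a level 3 match was achieved.
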
>
>>> Or for cs:multistepMethod maybe there
>>> should be options for explicit, implicit, stiff etc. as well as being
>>> able to specify specific methods. We might also want to consider the
>>> ability to specify alternative methods, e.g., an author knows that
>>> gear-2 gives the best results for their model but if an implementation
>>> does not provide that option it would be good to know that it will fall
>>> back on a stiff method rather than simple Euler stepping.
>>>
>>>
>> We could do this using a collection (in which order is important) to
>> describe successively less preferable choices of integration method. We
>> could then allow entries in this list which specify a class of
>> integrators, rather than a specific one.
>>
>
> I think that would be very useful.
>
>
>>> Is there also a case for specifying the minimum set of numerical
>>> algorithms an implementation has to support?
>>>
>> It probably depends what type of implementation we are talking about. It
>> is conceivable that some implementations exist solely to allow users to
>> use a new algorithm, and such implementations would then only support
>> that algorithm.
>>
>>> It would be good to be able
>>> to have a model/simulation that you know will run and produce
>>> quantitatively the same result in any compliant implementation.
>>>
>>>
>> I'm not sure this is feasible; even if the implementations are very
>> similar, there is a chance that, at least for some models, very
>> different results will be produced.
>>
>
> I think that if an implementation is falling back on the generic options
> then you might expect some models to produce different results. But if
> an implementation supports the specific method the model author
> specifies then you should get the same results.
>
> But you're right, it's not worth specifying the minimum set.
>
Although we could suggest that implementations keep the user informed
about the matching level, so they know whether or not they are likely to
reproduce the results the author got.
>
>> For the first release of PCEnv, I plan to either require a canonical
>> form (preferably the one that happens to come from 4Suite, since the
>> Model repository uses that), or use the Mozilla RDF library.
>>
>
> Will be good if you can let us know the canonical form that will
> initially be used in the model repository.
The easiest way to find out would be to put some RDF through 4Suite,
since the RDF will be read in and rewritten by that library regardless
of what input format we use. However, the output could change
significantly based on minor changes like those discussed here, so I
wouldn't recommend doing that until the specification is more stable.
Even then, it's possible that minor differences in metadata input could
cause the software to write a different form, so it may not be all that
canonical.

> Does anyone happen to know
> any good C/C++ compatible RDF tools that could be used in a standalone
> application outside of mozilla and the model repository?
>
I have looked into this before, but I have never actually got around to
using one. A Freshmeat.net search (try
http://freshmeat.net/search/?q=RDF&section=projects&Go.x=0&Go.y=0)
brings up lots of options you could consider.

>
>>> I'll look at updating my implementation to use this new version of the
>>> specification and see what other issues come up.
>>>
>>>
>> Although be warned that it is still a draft and could change again.
>>
>
> yep - but it's always good to try things out to make sure they make as
> much sense in practice as they do in discussion :-)
>
> An issue I forgot to mention yesterday was that the specification
> includes a way to specify the iteration method to be used with the
> linear solver. For what I have been doing, the specification of the
> linear solver to use decides the iteration method that will be used for
> the linear solver (i.e., all my iterative solvers are krylov-subspace)
> and the iteration method I specify applies to the multistep method. The
> options available are functional and Newton iteration - where functional
> is only suitable for non-stiff systems and requires no linear solution
> while Newton requires a linear solver.
>
Correct me if I am wrong, but isn't Krylov-subspace a functional
iteration method? I changed functional from your specification into
Krylov-subspace in order to be more specific. Obviously, we need to
disambiguate between the iteration method used in the iterative solvers,
and the iteration method used inside the stepping method.
> So I think there should either be two iteration method predicates or the
> linear solver choices need to include the iteration method. For example,
> instead of generalised-minimum-residual you have GMRES (Newton
> iteration) and SPGMR (krylov-subspace iteration). Probably the first
> option is better and we could have cs:iterationMethod refer to the
> multistep method iteration and then add a cs:linearSolverIterationMethod
> for the linear solver iteration method.
>
I think the first option sounds the best. Of course, it is likely that
not all combinations will be supported by all software packages, even if
each individual value is, so it makes the fallback algorithm more
complex, and I guess we need to document a recommended fallback method.
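
With that split, a CVODE-style setup might end up being described
roughly as in the sketch below. The predicate names come from the
discussion above, but the exact value spellings and the cs: namespace
URI here are only illustrative.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cs="http://www.example.org/cs-placeholder#">
  <rdf:Description rdf:about="#simulation">
    <!-- Iteration used inside the multistep method itself. -->
    <cs:iterationMethod>newton</cs:iterationMethod>
    <!-- Linear solver invoked by each Newton iteration, plus the
         iteration scheme that linear solver uses. -->
    <cs:linearSolver>generalised-minimum-residual</cs:linearSolver>
    <cs:linearSolverIterationMethod>krylov-subspace</cs:linearSolverIterationMethod>
  </rdf:Description>
</rdf:RDF>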

Best regards,
Andrew




