CellML Discussion List



[cellml-discussion] Proposal: BCP for including external code in CellML models


  • From: ak.miller at auckland.ac.nz (Andrew Miller)
  • Subject: [cellml-discussion] Proposal: BCP for including external code in CellML models
  • Date: Sun, 18 Mar 2007 13:25:30 +1200

David Nickerson wrote:
>> ECMAScript is not practical for use in modelling, because it is an
>> interpreted, non-typed language, which necessarily means that it cannot
>> be compiled and will be slower than compiled code.
>>
>
> But CellML is a language for the description and exchange of
> mathematical models. It is not meant to be a one-off wonder describing
> the most efficient and best performing method for executing numerical
> computations.
>
> To turn a CellML model description into something useful for computation
> that description has to be interpreted and compiled into some other
> format suitable for the environment using it...
>
> Surely in the same manner, a standard description of procedural code
> could then be interpreted by any number of applications in whatever
> manner they feel best suits their environment?
>
No. Because CellML is restricted to expressions, it is much easier to
work with, and this restriction is what makes it declarative. You can
perform a variety of manipulations on declarative expressions, but
procedural code can basically only be run in the way it was written to
run. For example, even working out whether procedural code will ever
terminate (the halting problem) has been proved to be undecidable, that
is, not Turing-computable, in the general case, and this is likely to
be the case for other types of manipulation too.

Code can often be optimised and compiled, but the features of ECMAScript
preclude many of the optimisations that a C compiler, for example, can make.

For example, objects can have arbitrary properties, and there is no way
to tell at compile-time what set of properties an object will have, or
whether a property is a simple property or a getter. While a C compiler
might take a value from an offset into a structure, ECMAScript code
would end up searching a dictionary of properties on an object.
Therefore, ECMAScript is not a good language if you want to be able to
interpret it in different ways (and for any Turing-complete language,
the ways in which you can interpret it are severely limited).
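
As a rough illustration of the lookup-versus-offset contrast, here is a
small Python sketch. Python is used only as an analogy: like ECMAScript,
it keeps instance attributes in a per-object dictionary and lets a getter
look like a plain attribute, while the `struct` module mimics reading a
field at a fixed offset, as compiled C would. The `Channel` class and its
attribute names are hypothetical.

```python
import struct


class Channel:
    """Hypothetical model object; its attributes live in a per-instance dict."""

    def __init__(self, conductance):
        self.conductance = conductance

    @property
    def current(self):
        # A getter: at the call site it looks exactly like a plain attribute.
        return self.conductance * 2.0


ch = Channel(0.5)
ch.extra = 42  # a new attribute added at run time, unknown beforehand

# The set of per-instance attributes is only known by inspecting the dict.
dynamic_names = sorted(vars(ch))

# Fixed layout, comparable to a C struct holding two doubles: the second
# field is always found at byte offset 8, with no name lookup involved.
layout = struct.Struct("<dd")
record = layout.pack(0.5, 1.0)
second_field = struct.unpack_from("<d", record, offset=8)[0]
```

Nothing at the call site distinguishes `ch.current` (a getter) from
`ch.conductance` (a stored value), which is the kind of uncertainty that
blocks the offset-based access available for the packed record.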

Remember also that CellML models can be used to solve a range of different
problem types (fitting, ODE time course, and so on), but one procedural
code implementation might not be useful for all of them.

My BCP document is intended as a way to maintain as much of the model as
possible in CellML, but simply leave the rest of the model unspecified.
Given the amount of history and development of procedural languages, I
don't think we can hope to 'standardise' anything more in a widely
acceptable way when it comes to procedural languages.
>
>> External code needs to be extensible, and hence outside the scope of the
>> CellML specifications, for several reasons:
>> 1) Performance. Code may need to be written in a way which is specific
>> to a particular platform in order to be able to perform well.
>>
>
> same response as above.
>
Sometimes human intervention is going to be required to rescue a model
from performance problems that make it infeasible. If we take an ideological
approach and try to block this from happening, it will just result in
CellML not being used at all. Instead, it is better to encourage people
to use CellML features whenever possible, but allow external code when
it is not possible.
>
>> 2) Access to existing libraries. There are often extensive libraries and
>> other software packages into which a model needs to be integrated. This
>> could be in practically any language, and so it would be necessary to
>> access the data structures of these libraries to have the model work. I
>> believe that this is the case for much of the CMISS-CellML work (I don't
>> really think that a proposal to re-write CMISS in ECMAScript would be
>> very popular!).
>>
>
> In every case of people using CMISS that I know of, the use of CellML is
> to define model specific mathematical equations for integration into a
> larger model.
In other words, the model consists of parts which can be expressed as
mathematical equations, and parts that cannot be expressed in
mathematical equations (in CMISS). You are proposing that the parts
which cannot be expressed in mathematical equations be written in
ECMAScript.
> I'm not suggesting re-writing CMISS in ECMAScript - rather
> you seem to be suggesting including CMISS in a CellML model?!?
>
The question of which model is included in which is an artificial
distinction rather than a meaningful one. However, there needs to be a
mechanism for data flow from CMISS into the CellML models (otherwise,
CMISS can only set initial conditions, it can't have any time dependent
influence on the model).
> This would hold for most such cases of using existing libraries that I
> can think of, with the exception of someone wanting to solve a
> particular equation or set of equations in a model using a very specific
> numerical method that their CellML simulation tool does not support.
>
There are many other computations that are better done by procedural
code than by systems of ODEs. Machine learning algorithm lookups are one
example of this, and there are extensive libraries of these sorts of
things available.
> Even if you take a step back and look at the larger picture of using
> things like FieldML, CellML, MathModelML (or something), etc... to
> describe something like an electrical propagation model in the heart,
> the tool (eg, CMISS) pulls it all together and plugs fields and
> variables together based on the model annotations. Otherwise you'll end
> up with cell models that say things like "give me the current load at
> this point in space by solving the bidomain model over this geometric
> domain" - making the cell model description useless for any other
> application. What you want instead is simply a variable in the model
> which is the current load that has an interface of in. Your cell model
> integrator doesn't care where this value comes from, it just knows that
> when the tool calls for the cell model to be integrated that it will
> provide some appropriate value.
>
First, I note that if you are talking about using component-level
interfaces for this, that is not a feasible approach. I include below an
e-mail about this that I sent earlier to Shane and Poul:

"
Shane has proposed that as an alternative to using content MathML to
reference external code, we could use components. However, this appears
to be inconsistent with the way CellML works at the moment, so I don't
think that it could form the basis for defining external functions.

The problem with the approach of defining external components is that
the directionality of variable interfaces in CellML is too weak to
define the actual directionality and order in which mathematics is
evaluated.

This is a good thing, for two reasons:
1) CellML is inherently declarative, not procedural. This means that if
you give an equation defining x in terms of a, b, and c, but due to the
other components in the model, x, a, and b are known, and it becomes
necessary to obtain c, it is perfectly valid for the CellML software to
perform a Newton-Raphson solve (or algebraic manipulations, if it has
the capability) to obtain c. However, if the directionality on
components was strong, CellML processing software would be constrained
to compute components in a certain way, which would in turn limit the
flexibility of each component.

2) It is possible to have more than one mathematical equation in a
single component, and in some cases these might be completely
independent. For example, you might have, in one component:

w = x + a
y = z + b

and in another:

z = w + c

With x the bound variable of integration, and a, b, and c being constant.

This might make sense, because components are generally used to
represent entities in biology, rather than the actual directionality of
mathematical equations. However, it means that you evaluate part of the
first component, then part of the second, and then go back to part of
the first component. This is something you couldn't do if each component
was an external block.

Given that we don't have a one equation per component system, it is also
possible that you want to combine mathematics in MathML with the
external code (perhaps to re-parameterise the function, or something
like that).

Because of this, I am still convinced that defining external operators
using MathML is a better approach than trying to overload the component
system in CellML for a use other than the one it was originally intended for.
"
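
To illustrate point 1 of the quoted e-mail, the following minimal Python
sketch solves for c by Newton-Raphson when x, a, and b are known. The
equation x = a + b*c^3 and the function names are assumptions made up for
the example; nothing here comes from an actual CellML tool.

```python
def newton_solve(residual, guess, tol=1e-10, max_iter=50, h=1e-7):
    """Find c such that residual(c) is approximately zero."""
    c = guess
    for _ in range(max_iter):
        r = residual(c)
        if abs(r) < tol:
            return c
        # Central-difference estimate of the derivative of the residual.
        slope = (residual(c + h) - residual(c - h)) / (2 * h)
        c -= r / slope
    raise RuntimeError("Newton-Raphson did not converge")


def x_of(a, b, c):
    # Hypothetical declarative relation: it states x in terms of a, b, c,
    # but does not say which of the four variables must be the unknown.
    return a + b * c ** 3


# Here x, a, and b happen to be known, so the same relation is solved for c.
a, b, x = 1.0, 2.0, 17.0
c = newton_solve(lambda c_try: x_of(a, b, c_try) - x, guess=1.0)
```

The relation is written once, in the "x = ..." direction, yet it is used
to recover c. A strongly directional component interface would rule this
out.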

Secondly, the "cell model" integrator does need to care where the values
come from, because it is responsible for moving from one time point to
the next, and to do this, it needs to know which values from the current
time point are needed to compute which other values at the current time
point. This is why I have defined an interface which, in a way that is
natural for MathML, describes the inputs and outputs of the external
code. It is essentially equivalent to what you are talking about above,
except that the inputs to the external code must be provided as well.
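
The dependency information described above can be sketched in Python as
follows, assuming each equation (or external function) declares its
output and its inputs. It reuses the w, y, z equations from the quoted
e-mail; the component names and the dictionary layout are hypothetical,
and `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Output variable -> (owning component, input variables).
# Component names are made up; the equations are those in the quoted e-mail.
equations = {
    "w": ("component_1", ["x", "a"]),  # w = x + a
    "y": ("component_1", ["z", "b"]),  # y = z + b
    "z": ("component_2", ["w", "c"]),  # z = w + c
}
known = {"x", "a", "b", "c"}  # the bound variable and the constants


def evaluation_order(equations, known):
    """Order the computed variables so that every input is available first."""
    graph = {
        out: [v for v in inputs if v not in known]
        for out, (_, inputs) in equations.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = evaluation_order(equations, known)
components = [equations[name][0] for name in order]
```

Here `order` comes out as w, z, y, so evaluation visits component_1, then
component_2, then component_1 again. An external function with declared
inputs and outputs would simply be one more entry in the same table.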

>
>> 3) Access to specialised hardware. A model could potentially even
>> require that a function is evaluated by some sort of online experimental
>> procedure (perhaps automated probing of a hardware model) for a given
>> set of inputs.
>>
>
> Again, this seems more like a case where you define a mathematical model
> which given some input(s) produces some output(s). The controlling
> software would take the mathematical model definition in CellML and
> connect the appropriate inputs and outputs.
This is exactly why we need a way to describe inputs and outputs, which
is what I describe in the proposal.
> I would really need a
> concrete example of why you would want to describe a mathematical model
> in CellML which requires input from specialised hardware. Surely you
> just define a variable that has an interface of in and annotate it such
> that the controlling software can find it and plug in the appropriate
> value(s)?
>
>
>> 4) Multiple standards, with different communities who favour them. It
>> would not be practical to get everyone involved with CellML to agree on
>> a certain procedural programming language (even deciding on Fortran vs
>> C++ etc... has been a challenge at this institute, and will probably be
>> impossible for the wider CellML community).
>>
>
> As above, you are not performing computations using CellML directly -
> you always turn the model description into something suitable for the
> computational environment in which the model is being used. That's the
> beauty of CellML - you can turn it into Fortran or C++, depending on
> your personal preference!
>
For pure CellML, it is irrelevant which language it is translated into,
because it can't call external code anyway. But once we call external
code, that external code can in turn call other external code. Also,
CellML filled a new niche, whereas you seem to propose that we tell
everyone which procedural language to use, which is a contentious issue.
Note also that you cannot, in general, turn ECMAScript into efficient C++.
> CellML is all about being able to exchange a standard description of a
> mathematical model between potentially very different software
> environments. The whole idea is specifically not specifying the best way
> to compute outputs from the model - which seems to be what you are
> driving at... and the best way to compute outputs from a model is always
> going to be dependent on the target computational environment.
>
Which is why we keep the things that CellML can do well in CellML, while
continuing not to specify the things that CellML can't do well. That
is why my proposal only provides details of the interface to the
external code, and doesn't try to specify the external code itself.
>
>> As an example, consider my PhD project, where I plan to put machine
>> learning components into CellML models:
>> 1) Performance is likely to be important. If it is too slow, it might
>> not be feasible to do at all.
>> 2) I plan to use existing libraries, in a range of different languages.
>> 3) I also have another (perhaps not as common) gain from specifying the
>> external functions without describing their details: I need to run
>> different code in 'training' and 'simulation' modes, and if I just wrote
>> generic ECMAScript for the simulation case, there would be no simple way
>> to deduce the training case. Because of this, it is probably good to
>> keep the non-algebraic parts of the model completely separate, and leave
>> it up to whoever implements the specific CellML processor.
>>
>
> I'd probably need to see more detailed plans on exactly what you are
> planning on doing before commenting on this. But from what I have seen,
> whenever anyone has wanted to include procedural code directly in a
> CellML model it has always turned out that they are approaching the
> problem from the wrong direction.
>
> Just to re-iterate, CellML is all about exchanging *descriptions* of
> mathematical models - not implementations of computational code.
>
Which argues for specifying how to interface external procedural code,
as in my original proposal, rather than specifying how to exchange the
procedural code, as you have suggested.
>
>> That said, I think we could have multiple levels of degeneracy away from
>> standardised code, where you only go down to the next item if the
>> current one is impossible:
>> 1) Pure CellML.
>>
>
> definitely.
>
>
>> 2) CellML with standardised Turing-complete code support.
>>
>
> I can see why we should provide a mechanism for this, but have yet to
> see an example where it would be useful (other than to get around a
> particular tool's deficiencies).
>
>
>> 3) CellML with external (non-standardised) code.
>>
>
> I still haven't seen a reason why this would ever be required?
>
>
> David.
>
>

Best regards,
Andrew




