CellML Discussion List



[cellml-discussion] Proposal: BCP for including external code in CellML models


  • From: matt.halstead at auckland.ac.nz (Matt )
  • Subject: [cellml-discussion] Proposal: BCP for including external code in CellML models
  • Date: Mon, 19 Mar 2007 08:52:25 +1200

I have often thought referencing external code through a clearly
defined interface would be useful, and mostly because procedural code
is another natural way to solve problems. But I have always banged my
head up against validation. With procedural code this amounts to
passing tests - good tests - and being confident that the code will
break in useful ways when it does break. I don't see this as being any
different to the intended outcome of valid CellML models that are
purely declarative.

At first glance it might seem more taxing for a developer wanting to
use CellML in their application if they need to handle external code.
But this proposal for external code is very specific to the math
declarations, and I think that, independent of whether the math is
represented in MathML or as an external source of procedural code, an
application investigating the math will find its decisions difficult
without sufficient annotation classifying the math formulations, so
that a machine can filter what it is and is not capable of processing.
In some cases I imagine the application developer would welcome a
particular math problem already being coded in a language that could
be compiled and run. If that thought is continued, then there is a
place for a model representation that has all math represented by
external code, with the model structure represented in CellML. This
would obviously assume that some particular decisions about simulating
the model had been made; it is indeed a different scenario from the
purely declarative model, which seeks to explain the mathematical
problem at a higher level and leaves it to applications to resolve the
simulation from this.

At the moment we don't actually have a useful way of providing a
CellML model with enough machine-readable information for someone to
rerun our model in exactly the same way as we did. By referencing
and/or including external code, we allow a model to be exchanged at
the simulation level, which is actually not a bad thing if our goal is
to promote collaboration in model building.

I do think there is a possibility that people would abuse this; i.e.
jump straight to binding bits of code here and there together with
CellML; but if we maintain standards and best practice, then it should
be easy to show them up. Also, perhaps we should trust people to
evolve towards resorting to external code only when it really is the
best way to solve their problem.

There are a couple of things that we could possibly lose by bringing
in external code:
1) producing human readable equations for publication that accurately
reflect the mathematics in the model. Annotation of the algorithms or
maths in the external code would help, but would not guarantee that
the publication reflected exactly what was encoded in the model.
2) ease of creating machine readable annotation for parts of the
external code that would require it - for example under MIRIAM to bind
each 'component' of a model to the relevant part of a reaction
network. This is where you would be questioning the modeler as to
whether their external code should be broken down and spread across
models. But they may not have control over the external source, or,
perhaps they are exchanging models that have necessarily lumped a lot
of biological concepts into one piece of external code (a library)
because it's more efficient to solve that way; we now have a
non-MIRIAM-compliant model.

I would like to think that including links to external code in the
CellML specification would push us to make a bigger effort on the
procedures for model validation, and encourage more involvement from
the many modelers out there sitting on code that works, rather than
thinking we will lose some of the high-level elegance of CellML along
the way.

Specific comments (quoted pieces from
http://www.cellml.org/Members/miller/bcp-external-models/ are enclosed
in triple quotes)

"""[CellML] models are very good at describing complete mathematical
models in a format which can be exchanged between model authors and
users. This adds significant value to a model representation, because
third parties can take the model, and use it in their preferred
software packages to reproduce any results the author published."""

Need some clear examples of model types that cannot be expressed in
CellML, i.e. some algorithms that are best (or only) expressible at
the moment in procedural code. I know that various neural network
models and genetic algorithm based learning systems have evolved
mainly from procedural thought. I think we need to really consider
that some problems would be much better understood by model authors if
they are expressed in procedural code.

"""Having part of a model expressed in CellML, and other parts
expressed in some more generic language is still useful, because it
means that the common part of the model can be re-used more easily,
either by providing external code of a different kind, or, where
possible by replacing the external code with MathML."""

If external code can be replaced with MathML, then why wouldn't this
have been in a CellML component in the first place?

I see a pro and a con when someone encodes most of a model in an
external code block bound into a single component of a model. The pro
would be that this may have prompted someone to actually bother using
CellML - as a first step, they simply wrapped their existing
code; in this case it would be up to repository maintainers to
encourage a breakdown of the model. The con of course is that we lose
model structure into the external code, and there is no way we can
automatically extract that. It is therefore effectively hidden until
broken down - if that ever makes sense for the model.

"""It is also hoped that this specification will encourage model
developers to build up libraries of CellML accessible external code,
which can be re-used in a range of CellML models, therefore increasing
the range of modelling techniques available to CellML model
authors."""

I would see an open library of external code being very useful. There
would need to be clear grading of that code, for example validating
that the code even compiles (if it needs to) and runs on platforms x, y, z.

"""Best practice guidelines for CellML document authors"""

"""1. External code should be used only where a part of a model cannot
be adequately expressed in CellML. External code is often
non-portable, and using it reduces the re-usability of your model, and
so it should only be used when needed."""

yes

"""2. External code should only perform the calculations that CellML
is unable to perform, with the rest of the calculations expressed as
MathML, in the CellML model. This is important, because it increases
the fraction of your model that can be more easily re-used by other
modellers. It also means that CellML editing and visualisation
software will allow your model to be edited and visualised better."""

Yes and no. I don't think representing it in MathML makes re-use any
easier unless you all share a prescribed subset of MathML and agree on
the acceptable forms of equations when algebraic manipulation is
limited or not possible.

"""3. Modellers should, where feasible, separate external code into as
many different sub-functions as possible. For example, if you have
external code to compute y1 from x1 and x2, and y2 from x1 and x2, you
should write this as two separate external function applications,
unless there is a compelling reason to do otherwise (such as is the
case if it is much more efficient to compute them together). Doing
this makes it easier to modify the CellML model in the future, and
allows the CellML processing software to determine the order in which
expressions are evaluated, making your model more flexible."""

see above ... the compromise will always be the amount of information
you can extract out of the model for other purposes - for example for
model reuse, for simply visualizing and understanding the makeup of
the model, for publication. It could be compelling enough for people
to produce at least one highly broken down model along with the one
fitted for optimization.
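Rule 3's point about letting the processing software choose the evaluation order can be sketched with a small variant of its example. Purely as an assumption for illustration, suppose the external y1 depends only on x1, and the MathML part of the model computes x2 from y1 (the step names `x2_eq`, `ext_y1`, `ext_y2` and `ext_y1_y2` are hypothetical). Each step is described only by its inputs and outputs, and Python's standard `graphlib` orders the steps. When y1 and y2 are fused into one external call, no valid order exists.

```python
from graphlib import TopologicalSorter, CycleError

def evaluation_order(steps):
    """Order steps so every input is produced before it is used.

    steps maps a step name to (inputs, outputs). Inputs not produced by
    any step (here x1) are treated as already known. Raises CycleError
    if no valid order exists.
    """
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    graph = {
        name: {producer[var] for var in inputs if var in producer}
        for name, (inputs, _) in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())

# Hypothetical MathML equation in the model: x2 is computed from y1.
mathml = {"x2_eq": (("y1",), ("x2",))}

# External code split into one function per output.
split = {
    **mathml,
    "ext_y1": (("x1",), ("y1",)),
    "ext_y2": (("x1", "x2"), ("y2",)),
}

# The same external code fused into one call returning both outputs.
combined = {
    **mathml,
    "ext_y1_y2": (("x1", "x2"), ("y1", "y2")),
}

print(evaluation_order(split))
try:
    evaluation_order(combined)
except CycleError:
    print("no valid evaluation order for the fused external function")
```

The fused version fails not because the mathematics is circular, but because the single call hides the fact that only y2 needs x2.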

"""4. External code should, by itself, meet [MIRIAM] requirements 1
and 2. This means that the external code should be encoded in a
public, machine-readable format, and it should be valid and
compilable."""

It should meet all the criteria of MIRIAM compliance as part of being
a model as a whole. The test cases are going to be very important, I
think, in assuring the quality of external code. You might make the
case that the external code should be wrapped in its own model, which
would itself need to be fully MIRIAM compliant. The MIRIAM document is
a bit weak around the edges on things like validation and the
annotation of 'components' of a model. I think we need to be clear
about what validation is necessary for models that reference external
code.
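One way to make "good tests" concrete is to ship reference cases with each piece of external code and check them whenever the code is built on a new platform. This is a minimal sketch, assuming a hypothetical external Hill-function routine and hand-computed reference values; a real suite would cover the model's actual operating range.

```python
import math

def external_hill(x, k, n):
    """Hypothetical external function: Hill activation x**n / (k**n + x**n)."""
    return x**n / (k**n + x**n)

# Hypothetical reference cases shipped with the code: (inputs, expected output).
REFERENCE_CASES = [
    ((1.0, 1.0, 2), 0.5),        # half-activation when x == k
    ((2.0, 1.0, 1), 2.0 / 3.0),
    ((0.0, 1.0, 3), 0.0),
]

def failing_cases(func, cases, rel_tol=1e-12, abs_tol=1e-15):
    """Return (inputs, expected, got) for every case outside tolerance."""
    failures = []
    for args, expected in cases:
        got = func(*args)
        if not math.isclose(got, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append((args, expected, got))
    return failures

# An empty list means every reference case passed.
print(failing_cases(external_hill, REFERENCE_CASES))
```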

I would still like more clarification of how important MIRIAM is to
this, especially as I think the requirements of MIRIAM haven't really
been designed with typical procedural code examples in mind. I
don't think MIRIAM can't cope with it.

"""5. The external code should be treated as part of the model. When a
model represented in CellML is published, the external code should be
published alongside it, unless it is part of a generally available
library of external code."""

The latter part worries me a little. Enter license bewilderment. But see 6.

"""6. The definitionURL used on csymbol elements should be a URL under
the control of the author. It is not necessary for there to actually
be a document accessible at the URL, as it is merely intended as a
unique identifier."""

What happens with multiple authors? Will an author always have a way
to create a URL under their control? I think this problem is related to 5. For
example, if the source code for an external component is submitted to
a repository and becomes licensed according to that, then the URL
should probably be related to that. So I think ultimately the domain
that wants to guarantee that the source is perpetually available
should be the domain that forms the base of the URL.
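The shape of such a reference can be sketched with Python's standard ElementTree. This assumes the external function is invoked as the operator of a content-MathML `apply` whose result is equated to an output variable; the proposal's exact binding of outputs may differ, and the function name `hill`, the variable names and the `example.org` URL are hypothetical. Following the suggestion above, the URL is rooted in a repository domain rather than a personal one.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"

def m(tag):
    """Qualify a tag name with the MathML namespace."""
    return f"{{{MATHML}}}{tag}"

def external_equation(output, definition_url, name, inputs):
    """Build <apply><eq/><ci>output</ci><apply><csymbol definitionURL=...>
    name</csymbol><ci>input</ci>...</apply></apply>."""
    eq_apply = ET.Element(m("apply"))
    ET.SubElement(eq_apply, m("eq"))
    ET.SubElement(eq_apply, m("ci")).text = output
    call = ET.SubElement(eq_apply, m("apply"))
    symbol = ET.SubElement(call, m("csymbol"), definitionURL=definition_url)
    symbol.text = name
    for var in inputs:
        ET.SubElement(call, m("ci")).text = var
    return eq_apply

# Hypothetical identifier under a domain that guarantees the source stays available.
url = "http://repository.example.org/external-code/hill-activation"
eq = external_equation("y1", url, "hill", ["x1", "x2"])
print(ET.tostring(eq, encoding="unicode"))
```

The URL only needs to be unique and stable; nothing in this sketch dereferences it.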

cheers
Matt

On 3/18/07, Andrew Miller <ak.miller at auckland.ac.nz> wrote:
> David Nickerson wrote:
> >> ECMAScript is not practical for use in modelling, because it is an
> >> interpreted, non-typed language, which necessarily means that it cannot
> >> be compiled and will be slower than compiled code.
> >>
> >
> > But CellML is a language for the description and exchange of
> > mathematical models. It is not meant to be a one-off wonder describing
> > the most efficient and best performing method for executing numerical
> > computations.
> >
> > To turn a CellML model description into something useful for computation
> > that description has to be interpreted and compiled into some other
> > format suitable for the environment using it...
> >
> > Surely in the same manner, a standard description of procedural code
> > could then be interpreted by any number of applications in whatever
> > manner they feel best suits their environment?
> >
> No. Because CellML is restricted to expressions, it is much easier to
> work with, and this is what makes it declarative. You can perform a
> variety of manipulations on declarative expressions, but procedural code
> can basically only be run in the way it was written to run (for example,
> even working out whether procedural code will ever terminate, 'The
> Halting Problem', has been proved undecidable, i.e. not Turing-computable,
> in the general case, and this is likely to be the case for other types
> of manipulations too).
>
> Code can often be optimised and compiled, but the features of ECMAScript
> preclude many of the optimisations that a C compiler, for example, can make.
>
> For example, objects can have arbitrary properties, and there is no way
> to tell at compile-time what set of properties an object will have, or
> whether a property is a simple property or a getter. While a C compiler
> might take a value from an offset into a structure, ECMAScript code
> would end up searching a dictionary of properties on an object.
> Therefore, ECMAScript is not a good language if you want to be able to
> interpret it in different ways (and for any Turing-complete language,
> the ways in which you can interpret it are severely limited).
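The contrast between a fixed structure offset and a property dictionary can be seen in Python, used here only as a stand-in for a dynamically typed language: `ctypes.Structure` gives a C-style layout whose field offsets are fixed when the class is defined, while an ordinary object stores its attributes in a dictionary that can grow at run time. The class and field names are hypothetical, and this illustrates the layout distinction only; it is not a performance measurement of any ECMAScript engine.

```python
import ctypes

class GateState(ctypes.Structure):
    # Fixed, C-compatible layout: fields and their offsets are known up front.
    _fields_ = [("m", ctypes.c_double), ("h", ctypes.c_double)]

class DynamicGateState:
    # Ordinary Python object: attributes are entries in a per-instance dictionary.
    pass

# A compiler can read 'h' from a constant offset into the structure.
print(GateState.h.offset, ctypes.sizeof(ctypes.c_double))

dyn = DynamicGateState()
dyn.m = 0.05
dyn.h = 0.6
setattr(dyn, "n", 0.3)  # new properties can appear at run time
print(vars(dyn))        # attribute lookup goes through this dictionary
```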
>
> Remember also CellML models can be used to solve a range of different
> problem types (fitting, ODE time course, and so on), but one procedural
> code implementation might not be useful for all of them.
>
> My BCP document is intended as a way to maintain as much of the model as
> possible in CellML, but simply leave the rest of the model unspecified.
> Given the amount of history and development of procedural languages, I
> don't think we can hope to 'standardise' anything more in a widely
> acceptable way when it comes to procedural languages.
> >
> >> External code needs to be extensible, and hence outside the scope of the
> >> CellML specifications, for several reasons:
> >> 1) Performance. Code may need to be written in a way which is specific
> >> to a particular platform in order to be able to perform well.
> >>
> >
> > Same response as above.
> >
> Sometimes, human intervention is going to be required to save a model
> from infeasible performance issues. If we take an ideological
> approach and try to block this from happening, it will just result in
> CellML not being used at all. Instead, it is better to encourage people
> to use CellML features whenever possible, but allow external code when
> it is not possible.
> >
> >> 2) Access to existing libraries. There are often extensive libraries and
> >> other software packages into which a model needs to be integrated. This
> >> could be in practically any language, and so it would be necessary to
> >> access the data structures of these libraries to have the model work. I
> >> believe that this is the case for much of the CMISS-CellML work (I don't
> >> really think that a proposal to re-write CMISS in ECMAScript would be
> >> very popular!).
> >>
> >
> > In every case of people using CMISS that I know of, the use of CellML is
> > to define model specific mathematical equations for integration into a
> > larger model.
> In other words, the model consists of parts which can be expressed as
> mathematical equations, and parts that cannot be expressed in
> mathematical equations (in CMISS). You are proposing that the parts
> which cannot be expressed in mathematical equations be written in
> ECMAScript.
> > I'm not suggesting re-writing CMISS in ECMAScript - rather
> > you seem to be suggesting including CMISS in a CellML model?!?
> >
> The question of which model is included in which is more an artificial
> distinction than anything more meaningful. However, there needs to be a
> mechanism for data flow from CMISS into the CellML models (otherwise,
> CMISS can only set initial conditions, it can't have any time dependent
> influence on the model).
> > This would hold for most such cases of using existing libraries that I
> > can think of, with the exception of someone wanting to solve a
> > particular equation or set of equations in a model using a very specific
> > numerical method that their CellML simulation tool does not support.
> >
> There are many other computations that are better done by procedural
> code than by systems of ODEs. Machine learning algorithm lookups are one
> example of this, and there are extensive libraries of these sorts of
> things available.
> > Even if you take a step back and look at the larger picture of using
> > things like FieldML, CellML, MathModelML (or something), etc... to
> > describe something like an electrical propagation model in the heart,
> > the tool (eg, CMISS) pulls it all together and plugs fields and
> > variables together based on the model annotations. Otherwise you'll end
> > up with cell models that say things like "give me the current load at
> > this point in space by solving the bidomain model over this geometric
> > domain" - making the cell model description useless for any other
> > application. What you want instead is simply a variable in the model,
> > representing the current load, with an interface of in. Your cell model
> > integrator doesn't care where this value comes from, it just knows that
> > when the tool calls for the cell model to be integrated that it will
> > provide some appropriate value.
> >
> I firstly note that if you are talking about using component-level
> interfaces for this, that is not a feasible approach. I include an
> e-mail I sent to Shane and Poul about this earlier below:
>
> "
> Shane has proposed that as an alternative to using content MathML to
> reference external code, we could use components. However, this appears
> to be inconsistent with the way CellML works at the moment, so I don't
> think that it could form the basis for defining external functions.
>
> The problem with the approach of defining external components is that
> the directionality of variable interfaces in CellML is too weak to
> define the actual directionality and order in which mathematics is
> evaluated.
>
> This is a good thing, for two reasons:
> 1) CellML is inherently declarative, not procedural. This means that if
> you give an equation defining x in terms of a, b, and c, but due to the
> other components in the model, x, a, and b are known, and it becomes
> necessary to obtain c, it is perfectly valid for the CellML software to
> perform a Newton-Raphson solve (or algebraic manipulations, if it has
> the capability) to obtain c. However, if the directionality on
> components was strong, CellML processing software would be constrained
> to compute components in a certain way, which would in turn limit the
> flexibility of each component.
>
> 2) It is possible to have more than one mathematical equation in a
> single component, and in some cases these might be completely
> independent. For example, you might have, in one component:
>
> w = x + a
> y = z + b
>
> and in another:
>
> z = w + c
>
> With x the bound variable of integration, and a, b, and c being constant.
>
> This might make sense, because components are generally used to
> represent entities in biology, rather than the actual directionality of
> mathematical equations. However, it means that you evaluate part of the
> first component, then part of the second, and then go back to part of
> the first component. This is something you couldn't do if each component
> was an external block.
>
> Given that we don't have a one equation per component system, it is also
> possible that you want to combine mathematics in MathML with the
> external code (perhaps to re-parameterise the function, or something
> like that).
>
> Because of this, I am still convinced that defining external operators
> using MathML is a better approach than trying to overload the component
> system in CellML for a use other than what it was originally intended.
> "
>
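Point 1 of the quoted message can be made concrete with a short sketch. Assume, purely for illustration, that the equation defining x is x = a*c + b*c**3 (both this equation and the function name are hypothetical). The equation is written once, in the "forward" direction, yet when x, a and b are known, a processor can still recover c with a Newton-Raphson iteration.

```python
def solve_for_c(x, a, b, c0=1.0, tol=1e-12, max_iter=50):
    """Recover c from x = a*c + b*c**3 by Newton-Raphson, given x, a, b."""
    c = c0
    for _ in range(max_iter):
        residual = a * c + b * c**3 - x       # f(c), zero at the solution
        slope = a + 3.0 * b * c**2            # f'(c)
        if slope == 0.0:
            raise ZeroDivisionError("zero derivative; try another starting guess")
        step = residual / slope
        c -= step
        if abs(step) < tol:
            return c
    raise RuntimeError("Newton-Raphson did not converge")

c = solve_for_c(x=2.0, a=1.0, b=1.0, c0=0.5)
print(c)
```

A real processor would also have to choose starting guesses and deal with multiple roots; this sketch does neither in general.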
> Secondly, the "cell model" integrator does need to care where the values
> come from, because it is responsible for moving from one time point to
> the next, and to do this, it needs to know what values from the current
> time point are needed to compute which other values at the current time
> point. This is why I have defined an interface which, in a very natural
> MathML way, describes the inputs and outputs of the external code,
> which is essentially equivalent to what you are talking about above,
> except the inputs to the external code must be provided as well.
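The w, y, z example in the forwarded message, together with the point that the integrator must know which current values feed which others, can be sketched as a simple scheduler that repeatedly evaluates any equation whose inputs are already known. The dictionary layout and the numeric values of x, a, b and c are assumptions for illustration. The trace shows evaluation moving from the first component to the second and back again, which a one-block-per-component scheme could not do.

```python
def evaluate_in_dependency_order(components, known):
    """Evaluate each equation once all of its inputs are known.

    components maps a component name to {output: (inputs, function)}.
    Returns the final values and the (component, variable) evaluation trace.
    """
    values = dict(known)
    pending = [
        (comp, var, inputs, fn)
        for comp, equations in components.items()
        for var, (inputs, fn) in equations.items()
    ]
    trace = []
    while pending:
        for item in pending:
            comp, var, inputs, fn = item
            if all(name in values for name in inputs):
                values[var] = fn(*(values[name] for name in inputs))
                trace.append((comp, var))
                pending.remove(item)
                break
        else:
            raise ValueError("no equation is ready: model is cyclic or underdetermined")
    return values, trace

components = {
    "component_1": {
        "w": (("x", "a"), lambda x, a: x + a),
        "y": (("z", "b"), lambda z, b: z + b),
    },
    "component_2": {
        "z": (("w", "c"), lambda w, c: w + c),
    },
}

values, trace = evaluate_in_dependency_order(
    components, {"x": 0.0, "a": 1.0, "b": 2.0, "c": 3.0})
print(trace)
```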
>
> >
> >> 3) Access to specialised hardware. A model could potentially even
> >> require that a function is evaluated by some sort of online experimental
> >> procedure (perhaps automated probing of a hardware model) for a given
> >> set of inputs.
> >>
> >
> > Again, this seems more like a case where you define a mathematical model
> > which given some input(s) produces some output(s). The controlling
> > software would take the mathematical model definition in CellML and
> > connect the appropriate inputs and outputs.
> This is exactly why we need a way to describe inputs and outputs, which
> is what I describe in the proposal.
> > I would really need a
> > concrete example of why you would want to describe a mathematical model
> > in CellML which requires input from specialised hardware. Surely you
> > just define a variable that has an interface of in and annotate it such
> > that the controlling software can find it and plug in the appropriate
> > value(s)?
> >
> >
> >> 4) Multiple standards, with different communities who favour them. It
> >> would not be practical to get everyone involved with CellML to agree on
> >> a certain procedural programming language (even deciding on Fortran vs
> >> C++ etc... has been a challenge at this institute, and will probably be
> >> impossible for the wider CellML community).
> >>
> >
> > As above, you are not performing computations using CellML directly -
> > you always turn the model description into something suitable for the
> > computational environment in which the model is being used. That's the
> > beauty of CellML - you can turn it into Fortran or C++, depending on
> > your personal preference!
> >
> For CellML, it is irrelevant what language it is translated through,
> because it can't call external code anyway. But if we call external
> code, that external code can further call other external code. Also,
> CellML filled a new niche, while you seem to propose that we tell
> everyone which language to use, which is a contentious issue. Also note
> that you cannot turn ECMAScript into efficient C++ in general.
> > CellML is all about being able to exchange a standard description of a
> > mathematical model between potentially very different software
> > environments. The whole idea is specifically not specifying the best way
> > to compute outputs from the model - which seems to be what you are
> > driving at....and the best way to compute outputs from a model is always
> > going to be dependent on the target computational environment.
> >
> Which is why we keep the things that CellML can do well in CellML, while
> continuing not to specify how to do the things that CellML can't do well. That
> is why my proposal only provides details of the interface to the
> external code, and doesn't try to specify the external code itself.
> >
> >> As an example, consider my PhD project, where I plan to put machine
> >> learning components into CellML models:
> >> 1) Performance is likely to be important. If it is too slow, it might
> >> not be feasible to do at all.
> >> 2) I plan to use existing libraries, in a range of different languages.
> >> 3) I also have another (perhaps not as common) gain from specifying the
> >> external functions without describing their details: I need to run
> >> different code in 'training' and 'simulation' modes, and if I just wrote
> >> generic ECMAScript for the simulation case, there would be no simple way
> >> to deduce the training case. Because of this, it is probably good to
> >> keep the non-algebraic parts of the model completely separate, and leave
> >> it up to whoever implements the specific CellML processor.
> >>
> >
> > I'd probably need to see more detailed plans on exactly what you are
> > planning on doing before commenting on this. But from what I have seen,
> > whenever anyone has wanted to include procedural code directly in a
> > CellML model it has always turned out that they are approaching the
> > problem from the wrong direction.
> >
> > Just to re-iterate, CellML is all about exchanging *descriptions* of
> > mathematical models - not implementations of computational code.
> >
> Which argues for specifying how to interface external procedural code,
> as in my original proposal, rather than specifying how to exchange the
> procedural code, as you have suggested.
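The "training" versus "simulation" distinction in the PhD example above can be sketched as external code exposing two operations, of which only the simulation-mode call would appear in the model's equations. The class, its methods, and the least-squares line standing in for a real machine-learning method are all hypothetical; the point is that `train` cannot be deduced from `evaluate` alone.

```python
class ExternalLearner:
    """Hypothetical external code with separate training and simulation modes."""

    def __init__(self):
        self.slope = 0.0
        self.intercept = 0.0

    def train(self, xs, ys):
        """Training mode: ordinary least-squares fit of y = slope * x + intercept."""
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x

    def evaluate(self, x):
        """Simulation mode: the only call the model's equations would make."""
        return self.slope * x + self.intercept

learner = ExternalLearner()
learner.train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(learner.evaluate(4.0))
```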
> >
> >> That said, I think we could have multiple levels of degeneracy away from
> >> standardised code, where you only go down to the next item if the
> >> current one is impossible:
> >> 1) Pure CellML.
> >>
> >
> > definitely.
> >
> >
> >> 2) CellML with standardised Turing-complete code support.
> >>
> >
> > I can see why we should provide a mechanism for this, but have yet to
> > see an example where it would be useful (other than to get around a
> > particular tool's deficiencies).
> >
> >
> >> 3) CellML with external (non-standardised) code.
> >>
> >
> > I still haven't seen a reason why this would ever be required?
> >
> >
> > David.
> >
> >
>
> Best regards,
> Andrew
>
> _______________________________________________
> cellml-discussion mailing list
> cellml-discussion at cellml.org
> http://www.cellml.org/mailman/listinfo/cellml-discussion
>



