CellML Discussion List


[cellml-discussion] CellML API: Liveness results in poor performance


  • From: ak.miller at auckland.ac.nz (Andrew Miller)
  • Subject: [cellml-discussion] CellML API: Liveness results in poor performance
  • Date: Wed, 31 May 2006 13:31:17 +1200

Hi all,

All attributes/operations on the current CellML API are live (meaning
that, should the model be changed, they will return updated values).
Liveness is achieved, for the most part, by computing things when they
are requested, and never caching anything (although iterators also
require DOM events to work).

This approach works well for uses such as the CellML editor, where the
model is constantly changing and every consumer should see the latest
data. However, in other contexts, such as the CCGS, it creates
significant performance problems, because the same computations have to
be performed again and again. Because the more complex parts of the
CellML API are built on the simpler parts, some of this unnecessary
recomputation happens even within a single CellML API call.

In terms of the CCGS, the biggest problem is the sourceVariable
attribute, which returns the variable with no "in" interfaces that is
connected to this variable. To do this, it has to check all connections,
decide whether each connection connects the component it is working on
to another component (which can require checking imports to support
CellML 1.1), and then check for variable matches. It has to repeat this
in a DFS pattern so that indirectly connected variables are considered.
In a model from the repository with 82 variables, this took about
3,838,000 x86 operations per call (and the algorithm it uses is
reasonable given the liveness requirement). In order to generate code,
the attribute had to be read 1228 times, taking 92% of the total
runtime. While this may not be that bad if you have a GHz range CPU and
are only processing a few models, it makes testing in Valgrind hard, and
bigger models will increase the cost in a superlinear fashion.
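
To put the two figures above together, here is a quick
back-of-the-envelope calculation in Python. It uses only the numbers
quoted in this message; the constant names are mine.

```python
# Figures quoted above for the 82-variable repository model.
OPS_PER_CALL = 3_838_000   # approximate x86 operations per sourceVariable read
CALLS = 1228               # reads needed to generate code for the model

# Total work spent inside sourceVariable during one code generation run.
total_ops = OPS_PER_CALL * CALLS
print(f"{total_ops:,} operations spent in sourceVariable")
```

That is roughly 4.7 billion operations for a single, fairly small model.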

There are several possible solutions:
1) Don't use the CellML API for expensive operations, and instead use a
more efficient, but non-live, algorithm. In the case of source variable
determination, we could easily do it in O(|V| |C|) time, where V is the
set of variables and C the set of connections, by performing a DFS
starting from each source variable, and even faster if we annotated
components with what is connected to them.

Advantages:
a) Best Performance.
b) Not a huge amount of effort required, as optimisation only needs to
be done where problems are evident.
Disadvantages:
a) Doesn't encourage code reuse.
b) Doesn't follow our CellML API, which we are trying to encourage
others to use.

2) Use the CellML API, but cache the results. This is still suboptimal,
as work is repeated inside the CellML API calls, but it is a significant
improvement, e.g. 3,838,000 * 82 operations instead of 3,838,000 * 1228
(one expensive call per variable rather than one per request).

Advantages:
a) Significant, but not ideal, performance improvements.
b) The easiest solution.
c) Still uses the CellML API, but builds a layer on top of it.
Disadvantages:
a) Still not very efficient.

3) Provide a separate API for performing operations like this. The
separate API would be initialised once, and changes to the model would
not be reflected into the results from the new API.
Advantages:
a) Can use good algorithms, and get good performance.
b) General solution.
Disadvantages:
a) Wastes memory storing things which are never needed (or which are
only needed once, depending on implementation).
b) Lots of work required to do it generally.

4) Modify the current API to cache and invalidate using DOM events.
Advantages:
a) Can improve performance significantly (better than option 2, as the
CellML API calls itself internally, so those calls would also benefit).
Disadvantages:
a) Wastes memory caching things which are only needed once.
b) Writing would be very slow, as DOM events are quite complex to
dispatch, so setting them everywhere would be very expensive. Therefore,
this would solve the above problems but likely introduce different
performance problems.
c) Lots of work to implement well.
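
To make option 1 more concrete, the non-live pass could look roughly
like the Python sketch below. This is not CellML API code: the function
name, the plain dict of variables (name -> whether it has an "in"
interface) and the list of connection pairs are all assumptions made for
illustration, and imports and encapsulation are ignored entirely.

```python
from collections import defaultdict

def find_sources(variables, connections):
    """Map every variable name to the name of its source variable.

    variables:   dict of name -> True if the variable has an "in" interface
                 (hypothetical representation, not the CellML API).
    connections: iterable of (name_a, name_b) pairs, treated as undirected.
    """
    # Build an adjacency list once, instead of rescanning all connections
    # on every lookup as a live implementation has to.
    neighbours = defaultdict(list)
    for a, b in connections:
        neighbours[a].append(b)
        neighbours[b].append(a)

    source_of = {}
    for name, has_in_interface in variables.items():
        if has_in_interface:
            continue  # not a source variable, so not a DFS starting point
        # Iterative DFS from this source, labelling everything reachable.
        stack = [name]
        while stack:
            current = stack.pop()
            if current in source_of:
                continue  # already labelled (first source found wins)
            source_of[current] = name
            stack.extend(neighbours[current])
    return source_of

# Small made-up model: a.x is the source for b.x and c.x; d.y stands alone.
example = find_sources(
    {"a.x": False, "b.x": True, "c.x": True, "d.y": False},
    [("a.x", "b.x"), ("b.x", "c.x")],
)
print(example)
```

Because already-labelled variables are skipped, this particular sketch
stays within the O(|V| |C|) bound mentioned under option 1. The result
is a snapshot: it is not updated if the model changes afterwards.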

I personally would prefer option 2, or option 1 in cases where option 2
isn't efficient enough.
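
As a rough illustration of option 2, a caching layer on top of the API
might look like the Python sketch below. The class name and the `lookup`
callable are hypothetical; `lookup` stands in for whatever code reads
the sourceVariable attribute through the CellML API.

```python
class SourceVariableCache:
    """Memoise an expensive per-variable lookup (non-live by design)."""

    def __init__(self, lookup):
        self._lookup = lookup   # stand-in for the expensive API read
        self._cache = {}
        self.misses = 0         # number of times the expensive call ran

    def source_of(self, variable):
        # Only fall through to the expensive lookup on the first request.
        if variable not in self._cache:
            self.misses += 1
            self._cache[variable] = self._lookup(variable)
        return self._cache[variable]

    def clear(self):
        # Must be called by hand if the model changes; nothing here
        # listens for DOM events.
        self._cache.clear()

# Simulate the workload described above: 1228 requests over 82 variables.
names = [f"v{i}" for i in range(82)]
cache = SourceVariableCache(lambda name: "source_of_" + name)
for request in range(1228):
    cache.source_of(names[request % 82])
print(cache.misses)
```

With this request pattern the expensive lookup runs once per variable
rather than once per request, which is where the 82-versus-1228
difference comes from. The cost of each individual lookup is unchanged.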

Opinions?

Best regards,
Andrew Miller



