[cellml-discussion] Concerning the CellML Model Repository
Andrew Miller
ak.miller at auckland.ac.nz
Fri Jun 22 10:50:27 NZST 2007
Tommy Yu wrote:
> Hi,
>
> I have written down some of my thoughts on how the model repository could be put together.
>
> http://www.cellml.org/Members/tommy/repository_redesign.html
>
> It is still a pretty rough document. The usage example section gives a rough outline of what I see people doing with the repository and how this design could address those issues, which I think will be of interest to users. It is not an exhaustive list yet.
>
> I must also note that the design outlined is quite a drastic departure from what we have now (it will be yet another new repository). However, it is more true to the one envisioned before according to http://www.cellml.org/wiki/CellMLModelRepositories, except that I have an additional layer that will assist in pulling content and drawing relationships between models.
>
> Feel free to take it apart and/or build on top of it.
>
Hi Tommy,
A few comments:
1) I am still not convinced that meta-data should not be versioned,
simply because changes to metadata can be important changes to a model.
In some cases, such as changes to simulation metadata, the changes might
have a major impact on the final model.
I don't think it is a bad thing to have a one-way cache of metadata
somewhere for technical / performance reasons (perhaps in a relational
database), but I think that we should replicate data for each model
(perhaps using a deep copy-on-write approach if this is really necessary
to save disk space) rather than changing the metadata for existing
models without changing the version.
Making any change to metadata require a new version of the model will ensure that
no one gets burned by referencing a particular version of a model, only
to find that the metadata in that version has changed on them.
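
As a minimal sketch of the versioning idea above (not the repository's actual design), the following Python treats metadata as part of an immutable version, so that editing metadata always produces a new version number. The names ModelVersion, VersionedModel, update_metadata and the simulation_end key are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, List, Mapping


@dataclass(frozen=True)
class ModelVersion:
    """One immutable snapshot: CellML source plus the metadata that goes with it."""
    number: int
    cellml: str
    metadata: Mapping[str, str]


class VersionedModel:
    """Hypothetical store in which any metadata edit produces a new version."""

    def __init__(self, cellml: str, metadata: Dict[str, str]) -> None:
        self._versions: List[ModelVersion] = []
        self._commit(cellml, metadata)

    def _commit(self, cellml: str, metadata: Mapping[str, str]) -> ModelVersion:
        # Copy the metadata and wrap it read-only so that a stored version
        # cannot be altered afterwards.
        frozen = MappingProxyType(dict(metadata))
        version = ModelVersion(len(self._versions) + 1, cellml, frozen)
        self._versions.append(version)
        return version

    def get(self, number: int) -> ModelVersion:
        if not 1 <= number <= len(self._versions):
            raise LookupError(f"no version {number}")
        return self._versions[number - 1]

    def latest(self) -> ModelVersion:
        return self._versions[-1]

    def update_metadata(self, **changes: str) -> ModelVersion:
        current = self.latest()
        # The unchanged CellML text is shared with the previous version
        # rather than duplicated; only the metadata mapping is new.
        return self._commit(current.cellml, {**current.metadata, **changes})


model = VersionedModel("<model name='example'/>", {"simulation_end": "10"})
model.update_metadata(simulation_end="100")
print(model.get(1).metadata["simulation_end"])  # 10
print(model.get(2).metadata["simulation_end"])  # 100
```

Anyone who referenced version 1 still sees the metadata that version 1 was published with, and the model text is shared between the two versions rather than copied.
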
Your current unversioned, globally shared metadata approach probably
also has security implications. For example, let's say that Alice submits
a model which references a publication. Now suppose that Charlie wasn't
an author of that paper, but he wants to add his name onto the list of
authors. So he submits a completely different, bogus, model which
includes metadata for the publication, and includes his name. When Bob
downloads Alice's model from the repository, it would then include
Charlie's name as one of the authors (assuming that the publication was
referenced by PubMed ID, DOI, or some other sort of publication URI).
Particular cases like the one I described might be secured in an ad hoc
fashion, such as by checking that the authors are the same, but the
general attack will remain possible with this type of approach unless
metadata is associated uniquely with a particular version of a
particular model. If the assertions about the same subject cannot be
identified between models in the database, then having data flow back
from the relational database into the model does not carry any benefit
at all.
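
The scenario above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the shared store keeps one author list per publication URI and that a later submission overwrites an earlier one; the identifier, model names, and function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Placeholder identifier, not a real PubMed ID or DOI.
PUBLICATION = "urn:example:publication:1"

# Design A: one globally shared record per publication URI.
shared: Dict[str, List[str]] = {}


def submit_shared(publication: str, authors: List[str]) -> None:
    # Assumption: a later submission replaces the earlier record.
    shared[publication] = list(authors)


# Design B: assertions are stored per (model name, version number).
scoped: Dict[Tuple[str, int], Dict[str, List[str]]] = {}


def submit_scoped(model: str, version: int, publication: str,
                  authors: List[str]) -> None:
    scoped.setdefault((model, version), {})[publication] = list(authors)


# Alice submits her model; Charlie then submits a bogus model that cites
# the same publication with his own name added.
submit_shared(PUBLICATION, ["Alice"])
submit_shared(PUBLICATION, ["Alice", "Charlie"])

submit_scoped("alice_model", 1, PUBLICATION, ["Alice"])
submit_scoped("bogus_model", 1, PUBLICATION, ["Alice", "Charlie"])

# What Bob sees when he downloads Alice's model under each design.
print(shared[PUBLICATION])                      # ['Alice', 'Charlie']
print(scoped[("alice_model", 1)][PUBLICATION])  # ['Alice']
```

Under design B, Charlie's assertion still exists, but only within his own model's version, so it never reaches Alice's model.
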
However, I do agree that there is a place for some metadata which can be
changed without creating a new version (which probably is the type of
metadata that you wouldn't include in the CellML file by default).
Curation status and permissions would probably fit in this category,
because although they may be associated with a particular version, they
should not be immutable for a given version.
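
One way to picture this split, as a rough sketch only, is a write-once table for version content alongside an editable table for repository-side annotations. The Annotations fields and the Repository methods below are hypothetical names, not part of any existing design.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

VersionKey = Tuple[str, int]  # (model name, version number)


@dataclass
class Annotations:
    """Mutable, repository-side facts about one version (hypothetical fields)."""
    curation_status: str = "uncurated"
    readers: Set[str] = field(default_factory=set)


class Repository:
    def __init__(self) -> None:
        self._content: Dict[VersionKey, str] = {}               # write-once
        self._annotations: Dict[VersionKey, Annotations] = {}   # editable

    def add_version(self, key: VersionKey, cellml: str) -> None:
        if key in self._content:
            raise ValueError(f"{key} already exists and cannot be overwritten")
        self._content[key] = cellml
        self._annotations[key] = Annotations()

    def annotations(self, key: VersionKey) -> Annotations:
        return self._annotations[key]

    def version_count(self) -> int:
        return len(self._content)


repo = Repository()
repo.add_version(("alice_model", 1), "<model name='alice_model'/>")

# Changing curation status or permissions does not create a new version.
repo.annotations(("alice_model", 1)).curation_status = "curated"
repo.annotations(("alice_model", 1)).readers.add("bob")
print(repo.version_count())  # 1
```
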
2) I think that there should be a directory for each mathematical model
(which may include several CellML model files, documentation, and so
on), so that a particular version can be downloaded / checked out in its
entirety (with some directory-level manifest describing how to run or
view the model). This suggests that collisions between mathematical
models should be prevented at this level, not at the file level. Under
this scheme, Mary would find in usage example 3 that she couldn't use
the same directory name as the one John already submitted.
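
A minimal sketch of directory-level collision checking might look like the following. The manifest fields (owner, entry_point, files), the directory name, and the class names are assumptions made up for this example.

```python
from typing import Dict, List


class DirectoryTakenError(Exception):
    """Raised when a model directory name is already in use."""


class ModelRepository:
    def __init__(self) -> None:
        # One manifest per model directory; the fields are hypothetical.
        self._manifests: Dict[str, Dict[str, object]] = {}

    def create_model(self, directory: str, owner: str,
                     files: List[str]) -> Dict[str, object]:
        if not files:
            raise ValueError("a model directory needs at least one file")
        # Collisions are checked on the directory name, not on file names.
        if directory in self._manifests:
            taken_by = self._manifests[directory]["owner"]
            raise DirectoryTakenError(
                f"'{directory}' is already used by {taken_by}")
        manifest = {
            "owner": owner,
            "entry_point": files[0],  # the file a viewer or simulator opens first
            "files": list(files),
        }
        self._manifests[directory] = manifest
        return manifest


repo = ModelRepository()
repo.create_model("example_model", "John",
                  ["main.cellml", "units.cellml", "README.txt"])
try:
    repo.create_model("example_model", "Mary", ["main.cellml"])
except DirectoryTakenError as error:
    print(error)  # 'example_model' is already used by John
```

Because the check is on the directory, Mary remains free to reuse file names such as main.cellml inside a directory of her own.
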
3) I think the 'reference by citation' idea needs some expansion: people
referencing models should have the choice to refer to:
=> a specific version for which no files will change at all.
=> the latest version which aims to reflect the letter of a publication
(updates will only fix mistakes in the model which prevent it from
corresponding to the printed paper).
=> the latest version which aims to reflect the results obtained by the
author (updates can fix discrepancies or omissions from the paper that
were in the author's original code, if the author didn't use CellML).
=> the latest derivative of the current model developed by the same
author / group, even if it has not yet been peer-reviewed (subject to
permissions constraints).
=> the latest derivative of the current model, but with all imports
external to the model updated to the latest versions (even if this has
not been reviewed by the author). This would be the most frequently
updated version, because it could be automatically created without the
model author being involved.
It would also be possible to search for derivatives made by other authors.
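
The reference choices listed above could be sketched as a resolver like the one below. It makes the simplifying assumption that every stored version is tagged with exactly one update track, and it leaves out permission checks; the Track names and the resolve function are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Track(Enum):
    """Hypothetical tags for the kinds of update described in the list."""
    PAPER = "matches the letter of the publication"
    AUTHOR_RESULTS = "matches the author's original results"
    GROUP_DERIVATIVE = "derivative by the same author or group"
    IMPORTS_UPDATED = "external imports updated automatically"


@dataclass(frozen=True)
class Version:
    number: int
    track: Track


def resolve(versions: List[Version], track: Optional[Track] = None,
            pinned: Optional[int] = None) -> Version:
    """Return the pinned version, or the newest version on the given track."""
    if pinned is not None:
        # A reference to a specific version never moves.
        for version in versions:
            if version.number == pinned:
                return version
        raise LookupError(f"no version {pinned}")
    candidates = [v for v in versions if v.track is track]
    if not candidates:
        raise LookupError(f"no version on track {track}")
    return max(candidates, key=lambda v: v.number)


history = [
    Version(1, Track.PAPER),
    Version(2, Track.PAPER),            # fixes a typo relative to the paper
    Version(3, Track.AUTHOR_RESULTS),   # restores a term omitted in print
    Version(4, Track.GROUP_DERIVATIVE),
    Version(5, Track.IMPORTS_UPDATED),
]

print(resolve(history, pinned=1).number)                    # 1
print(resolve(history, track=Track.PAPER).number)           # 2
print(resolve(history, track=Track.IMPORTS_UPDATED).number) # 5
```
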
4) I'm not sure that the keyword-based URIs are strictly necessary.
Perhaps search functionality which links to models is enough for this
(which avoids a whole set of URI stability issues)?
Best regards,
Andrew
> Cheers,
> Tommy.