[cellml-discussion] Concerning the CellML Model Repository
Tommy Yu
tommy.yu at auckland.ac.nz
Tue Jun 26 16:31:02 NZST 2007
Hi,
I thought Andrew's ideas here are worth expanding, so I wrote a page based on them.
http://www.cellml.org/Members/tommy/BaseRepository
Cheers,
Tommy.
Andrew Miller wrote:
> Matt wrote:
>>> - Version/Variant
>>> It has already clogged up the system. There is no proper revision control mechanism; what we have now is an ad-hoc emulated system.
>>>
>> I don't think it has clogged the system; I just think it has been
>> improperly used both by authors and by the user interface. This is no
>> fault of the authors; a specification for versioning is simply
>> missing. The hope is that subversion applies well to this.
>>
> I think that the versioning system itself is the root of the problem,
> because it is simultaneously too complicated and too limited.
>
> In particular:
> Branching is inherently a hierarchical process with arbitrary depth, in
> the sense that branches can be made from branches to an arbitrary depth.
> However, the variant / version system does not really provide the proper
> tools to deal with this, because it is limited to two levels (variant
> and version) before its utility in tracking what is a derivative of what
> is exhausted.
>
> It is also inadequate because a new model might combine parts of other
> models, especially if it is a 1.1 model, and these parts need to be
> tracked individually.
>
> I think that the solution is to simplify down to a single global version
> number that is common across the repository or the model (like in
> Subversion), and then let either the CellML metadata, or perhaps the
> Subversion copy history, describe the way a model has been derived.
>
> I see the following workflow as being both simpler and more general...
>
> John Doe creates a new model directory which has its primary URL at:
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
>
> John now owns this model and is the only one who can change it. John
> also gets to decide the visibility of different revisions of the model.
>
> John makes several revisions to the model (each of which bumps the
> global revision number). There is a URL by which each historic version
> can be referred to.
>
> John then publishes the model in a journal, referring to it by the
> primary URL (or perhaps a short-form if we want to offer authors the
> option of assigning one). After the paper is accepted by a peer-reviewed
> journal, John updates the metadata on the model. When he commits these
> changes, the repository sees this and creates a new alias, e.g. at:
> http://www.cellml.org/models/citation/doe_2007_1/
>
> John makes some further changes to his model post-publication and
> commits them. However, by some mechanism (perhaps by the change
> metadata?) the repository knows that this is a change which occurred
> post-publication by John.
>
> Mary notices that there was a discrepancy between the model and John's
> published paper (assuming that he didn't reference the CellML model in
> the paper). She creates a new primary URL containing a copy of John's
> published model, at:
> http://www.cellml.org/models/id/281ab697-4607-4fcf-a433-f3ec382fb445/
> She gets John to check this. When John agrees, she updates the metadata
> on her model to indicate that her version is a more correct version of
> John's paper. The repository then updates so that
> http://www.cellml.org/models/citation/doe_2007_1/ is a reference to
> John's fixed version.
>
> John merges in Mary's changes to
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/
> and continues working on more changes. He starts collaborating with
> Mary, so he grants her write access to
> http://www.cellml.org/models/id/0ff280ef-dce6-4a42-a275-c9a7d9699096/.
>
> Ming wants to create a derivative of John's paper, so he creates a copy
> of the revision referenced from
> http://www.cellml.org/models/citation/doe_2007_1/ at
> http://www.cellml.org/models/id/7a8996e1-8d05-4a29-a7d8-622d047804fc/
> and starts working on it (marking up the history in the model metadata).
>
> As you can see, instead of having a confusing mix of variants and
> versions (with versions of variants of versions of variants), having a
> single revision forces us to look at the metadata instead, which then is
> sufficiently general not to have the problems we have seen.
>
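As a rough illustration of the workflow above, here is a minimal Python sketch of a repository with a single global revision counter, per-model write access, and citation aliases driven by commit metadata. It is only a toy under stated assumptions: the class and method names, the in-memory storage, and the metadata keys "citation", "corrects" and "derived_from" are all hypothetical and are not taken from any existing CellML repository code. The step where John checks Mary's fix before the alias moves is left out.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Revision:
    number: int      # global revision number, shared by every model
    model_id: str    # the id part of the model's primary URL
    content: str
    metadata: dict   # hypothetical keys: citation, corrects, derived_from


class Repository:
    """Toy in-memory repository; all names here are illustrative only."""

    def __init__(self):
        self.global_revision = 0
        self.revisions = []   # every revision, in commit order
        self.owners = {}      # model id -> owning user
        self.writers = {}     # model id -> users with write access
        self.aliases = {}     # citation alias -> (model id, revision number)

    def create_model(self, owner, content, derived_from=None):
        model_id = str(uuid.uuid4())
        self.owners[model_id] = owner
        self.writers[model_id] = {owner}
        metadata = {"derived_from": derived_from} if derived_from else {}
        self.commit(owner, model_id, content, metadata)
        return model_id

    def grant_write(self, owner, model_id, user):
        if self.owners[model_id] != owner:
            raise PermissionError(f"{owner} does not own {model_id}")
        self.writers[model_id].add(user)

    def commit(self, user, model_id, content, metadata=None):
        if user not in self.writers[model_id]:
            raise PermissionError(f"{user} may not write to {model_id}")
        self.global_revision += 1   # every commit bumps the one global number
        rev = Revision(self.global_revision, model_id, content,
                       dict(metadata or {}))
        self.revisions.append(rev)
        # "citation" creates an alias once; "corrects" repoints an alias.
        for key in ("citation", "corrects"):
            alias = rev.metadata.get(key)
            if alias and (key == "corrects" or alias not in self.aliases):
                self.aliases[alias] = (model_id, rev.number)
        return rev.number

    def resolve(self, alias):
        return self.aliases[alias]


repo = Repository()
john = repo.create_model("john", "draft")                        # revision 1
repo.commit("john", john, "final", {"citation": "doe_2007_1"})   # revision 2
mary = repo.create_model("mary", "final, fixed",
                         derived_from=(john, 2))                 # revision 3
repo.commit("mary", mary, "final, fixed",
            {"corrects": "doe_2007_1"})                          # revision 4
print(repo.resolve("doe_2007_1") == (mary, 4))
```

The point of the sketch is that the derivation history lives entirely in the metadata, while the revision number itself stays a flat counter.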
>>> - It's CellML Code, right?
>>> Why not put code in a real code management system, like Subversion?
>>>
>> Subversion works well for filesystems of code and text data and to
>> some extent binary data that we don't really need to query the
>> contents of. If this applies well for CellML modelling, then
>> subversion is probably a good match. Subversion will bring its own
>> complexities when we are dealing with applying security to file
>> objects,
> It depends whether or not we actually allow direct access to Subversion
> by untrusted users.
> A simple approach would be to make everyone go through the front-end
> (which might even implement enough methods to let Subversion check out
> from there anyway).
>
>> and security/publishing in general will get even more complex
>> if we are proxying remote repositories - which we talked about a few
>> weeks ago.
>>
>> Generally, I think the concept of cellml modelling being laid out in a
>> filesystem and subversion versioning concepts applied to it is good,
>> but untested. For instance, take a reasonably complex model of Andre's
>> and work out how it will look on the filesystem and what subversion
>> versioning would result in.
>>
> I think Andre already has a layout for his model (with relative URLs).
> Letting the author decide what it looks like is probably a good first step.
>> While in this thread, I don't believe metadata should be treated any
>> differently to model data. Adding special rules for versioning of some
>> data and not others is going to complicate the versioning process and
>> I can't see any compelling reason to do this.
> I agree (for metadata about the model, at least; permissions etc. are a
> special case, of course).
>> Remember that the
>> subversion system is versioning file objects which will contain both
>> metadata and cellml model data. What is important is how and where
>> metadata is stored. Perhaps metadata should be separated into its own
>> document sitting next to the model in the filesystem.
>>
> Model is a confusing word because CellML 1.1 models can combine several
> models to make one mathematical model. There is a case for metadata /
> manifest about the mathematical model as well as metadata about each of
> the CellML models that make up the mathematical model.
>> My inclination is that an implementation using subversion plus some
>> subversion hooks will be ok, but we haven't worked out details or done
>> any proof of concept for this - which should be agnostic to cellml
>>
> This would have the benefit of supporting non-CellML models, although it
> means that we have to change the CellML models if we are going to
> include RDF/XML serialisations inside them.
>
> Perhaps a generic framework with some XML with embedded RDF specific
> parts slotted into it would be better.
>
>> and focussed on how to apply zope+cmf security and workflows to data
>> objects stored in subversion repositories.
>>
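To make the hook idea a little more concrete, here is a small Python sketch of the access check a Subversion pre-commit hook could run. It assumes the hook gets the committing user from `svnlook author -t TXN REPOS` and the changed paths from `svnlook changed -t TXN REPOS`, and it only shows the decision logic on that text. The repository layout (one top-level directory per model) and the `writers` table are hypothetical; in practice the table would presumably come from whatever holds the zope+cmf permissions.

```python
def parse_changed(changed_output):
    """Parse the text printed by `svnlook changed` into (status, path) pairs."""
    entries = []
    for line in changed_output.splitlines():
        if not line.strip():
            continue
        # Status flags sit in the first two columns; the path starts at column 5.
        entries.append((line[:2].strip(), line[4:]))
    return entries


def denied_paths(author, changed_output, writers):
    """Return the changed paths that `author` is not allowed to write.

    `writers` maps a top-level model directory to the set of users with
    write access (a hypothetical layout, agnostic to what the files contain).
    """
    denied = []
    for _status, path in parse_changed(changed_output):
        model_dir = path.split("/", 1)[0]
        if author not in writers.get(model_dir, set()):
            denied.append(path)
    return denied


changed = "U   model_a/model.cellml\nA   model_b/new_component.cellml\n"
writers = {"model_a": {"john", "mary"}, "model_b": {"ming"}}
print(denied_paths("john", changed, writers))
```

A real hook would exit with a non-zero status when the returned list is non-empty, which makes Subversion reject the commit.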
> If we are going to be doing a major re-write, now is the time to
> consider if we should be using Zope, or if we want to proxy this part of
> the site to some other technology (I think that the decision the first
> time was not discussed at CellML meetings at all, and has had a lot of
> unfortunate consequences, so I don't think it is completely out of the
> question to reconsider technologies. The fact that we are already using
> it probably carries some weight in the decision, but other factors might
> be enough to tip the balance in another direction).
>>
>>> - Zope has revision control
>>> Until someone packs the database.
>>>
>> Perhaps you should look at http://plone.org/products/plone/roadmap/8
>> (which is now completed and merged into Plone 3). There are some other
>> add on products - some listed in
>> http://plone.org/products/by-category/versioning-staging
>>
>>
>>
>>> - Zope/Plone is also quite slow.
>>>
>> Really? How so?
>>
> I think an interpreted language, even a byte-compiled one, will always
> be slow, and all the layers of abstraction from Zope and Plone probably
> make this worse. However, I'm not sure that it is the bottleneck for the
> majority of users given the recent thread about network speeds.
>>
>>> - Code we have now cannot get away from original design flaws. Might as well start from scratch.
>>>
>> Refactoring may achieve the outcome better.
>>
> I agree that this will be better in general (throwing away everything is
> probably a bit drastic; I am sure that there are some parts of the code
> that are still usable). Of course, if we move off Python this might be
> the only option, so we should keep an open mind but be wary of the costs
> of doing so.
>
> Best regards,
> Andrew
>
> _______________________________________________
> cellml-discussion mailing list
> cellml-discussion at cellml.org
> http://www.cellml.org/mailman/listinfo/cellml-discussion