CellML Discussion List

[cellml-discussion] Biological and other non-model citations in CellML metadata?


  • From: lenov at ebi.ac.uk (Nicolas Le Novere)
  • Subject: [cellml-discussion] Biological and other non-model citations in CellML metadata?
  • Date: Thu, 29 Mar 2007 10:45:31 +0100 (BST)

On Thu, 29 Mar 2007, Matt wrote:

> Can you explain in more detail or point to explanations of
> bqmodel:isDescribedBy?

You can find some explanations at:

http://www.ebi.ac.uk/compneur-srv/miriam-main/mdb?section=qualifiers

Note that qualifiers are optional for MIRIAM compliance. I personally
think we should always use some qualification; otherwise an annotation
becomes very difficult to use except for jumping from webpage to
webpage.

> Specifically:
> - what is its intended meaning?

Cf. above. Note that the list of qualifiers is by no means frozen. We
are already aware of several gaps (e.g. how do we qualify the relation
between a peptide and the gene that encodes it?)

> - when more than one of these is defined on a resource, how is this
> interpreted? For example: is there some precedence implied somehow?

This is up to the "tool" using the qualifiers. SBML does not allow
nested qualifications. There is only an implicit "hasVersion" if several
identical qualifiers are present:

bqmodel:isDescribedBy toto
bqmodel:isDescribedBy tata

means "is described by toto" and "is described by tata". In other words,
toto or tata each describe the component.

It does NOT mean that toto and tata are both necessary to describe the component.

On top of that, BioModels DB adds some precedence:
http://www.ebi.ac.uk/compneur-srv/biomodels/doc/annotation.html

But none of that is part of the MIRIAM rules.
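
As a rough illustration of the reading described above (repeated qualifiers are alternatives, not a joint requirement), here is a minimal Python sketch. The function names and the list-of-pairs representation are assumptions made for this example only; they are not part of MIRIAM, SBML or any tool.

```python
from collections import defaultdict

def group_by_qualifier(annotations):
    """Group (qualifier, resource) pairs under their qualifier, keeping order."""
    grouped = defaultdict(list)
    for qualifier, resource in annotations:
        grouped[qualifier].append(resource)
    return dict(grouped)

def is_described_by(grouped, resource):
    # Any single listed resource is enough: the values are alternatives (OR),
    # not a set that must be taken together (AND).
    return resource in grouped.get("bqmodel:isDescribedBy", [])

# The two annotations from the example above.
annotations = [
    ("bqmodel:isDescribedBy", "toto"),
    ("bqmodel:isDescribedBy", "tata"),
]
grouped = group_by_qualifier(annotations)
```

Any precedence between the values, such as the one BioModels DB applies, would be layered on top of this by the tool.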

> - how do you determine the kind of reference it is - for example a
> pubmed uri? You have a datatype for vocab/database IDs in the
> annotation scheme you described, but I don't see this in the
> bqmodel:isDescribedBy examples.

<rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>

http://www.pubmed.gov/ means "the following identifier has to be
interpreted as pointing to a PubMed record".
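
To make the split between data type and identifier concrete, here is a small Python sketch. It assumes the annotation always has the form "data-type URI + # + identifier", as in the rdf:li line above. The URL_TEMPLATES table is hypothetical: its two patterns are copied from the BioModels DB examples quoted further down in this message, whereas in practice the physical URL is generated by the MIRIAM web services.

```python
from urllib.parse import urldefrag

# Hypothetical lookup table, for illustration only. The patterns are the
# physical URLs shown in the BioModels DB examples quoted in this thread.
URL_TEMPLATES = {
    "http://www.pubmed.gov/": (
        "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
        "?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids={id}"
    ),
    "http://www.doi.org/": "http://dx.doi.org/{id}",
}

def split_annotation(uri):
    """Return (data_type_uri, identifier) for a 'data-type#identifier' URI."""
    data_type, identifier = urldefrag(uri)
    return data_type, identifier

def physical_url(uri):
    """Build a browsable URL; raises KeyError for an unknown data type."""
    data_type, identifier = split_annotation(uri)
    return URL_TEMPLATES[data_type].format(id=identifier)
```

The point of the design is that only the lookup table changes when a resource moves; the URI stored in the model stays the same.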

http://www.pubmed.gov/ is unique and should not normally
change. However, it may nevertheless change for various
reasons: the URI is too confusing or badly chosen, two resources are
merged, etc. For instance, the old PubMed URI was
http://www.ncbi.nlm.nih.gov/PubMed/
It was misleading because it was tied to a particular physical resource at
the NCBI.

We have a deprecation system in place that allows the old URIs to be
resolved and provides the new ones.
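
A tool could apply the same idea locally. The sketch below is only an assumption about how that might look: the DEPRECATED_DATA_TYPES table and both function names are invented for this example, and the real deprecation system's interface is not described here. The single table entry is the old PubMed URI mentioned above.

```python
# Hypothetical local copy of deprecated data-type URIs and their replacements.
DEPRECATED_DATA_TYPES = {
    "http://www.ncbi.nlm.nih.gov/PubMed/": "http://www.pubmed.gov/",
}

def current_data_type(data_type):
    """Follow deprecation links until a non-deprecated URI is reached."""
    seen = set()
    while data_type in DEPRECATED_DATA_TYPES and data_type not in seen:
        seen.add(data_type)  # guard against accidental cycles in the table
        data_type = DEPRECATED_DATA_TYPES[data_type]
    return data_type

def normalise(uri):
    """Rewrite the data-type part of an annotation URI; keep the identifier."""
    data_type, sep, identifier = uri.partition("#")
    return current_data_type(data_type) + sep + identifier
```

Normalising annotations this way before comparing them lets two models that cite the same paper through old and new URIs still be matched.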


> - how would you address auxiliary references as opposed to primary
> references so that a machine interpreting it can make the distinction?

I am not sure I understand that. Like primary and secondary accessions of
UniProt?

>
> <snip>
>>
>> I entirely agree with Melanie, people should be able to pick the
>> resource they want, as far as they uniquely identify it. This is
>> clearly described in the MIRIAM paper.
>
> I'm not sure what benefits one gains from letting people arbitrarily
> choose what they want to use to identify something with. For example,
> how do you work out if particular entities in one SBML model match
> entities in another SBML model?
>
> Also, given that most of these resources are controlled vocabularies,
> there is a lot of room for misunderstanding someone's intention when
> using their choices of identifiers.
>
>
>
>> An annotation is formed of
>> three parts:
>>
>> The data-type, e.g. PubMed entry, DOI, GO term, Cell-type ontology term ...
>>
>> The identifier of the particular information, e.g. 123456789, GO:0001234
>> ...
>>
>> An optional qualifier that describes the relationship between the concept
>> represented by the model component and the concept represented by the
>> particular information.
>>
>> To help people implement that, we developed MIRIAM resources
>> (http://www.ebi.ac.uk/compneur-srv/miriam/).
>>
>> If you download a model from BioModels DB in SBML (not in CellML at
>> the moment, for obvious reasons highlighted by the current
>> discussion), you will see something like:
>>
>> <bqmodel:isDescribedBy>
>> <rdf:Bag>
>> <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
>> </rdf:Bag>
>> </bqmodel:isDescribedBy>
>>
>> But on the webpage, there is:
>>
>> <b>Publication ID:</b>&nbsp;<a
>> href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8983160"
>> target="_blank">8983160</a>
>>
>> The URL is dynamically generated by MIRIAM webservices. In fact in the
>> new version of BioModels DB, to be released in the fall, the URL does
>> not point to PubMed anymore, but to the EBI extended Medline, more
>> comprehensive. BUT the URI stored in the model is still the SAME.
>>
>> Similarly for a DOI:
>>
>> <bqmodel:isDescribedBy>
>> <rdf:Bag>
>> <rdf:li rdf:resource="http://www.doi.org/#10.1063/1.1681288"/>
>> </rdf:Bag>
>> </bqmodel:isDescribedBy>
>>
>> is transformed into:
>>
>> <b>Publication ID:</b>&nbsp;<a href="http://dx.doi.org/10.1063/1.1681288"
>> target="_blank">10.1063/1.1681288...</a>
>>
>> That system is very flexible. You can use any resource listed in
>> MIRIAM resources, and this resource can be extended at will (note that
>> we distribute an XML version of the resource for local use). But it is
>> still robust and expressive.
>>
>> Cheers,
>>
>> On Wed, 28 Mar 2007, Melanie Nelson wrote:
>>
>>> Wow, I haven't posted to this list in a long time...
>>> But I feel compelled to give a little advice as
>>> someone who's spent a lot of time integrating
>>> biological information and therefore has made a lot of
>>> mistakes!
>>>
>>> By all means, have a best practice encouraging people
>>> to use the GO cellular_component ontology to describe
>>> organelles and cells. You could probably also use the
>>> molecular_function ontology for proteins (although
>>> this will be messier). However, neither is likely to
>>> be complete, i.e., there will be models that
>>> reference a biological entity not in the GO
>>> ontologies. Also, there will be cases where the entity
>>> the model references is most properly thought of as
>>> related in some way (e.g., a subset, a superset, or a
>>> "sibling") to the GO entity. You can spend ages
>>> sorting this sort of thing out and coming up with
>>> consistent rules for handling all the relationships.
>>>
>>>
>>> Since you aren't really interested in sorting out this
>>> biological mess, you may want to consider letting
>>> people choose their own ontology and just reference
>>> it.
>>> An example of this practice is in the MIAME project:
>>> http://www.mged.org/Workgroups/MIAME/miame_1.1.html
>>>
>>> About the citations- my memory of this is fuzzy, but I
>>> think the original intent was that people should
>>> provide the PubMed ID where possible. However, not all
>>> journals are indexed in PubMed (for instance, there is
>>> a CellML paper published in one that is not), so the
>>> model needs to handle full citation info, too. The BQS
>>> model handles both, and then some, which is why we
>>> chose it.
>>>
>>> Hope this is helpful,
>>> Melanie
>>>
>>>
>>> --- Andrew Miller <ak.miller at auckland.ac.nz> wrote:
>>>
>>>> Matt wrote:
>>>>> I don't think this is a good idea.
>>>>>
>>>>> - I think bioentity should be deprecated; it has no intrinsic semantic value.
>>>>>
>>>> It does, unfortunately, seem to usually target a
>>>> literal node at the
>>>> moment. It would be nice for this to at least be a
>>>> resource, which could
>>>> provide further information about the biological
>>>> entity (or if we decide
>>>> not to do that, at least a resource, with a
>>>> dictionary and a process for
>>>> adding new words to the dictionary to avoid
>>>> duplication).
>>>>
>>>> It seems that GO(Gene Ontology) has terms for cell
>>>> types, biological
>>>> compartments, and so on, which would offer a better
>>>> way to provide this
>>>> information.
>>>>
>>>> I still think that this metadata is useful, even if
>>>> the automated
>>>> interpretation of it is currently difficult.
>>>>> - If it is used currently, it should be left as
>>>> its current minimum
>>>>> specification which is to label and point to other
>>>> bioinformatics
>>>>> database IDs.
>>>>>
>>>> There are three layers of information here:
>>>> Layer 1: What biological entity are we describing?
>>>> (could be answered
>>>> with a GO term).
>>>> Layer 2: What information about that biological
>>>> entity are we using?
>>>> (could be answered with a reference to a paper, and
>>>> perhaps even a
>>>> reference to raw experimental data).
>>>> Layer 3: How was that information translated into a
>>>> model (could be
>>>> answered with a reference to a paper on the model).
>>>>
>>>> Layer 3 is clearly information about the model, and
>>>> should be described
>>>> as an arc of the model resource.
>>>> Layer 1 is described by a literal at the moment.
>>>>
>>>> Layer 2 is therefore a gap, which we don't have any
>>>> proper way to
>>>> represent now.
>>>>> - The problem is not 'biologically related papers' per se, but one of
>>>>> identifying what was the primary publication or publications that
>>>>> motivated a model.
>>>>>
>>>> The publication which motivated the expression of a
>>>> model in CellML, or
>>>> the publication which motivated the creation of the
>>>> model? Most of the
>>>> models in the repository were motivated by a paper
>>>> about a model which
>>>> was not initially expressed in CellML. However, the
>>>> way that the
>>>> metadata specification works now is that the paper
>>>> which describes the
>>>> model (not the paper which motivated it) is
>>>> referenced from the
>>>> information about the model (not information about
>>>> the CellML file).
>>>>> - There is also the case where a single
>>>> publication that contains a
>>>>> mathematical model is the one and only primary
>>>> source for the model
>>>>> itself - a rather common case at the moment.
>>>>>
>>>> This is what most models in CellML should aim to
>>>> attain. Models can be
>>>> submitted prior to publication as a model, but the
>>>> step of going from
>>>> the biology to a model is something which does need
>>>> peer review.
>>>>> I would prefer that the primary publication(s) be identified as such,
>>>>> which covers the case where there are some models in the repository
>>>>> built from general review papers of biology with no math.
>>>>>
>>>> If a model is built in that way, it should reference
>>>> the review papers
>>>> as information about the biology, and the author
>>>> should ideally submit
>>>> it for publication, at which point the reference to
>>>> the paper could be
>>>> filled in.
>>>>> I would prefer references to other related
>>>> publications to be bound
>>>>> explicitly to a comment in the model metadata -
>>>> there should be a
>>>>> reason identified by the author/editor/reviewer as
>>>> to why there has
>>>>> been such an association made.
>>>>>
>>>> The problem with this is that the comment is not
>>>> machine readable, so
>>>> there is then no way to get aggregate statistics on
>>>> why models are
>>>> linked. There is also a potential for significant
>>>> duplication of
>>>> information, as opposed to a set of standardised
>>>> predicate terms for
>>>> linking to a set of models.
>>>>> As an aside, we also need to determine whether the bqs schema provides
>>>>> enough detail to match publications across metadata instances for
>>>>> different models, and whether we should be complementing bibliographic
>>>>> data with pubmed Ids and the like.
>>>>>
>>>> I think that the PUBMED ID is always useful, because
>>>> it allows CellML
>>>> processing software (e.g. the repository) to link
>>>> directly to the Entrez
>>>> / PUBMED page. We could build links based on
>>>> searches for authors and
>>>> titles, but a unique ID is much cleaner. It seems
>>>> that many repository
>>>> models do have PUBMED IDs on them.
>>>>
>>>> Best regards,
>>>> Andrew
>>>>
>>>> _______________________________________________
>>>> cellml-discussion mailing list
>>>> cellml-discussion at cellml.org
>>>>
>>> http://www.cellml.org/mailman/listinfo/cellml-discussion
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Nicolas LE NOVERE, Computational Neurobiology,
>> EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
>> Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074
>> http://www.ebi.ac.uk/~lenov, AIM: nlenovere, MSN: nlenovere at hotmail.com
>>
>

--
Nicolas LE NOVERE, Computational Neurobiology,
EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074
http://www.ebi.ac.uk/~lenov, AIM: nlenovere, MSN: nlenovere at hotmail.com



