CellML Discussion List


[cellml-discussion] Biological and other non-model citations in CellML metadata?


  • From: matt.halstead at auckland.ac.nz (Matt )
  • Subject: [cellml-discussion] Biological and other non-model citations in CellML metadata?
  • Date: Tue, 3 Apr 2007 10:02:42 +1200

On 4/1/07, Nicolas Le Novere <lenov at ebi.ac.uk> wrote:
>
> >> > I misunderstand the scope of the property isDescribedBy. I also don't
> >> > think reverse engineering URIs to obtain meaning is a good practice.
> >>
> >> But ... you do not reverse engineer anything.
> >
> > Though you have to pull apart the URI correctly to discover the key.
>
> You mean to split urn:MyURI:12345 into urn:MyURI and 12345?
> (I voluntarily use URN form rather than URL to avoid confusion)
>
> Yes we may have to do that in some cases.


Isn't this what you have to do every time? E.g.:

x bqmodel:isDescribedBy http://www.pubmed.gov/#8983160

or the more general case:

one of:
$X $QUALIFIER [$DATATYPE-1#$IDENTIFIER-1,
$DATATYPE-2#$IDENTIFIER-2,
... $DATATYPE-k#$IDENTIFIER-n
... $DATATYPE-K#$IDENTIFIER-N]

$X $QUALIFIER [urn:$DATATYPE-1':$IDENTIFIER-1,
urn:$DATATYPE-2':$IDENTIFIER-2,
... urn:$DATATYPE-k':$IDENTIFIER-n
... urn:$DATATYPE-K':$IDENTIFIER-N]

where
$DATATYPE-k represents datatype k in URL form
and $DATATYPE-k' represents datatype k in URN form
and $IDENTIFIER-n can be any string allowed in URIs
and $X is a model constituent
and $QUALIFIER is the qualifier property of the annotation of constituent $X

You will always need to pull apart the 'URI' (Table 2 of the MIRIAM
document) to retrieve the datatype and identifier.
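
A minimal sketch of that pulling-apart step, in Python. The function name
is hypothetical, and it assumes the identifier itself contains no ':' in
the URN form; identifiers that do would need a smarter split, which is
part of why the parsing is fragile.

```python
def split_annotation_uri(uri: str) -> tuple[str, str]:
    """Split a MIRIAM-style URI into (datatype, identifier).

    URL form:  $DATATYPE#$IDENTIFIER       -> split at the last '#'
    URN form:  urn:$DATATYPE':$IDENTIFIER  -> split at the last ':'
    """
    separator = ":" if uri.startswith("urn:") else "#"
    datatype, found, identifier = uri.rpartition(separator)
    # Reject URIs with no separator, no identifier, or no datatype part.
    if not found or not identifier or datatype in ("", "urn"):
        raise ValueError(f"cannot split {uri!r} into datatype and identifier")
    return datatype, identifier


print(split_annotation_uri("http://www.pubmed.gov/#8983160"))
print(split_annotation_uri("urn:MyURI:12345"))
```

Every consumer of the annotation has to agree on this splitting rule,
since nothing in the RDF itself states it.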

I guess I'm not sure why it isn't easier to keep the meaning of
datatype and identifier separate within the language context you are
using - which is basically RDF. So that instead you would have
something like:

$X $QUALIFIER $DATATYPEINSTANCE

$DATATYPEINSTANCE isA $DATATYPE
$DATATYPEINSTANCE hasIdentifier $IDENTIFIER
$DATATYPEINSTANCE hasPhysicalUrl $URL

$QUALIFIER could be as general as isDescribedBy, or as specific as,
for example, isDescribedByPubMedRecord

$QUALIFIER would have domain and range constraints.

I see the end result being the same, but the latter method is easier to
extend and specialise. It also remains within the RDF standard, which I
think the MIRIAM document should have focused on more rather than
inventing a very specific non-standard way of representing identifiers
and datatypes.
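
To make the comparison concrete, here is a rough sketch of the explicit
form using plain Python tuples as triples (no RDF library). The names
model:x, ex:citation1 and ex:PubMedRecord are made up for illustration,
and the physical URL is only a placeholder value.

```python
# Triples as (subject, predicate, object) tuples.
# model:x, ex:citation1 and ex:PubMedRecord are hypothetical names.
triples = {
    ("model:x", "bqmodel:isDescribedBy", "ex:citation1"),
    ("ex:citation1", "isA", "ex:PubMedRecord"),
    ("ex:citation1", "hasIdentifier", "8983160"),
    ("ex:citation1", "hasPhysicalUrl", "http://www.pubmed.gov/#8983160"),
}


def objects(subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?), sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def describe(constituent, qualifier):
    """Yield (datatype, identifier) for each annotation of a constituent."""
    for instance in objects(constituent, qualifier):
        for datatype in objects(instance, "isA"):
            for identifier in objects(instance, "hasIdentifier"):
                yield datatype, identifier


print(list(describe("model:x", "bqmodel:isDescribedBy")))
```

Here the datatype and identifier are read off as property values, so no
string parsing of the URI is involved.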

>
> >> The URI IS the meaning. In
> >> the English dictionary, there is a word "publication", with a
> >> definition.
> >> Well, in MIRIAM dictionary, this word is "http://www.pubmed.gov/";
> >
> > So you say somewhere in the dictionary that there is a set of things
> > that are Publications and this set is denoted by any URI that starts
> > with http://www.pubmed.gov/ ?
>
> No. "Publication" is a human notion. We are dealing with software here.
> http://www.pubmed.gov/ is sufficient to uniquely identify a type of data.
> What the software does with it is its own business.

Publication is a useful semantic term for a machine to resolve to.
Publication could have many representations in machine form; it's
just important that the annotation language implies they are all
equivalent.

>
> > I presume there are other URI bases that
> > also mean publication? Something like:
> >
> > http://www.pubmed.gov/ isA Publication
> > http://not.in.pubmed/ isA Publication
>
> Yes. At the moment, we just have PubMed and DOI, we are adding arXiv.

So you do have a machine interpretation of Publication - yours is a lookup
list.
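
As I understand it, such a lookup list amounts to something like the
sketch below. The table and function names are hypothetical, and the two
URI bases are simply the ones used as examples earlier in this thread,
not entries taken from the MIRIAM resources.

```python
# Hypothetical lookup list: URI base -> what kind of thing it identifies.
CATEGORY_BY_BASE = {
    "http://www.pubmed.gov/": "Publication",
    "http://not.in.pubmed/": "Publication",
}


def category_of(uri):
    """Return the category whose URI base prefixes `uri`, or None if unknown."""
    for base, category in CATEGORY_BY_BASE.items():
        if uri.startswith(base):
            return category
    return None


print(category_of("http://www.pubmed.gov/#8983160"))
```

The meaning lives in the table rather than in the annotation, so anyone
without the same table cannot recover it.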

>
> We do not need to relate them to specify that they all deal with
> publications. It is already done by the bqmodel:isDescribedBy

No, isDescribedBy has no semantic meaning - there is nothing to say
whether it explicitly denotes a publication such as a journal article,
or a vocabulary term.

>
> > How do you extend the mapping of URI where the URI points to a general
> > identification service that resolves across, for example, different
> > publication indexes/databases. Do you need to ask people to replace
> > this URI (which may actually be usable to return some more RDF) with a
> > new one that uses a separate namespace for each publication
> > index/databases?
>
> I think there are maybe two misunderstandings here. The first one is
> between the MIRIAM notions of data-type and of resource. MIRIAM URIs
> describe data using data-type and identifiers. This data can be
> distributed through various resources. But we do not want to put
> information about those resources in the models. The life-span of
> resources is in general pretty short.

Yes, I think we agree on that.

>
> And that brings-me to the deeper misunderstanding, that is maybe the cause
> of all this discussion. The only purpose of MIRIAM annotation is to
> uniquely identify an annotation, in a perennial way. It is not to
> implement a semantic web infrastructure where you can go directly from the
> annotation to the resource pointed by the annotation.

While there are some semantic web languages that say the identifier
for a resource is also the location of the resource - I'm certainly
not implying that here. The example of a specific data warehouse URI
is simply that if you find some resource that you would like to use to
annotate a model constituent, then it is entirely valid to simply use
whatever unique identifier that works for you - like the URL+query
string - to identify this resource, but then use a property (a
derivative of isDescribedBy perhaps) to explain the record more fully
- e.g. this is a vocabulary term, it has 'this' accession number, and
its HTTP location is 'here'.

>
> Regarding the general identification service, we could add-it in MIRIAM
> resource, and it would become just another data-type.
>
> > Is this MIRIAM dictionary considered a global dictionary?
>
> This is the idea, as described at the end of the MIRIAM paper.
>
> > Can people
> > maintain their own local ones?
>
> We distribute the resource in an XML format for local use (for instance
> SBML-editor does not use MIRIAM webservices but a local version).
>
> > Is there a protocol for creating a
> > dictionary that maps URI (bases or namespaces?) to meaning - e.g. isA
> > Publication - and a way to share this with others?
>
> No ... because again we do not need that. The URI is a synonymous of the
> data-type. We do not need to say http://www.pubmed.gov/ isA Publication.
> Besides, PubMed and DOI may be viewed as two types of publications. But on
> the other side, DOI are attributed for numerical objects that are not
> publications. Whatever classification we design will be useless or even
> misleading for some people. For instance, I often classify ChEBI and
> InterPro together with GO as ontologies. But at the EBI, most people
> consider ChEBI a database of chemical compounds.

It would be useful for the interpretation context to be specific.
There's no harm in them being derived from a common property type.

>
> >> The URI scheme should not change.
> >
> > Why? There are a number of reasons the URIs (including the namespace)
> > may change and RDF certainly doesn't suggest they shouldn't. A more
> > common case though is that more namespaces are added for reasons such
> > as different authority over similar resources, different versions of
> > resources, dividing out a data warehouse into its original providers,
> > or collapsing databases into a warehouse.
>
> You are right. And this is why we have a deprecation system (which is at
> the moment used to correct our initial mistakes choosing the URIs).

Though the old URIs may still be valid in their original context of
how they were used.

>
> > I presume you allow for different base URIs that share a common
> > namespace to identify with different things? e.g.
> > http://www.organisation.org/models and
> > http://www.organisation.org/microarray
>
> Yes. Those are different data-types. They have different URIs. The fact
> that they share the same root is irrelevant (the example in MIRIAM is KEGG)
>
> > How do you say one URI is the same as another in your dictionary?
>
> There is only ONE official URI per data-type (well actually two, the URL
> and URN forms). But you can have deprecated ones.

So there is no way to determine if some set of URIs are controlled
vocab terms and some set are journal articles and some set are
experimental result sets?

> Resources are different
> though. You may have many resources corresponding to one data-type. It is
> up to the user to decide which one he wants to use for instance to build
> hyperlinks. But he may want to do something different, like mapping PDB
> URIs to local atomic coordinates that would be loaded in a 3D viewer.
>
> >> Exactly, and this is why we dumped first CellML metadata. When we
> >> started
> >> with CellML metadata, we had bqs:PubMed_id, bqs:Medline_id and
> >> bqs:CAS_id
> >>
> >> - PubMed and Medline are redundant (Medline actually gave up their id.
> >> They use PubMed ones now)
> >>
> >> - We could not refer to anything that was not in PubMed. This is the
> >> case
> >> of MANY models.
> >
> > Why is that?
>
> I may have missed something. I do not understand the question. Why are many
> models not described in publications indexed in PubMed?

Sorry. Why weren't you able to refer to anything that was not in PubMed?

>
> >> We could have asked you guys to develop a new version of CellML metadata
> >> spec, with bqs:DOI. How long before we would have asked for another version with
> >> bqs:arXiv? bqs:Scopus_id?
> >
> > That's what I'd expect people to do.
>
> But this is not feasible! The release cycle of a standard format, and
> updating of a database are completely different (years versus seconds).
> How many versions of CellML metadata did-you have so far?

The CellML metadata specification just provides a way to bind RDF
statements to identified model elements. The parts relating to
publications, vocabulary terms, curators, etc. are recommended ways to
provide annotation using various standards or specifications. In the
practical sense, interpreting the RDF in the context of the associated
RDF Schemas provides the explicit meaning of the annotation. It is
these Schemas that would be updated to reflect things like bqs:arXiv,
and they would simply increase in version like normal code versioning
systems. The important thing is that people use the schemas and
validate against them; then we are guaranteed to be able to interpret
them.

>
> >> First, the type of metadata evolves very rapidly. We already have 29
> >> types
> >> in MIRIAM resources, but I anticipate that number to grow very rapidly
> >> as
> >> libSBML3 (that implement the RDF annotation scheme) is adopted by the
> >> developers.
> >
> > How does that fail externalisation of metadata type through publishing
> > schemas?
>
> Because MIRIAM resources can be updated in a second, and then the
> webservices make it immediately available to resolve annotations.
> To develop a schema takes time, energy and people.

I don't see a whole lot of difference between an RDF Schema and a
custom lookup table except that the Schema is a whole lot more
flexible, especially if people want to customise it for in-house
purposes but still be able to produce valid metadata for the wider
community. It is in fact where I expect most pressure to add new
properties and datatypes to the global schemas to come from.

> Who will do-it?
> The SBML-team is actually providing XML-schemas for SBML, and this is
> quite a hard job to do it properly.

How does this relate to RDF Schemas or annotation in general?

> But more importantly, software
> developers often use local versions of the schemas.
>
> Finally MIRIAM resources can be completed by anybody. No need to wait for
> the SBML team or the CellML team to be ready to make the change.

I don't understand what you mean. What sort of 'completion' would take
place that may require one of our teams to have to make a change?

>
>
> --
> Nicolas LE NOVERE, Computational Neurobiology,
> EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
> Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074
> http://www.ebi.ac.uk/~lenov, AIM:nlenovere, MSN:nlenovere at hotmail.com
>
> _______________________________________________
> cellml-discussion mailing list
> cellml-discussion at cellml.org
> http://www.cellml.org/mailman/listinfo/cellml-discussion
>


Matt



