CellML Discussion List

[cellml-discussion] Biological and other non-model citations in CellML metadata?


  • From: lenov at ebi.ac.uk (Nicolas Le Novere)
  • Subject: [cellml-discussion] Biological and other non-model citations in CellML metadata?
  • Date: Sun, 1 Apr 2007 12:46:46 +0100 (BST)


>> > I misunderstand the scope of the property isDescribedBy. I also don't
>> > think reverse engineering URIs to obtain meaning is a good practice.
>>
>> But ... you do not reverse engineer anything.
>
> Though you have to pull apart the URI correctly to discover the key.

You mean splitting urn:MyURI:12345 into urn:MyURI and 12345?
(I deliberately use the URN form rather than the URL form to avoid confusion.)

Yes, we may have to do that in some cases.
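
To show how little "pulling apart" is involved, here is a minimal sketch in
Python. The function name split_annotation is hypothetical, urn:MyURI is the
placeholder used above rather than a real MIRIAM data-type, and the sketch
assumes the identifier itself contains no unescaped colon.

```python
def split_annotation(urn):
    """Split '<data-type>:<identifier>' at the last colon.

    Assumption: the identifier part contains no unescaped ':'.
    """
    base, sep, identifier = urn.rpartition(":")
    if not sep or not base or not identifier:
        raise ValueError(f"not of the form <data-type>:<identifier>: {urn!r}")
    return base, identifier


data_type, identifier = split_annotation("urn:MyURI:12345")
print(data_type, identifier)
```

The software never has to interpret the data-type part. It only compares it,
as an opaque string, with the URIs it knows about.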

>> The URI IS the meaning. In
>> the English dictionary, there is a word "publication", with a
>> definition.
>> Well, in MIRIAM dictionary, this word is "http://www.pubmed.gov/";
>
> So you say somewhere in the dictionary that there is a set of things
> that are Publications and this set is denoted by any URI that starts
> with http://www.pubmed.gov/ ?

No. "Publication" is a human notion. We are dealing with software here.
http://www.pubmed.gov/ is sufficient to uniquely identify a type of data.
What the software does with it is its own business.

> I presume there are other URI bases that
> also mean publication? Something like:
>
> http://www.pubmed.gov/ isA Publication
> http://not.in.pubmed/ isA Publication

Yes. At the moment, we just have PubMed and DOI, we are adding arXiv.

We do not need to relate them to specify that they all deal with
publications. That is already done by the bqmodel:isDescribedBy qualifier.

> How do you extend the mapping of URI where the URI points to a general
> identification service that resolves across, for example, different
> publication indexes/databases. Do you need to ask people to replace
> this URI (which may actually be usable to return some more RDF) with a
> new one that uses a separate namespace for each publication
> index/databases?

I think there are maybe two misunderstandings here. The first one is
between the MIRIAM notions of data-type and of resource. MIRIAM URIs
describe data using data-type and identifiers. This data can be
distributed through various resources. But we do not want to put
information about those resources in the models. The life-span of
resources is in general pretty short.

And that brings me to the deeper misunderstanding, which is maybe the cause
of all this discussion. The only purpose of MIRIAM annotation is to
uniquely identify an annotation, in a perennial way. It is not to
implement a semantic web infrastructure where you can go directly from the
annotation to the resource pointed to by the annotation.

Regarding the general identification service, we could add it to MIRIAM
Resources, and it would become just another data-type.

> Is this MIRIAM dictionary considered a global dictionary?

This is the idea, as described at the end of the MIRIAM paper.

> Can people
> maintain their own local ones?

We distribute the resource in an XML format for local use (for instance
SBML-editor does not use MIRIAM webservices but a local version).

> Is there a protocol for creating a
> dictionary that maps URI (bases or namespaces?) to meaning - e.g. isA
> Publication - and a way to share this with others?

No ... because again we do not need that. The URI is a synonym of the
data-type. We do not need to say http://www.pubmed.gov/ isA Publication.
Besides, PubMed and DOI may be viewed as two types of publications. But on
the other hand, DOIs are also assigned to digital objects that are not
publications. Whatever classification we design will be useless or even
misleading for some people. For instance, I often classify ChEBI and
InterPro together with GO as ontologies. But at the EBI, most people
consider ChEBI a database of chemical compounds.

>> The URI scheme should not change.
>
> Why? There are a number of reasons the URIs (including the namespace)
> may change and RDF certainly doesn't suggest they shouldn't. A more
> common case though is that more namespaces are added for reasons such
> as different authority over similar resources, different versions of
> resources, dividing out a data warehouse into its original providers,
> or collapsing databases into a warehouse.

You are right. And this is why we have a deprecation system (which is at
the moment used to correct our initial mistakes in choosing the URIs).

> I presume you allow for different base URIs that share a common
> namespace to identify with different things? e.g.
> http://www.organisation.org/models and
> http://www.organisation.org/microarray

Yes. Those are different data-types. They have different URIs. The fact
that they share the same root is irrelevant (the example in MIRIAM is KEGG).

> How do you say one URI is the same as another in your dictionary?

There is only ONE official URI per data-type (well actually two, the URL
and URN forms). But you can have deprecated ones. Resources are different
though. You may have many resources corresponding to one data-type. It is
up to the user to decide which one he wants to use for instance to build
hyperlinks. But he may want to do something different, like mapping PDB
URIs to local atomic coordinates that would be loaded in a 3D viewer.
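
The distinction between data-type, deprecated URIs and resources can be
sketched as follows. This is only an illustration under stated assumptions:
the class, the function names, the registry content and the example.org
addresses are all hypothetical, and this is not the actual format or API of
MIRIAM Resources.

```python
from dataclasses import dataclass, field


@dataclass
class DataType:
    official: str                                   # the one official URI
    deprecated: list = field(default_factory=list)  # older URIs, still recognised
    resources: dict = field(default_factory=dict)   # resource name -> URL template


# Hypothetical registry content, for illustration only.
REGISTRY = [
    DataType(
        official="urn:MyURI",
        deprecated=["urn:MyOldURI"],
        resources={
            "mirror-a": "https://mirror-a.example.org/entry/{id}",
            "mirror-b": "https://mirror-b.example.org/view?acc={id}",
        },
    ),
]


def find(annotation):
    """Return (data_type, identifier), accepting deprecated URIs as well."""
    base, sep, identifier = annotation.rpartition(":")
    if not sep or not identifier:
        raise ValueError(f"cannot split {annotation!r}")
    for data_type in REGISTRY:
        if base == data_type.official or base in data_type.deprecated:
            return data_type, identifier
    raise KeyError(base)


def canonical(annotation):
    """Rewrite an annotation so that it uses the official URI."""
    data_type, identifier = find(annotation)
    return f"{data_type.official}:{identifier}"


def hyperlink(annotation, resource):
    """Build a link using whichever resource the user prefers."""
    data_type, identifier = find(annotation)
    return data_type.resources[resource].format(id=identifier)


print(canonical("urn:MyOldURI:12345"))
print(hyperlink("urn:MyURI:12345", "mirror-b"))
```

The annotation stored in the model never mentions mirror-a or mirror-b. A
resource can disappear, or a local 3D viewer can replace hyperlink(), without
touching the model.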

>> Exactly, and this is why we dumped first CellML metadata. When we
>> started
>> with CellML metadata, we had bqs:PubMed_id, bqs:Medline_id and
>> bqs:CAS_id
>>
>> - PubMed and Medline are redundant (Medline actually gave up their id.
>> They use PubMed ones now)
>>
>> - We could not refer to anything that was not in PubMed. This is the
>> case
>> of MANY models.
>
> Why is that?

I may have missed something. I do not understand the question. Why are many
models not described in publications indexed in PubMed?

>> We could have asked you guys to develop a new version of CellML metadata
>> spec, with bqs:DOI. How long before we would have asked for another version
>> with bqs:arXiv? bqs:Scopus_id?
>
> That's what I'd expect people to do.

But this is not feasible! The release cycle of a standard format and the
updating of a database run on completely different time scales (years versus
seconds). How many versions of CellML metadata have you had so far?

>> First, the type of metadata evolves very rapidly. We already have 29
>> types
>> in MIRIAM resources, but I anticipate that number to grow very rapidly
>> as
>> libSBML3 (that implements the RDF annotation scheme) is adopted by the
>> developers.
>
> How does that fail externalisation of metadata type through publishing
> schemas?

Because MIRIAM resources can be updated in a second, and then the
webservices make the update immediately available to resolve annotations.
To develop a schema takes time, energy and people. Who will do it?
The SBML team is actually providing XML schemas for SBML, and it is
quite a hard job to do properly. But more importantly, software
developers often use local versions of the schemas.

Finally MIRIAM resources can be completed by anybody. No need to wait for
the SBML team or the CellML team to be ready to make the change.


--
Nicolas LE NOVERE, Computational Neurobiology,
EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074
http://www.ebi.ac.uk/~lenov, AIM:nlenovere, MSN:nlenovere at hotmail.com




