CellML Discussion List



[cellml-discussion] Biological and other non-model citations in CellML metadata?


  • From: matt.halstead at auckland.ac.nz (Matt )
  • Subject: [cellml-discussion] Biological and other non-model citations in CellML metadata?
  • Date: Sun, 1 Apr 2007 22:13:49 +1200

Hi Nicolas. Thanks for the in-depth reply.

On 3/31/07, Nicolas Le Novere <lenov at ebi.ac.uk> wrote:
>
> > I misunderstand the scope of the property isDescribedBy. I also don't
> > think reverse engineering URIs to obtain meaning is a good practice.
>
> But ... you do not reverse engineer anything.

Though you have to pull apart the URI correctly to discover the key.
(as well as pulling apart the rdf structure that it's embedded in,
which I assume is well defined - only rdf containers of URIs
allowed?).

> The URI IS the meaning. In
> the English dictionary, there is a word "publication", with a definition.
> Well, in the MIRIAM dictionary, this word is "http://www.pubmed.gov/".

So you say somewhere in the dictionary that there is a set of things
that are Publications and this set is denoted by any URI that starts
with http://www.pubmed.gov/ ? I presume there are other URI bases that
also mean publication? Something like:

http://www.pubmed.gov/ isA Publication
http://not.in.pubmed/ isA Publication
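To make the question concrete, here is a minimal Python sketch of what "pulling apart the URI" would involve for a consumer of these annotations. It assumes a hypothetical local dictionary that maps URI bases (the two from the example above) to a data type, and picks the longest matching base. It is an illustration of the idea, not how MIRIAM actually resolves URIs.

```python
from typing import Dict, Optional, Tuple

# Hypothetical local dictionary: URI base -> data type. These bases come
# from the example above and are illustrative, not an official MIRIAM list.
DATA_TYPES: Dict[str, str] = {
    "http://www.pubmed.gov/": "Publication",
    "http://not.in.pubmed/": "Publication",
}


def resolve(uri: str) -> Optional[Tuple[str, str]]:
    """Split a URI into (data type, identifier) using the longest matching base."""
    matches = [base for base in DATA_TYPES if uri.startswith(base)]
    if not matches:
        return None  # unknown base: the code cannot tell what the URI denotes
    base = max(matches, key=len)
    # The identifier is whatever follows the base; drop a leading '#' fragment marker.
    identifier = uri[len(base):].lstrip("#")
    return DATA_TYPES[base], identifier


print(resolve("http://www.pubmed.gov/#8983160"))  # ('Publication', '8983160')
```

Every new base (a general resolver, a split-out warehouse) has to be added to such a table before the code can interpret it, which is the extension problem raised in the next paragraph.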

How do you extend the mapping of URI where the URI points to a general
identification service that resolves across, for example, different
publication indexes/databases. Do you need to ask people to replace
this URI (which may actually be usable to return some more RDF) with a
new one that uses a separate namespace for each publication
index/databases?

Is this MIRIAM dictionary considered a global dictionary? Can people
maintain their own local ones? Is there a protocol for creating a
dictionary that maps URI (bases or namespaces?) to meaning - e.g. isA
Publication - and a way to share this with others?

>
> > How do you say one URI means the same as another if the URI scheme
> > changes? It seems you leave this up to the developer to make sure they
> > accommodate both instead of letting rdfs take care of this.
>
> The URI scheme should not change.

Why? There are a number of reasons the URIs (including the namespace)
may change and RDF certainly doesn't suggest they shouldn't. A more
common case though is that more namespaces are added for reasons such
as different authority over similar resources, different versions of
resources, dividing out a data warehouse into its original providers,
or collapsing databases into a warehouse.

> In the rare case it changes, it is up to
> us to provide a deprecation system so that the developer actually does not
> feel the change at all.

I presume you allow for different base URIs that share a common
namespace to identify with different things? e.g.
http://www.organisation.org/models and
http://www.organisation.org/microarray

How do you say one URI is the same as another in your dictionary?
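One way a dictionary-based "deprecation system" could answer this is to normalise every URI to a canonical form before comparing. The sketch below assumes a hypothetical table of retired bases and their replacements (the old base is invented); in plain RDF terms the same equivalence could instead be stated with owl:sameAs.

```python
from typing import Dict

# Hypothetical deprecation table: retired URI base -> base that replaces it.
# The old base is invented for illustration.
DEPRECATED: Dict[str, str] = {
    "http://old.example.org/pubmed/": "http://www.pubmed.gov/",
}


def canonical(uri: str, table: Dict[str, str] = DEPRECATED) -> str:
    """Rewrite deprecated bases to their replacements, following chains once each."""
    used = set()
    rewritten = True
    while rewritten:
        rewritten = False
        for old, new in table.items():
            if old not in used and uri.startswith(old):
                used.add(old)  # each rule fires at most once, so cycles terminate
                uri = new + uri[len(old):]
                rewritten = True
                break
    return uri


def same_resource(a: str, b: str) -> bool:
    """Two URIs denote the same thing if they normalise to the same string."""
    return canonical(a) == canonical(b)


print(same_resource("http://old.example.org/pubmed/#8983160",
                    "http://www.pubmed.gov/#8983160"))  # True
```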

>
> >> If you create an element <bqs:PubMed_id> in your language, rather than
> >> having a generic reference scheme, with a type PubMed defined elsewhere.
> >
> > I don't see what hardcoded means. Using properties that are defined in
> > a shared schema or standard is a pretty basic premise of sharing
> > information using RDF. How do you think RSS or Dublin Core works?
>
> Exactly, and this is why we dumped first CellML metadata. When we started
> with CellML metadata, we had bqs:PubMed_id, bqs:Medline_id and bqs:CAS_id
>
> - PubMed and Medline are redundant (Medline actually gave up their id.
> They use PubMed ones now)
>
> - We could not refer to anything that was not in PubMed. This is the case
> of MANY models.

Why is that?

>
> We could have asked you guys to develop a new version of the CellML metadata
> spec, with bqs:DOI. How long before we would have asked for another version with
> bqs:arXiv? bqs:Scopus_id?
>

That's what I'd expect people to do.

> And this is only for bibliography.
>
> By the way, Dublin Core does not work that well. I explored the usage of
> DC extensively and it is an absolute mess. Everybody implements their own
> usages and rules. Elements and their syntax change wildly from one place
> to the other. In the end, this is the usual semantic web. Everybody exports
> the information all right, but barely anybody can read it. I am not even
> talking about interpreting it.

I think it works well for simple authorship details of a content item.

>
> > our bqs namespace reflects OMG's Bibliographic Query Service
> > specification - see http://www.omg.org/docs/dtc/01-04-05.pdf
>
> I know BQS very well. It has been developed in the room next to mine for a
> project that died before birth (because we massively gave up CORBA). It is
> NOT a standard. It has been endorsed by the OMG because at that time the
> EBI was a member, and because Martin Senger was the person writing many of
> those things (he also wrote the specification of LSID).

I'm not sure I see the point here. It was used because it offered the
right concepts for what it was intended to be used for. The RDF Schema
is available over the web, and any RDFS aware interpreter can use it
to "better" understand the annotation properties of CellML models.
It's shared and available and referenced by models.

>
> >> >> The big advantage of externalising the type of metadata is that the
> >> >> scheme is generic.
> >
> > What does generic mean? Standardised and used by everyone like a
> > published schema would be?
>
> No. Generic, because it can be applied to any kind of data. You do not
> need to define specific scheme for each new data-type. Bibliography is
> just a type of metadata like any other. No need for special treatment.

Sure there is. You either have no boundaries at all or very well
defined ones. We have chosen well defined ones. But I am thinking you
do also - but at the object level (in the subject predicate object
relation model), though you have your own way of interpreting type by
pulling apart a resource URI. All I was suggesting is that this seems
to place an extra burden on the code to work out whether an annotation
is relevant and what it pertains to.

>
> >> > What do you mean by the 'type' of metadata?
> >>
> >> EC page, PubMed entry, DOI indexed document, UniProt entry, Gene
> >> Ontology term etc.
> >
> > The type of metadata is externalized as soon as it is presented in a
> > Schema and made public and adopted by the community.
>
> That cannot work for several reasons.
>
> First, the type of metadata evolves very rapidly. We already have 29 types
> in MIRIAM resources, but I anticipate that number will grow very rapidly as
> libSBML3 (which implements the RDF annotation scheme) is adopted by
> developers.

How does that fail externalisation of metadata type through publishing
schemas?

>
> Second, the relevant metadata varies according to the community. An
> obvious example is BQS. It used PubMed because it was developed at the EBI
> by a software engineer who just did not know there was anything else than
> PubMed in bibliography.

But schemas have class subsumption to allow for this variation to be
interpreted with the same 'meaning'. I'm not really too fussed what
namespace a property comes from, so long as it is described in a
schema and based on a schema that is agreed on more generally.
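As a rough illustration of the subsumption argument, the sketch below applies two RDFS entailment rules (rdfs5: rdfs:subPropertyOf is transitive; rdfs7: a statement made with a subproperty also holds for its superproperty) to a small set of triples. The ex:citation and ex:reference properties, and the claim that bqs:PubMed_id is a subproperty of them, are hypothetical; the point is only that a generic consumer can recognise a community-specific property through a shared schema.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBPROP = "rdfs:subPropertyOf"


def infer_superproperties(triples: Set[Triple]) -> Set[Triple]:
    """Apply RDFS rules rdfs5 and rdfs7 repeatedly until nothing new is derived."""
    result = set(triples)
    while True:
        sub = {(s, o) for s, p, o in result if p == SUBPROP}
        new = set()
        # rdfs5: if a subPropertyOf b and b subPropertyOf d, then a subPropertyOf d
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBPROP, d))
        # rdfs7: (s p o) and p subPropertyOf q entails (s q o)
        for s, p, o in result:
            for a, b in sub:
                if p == a:
                    new.add((s, b, o))
        if new <= result:
            return result
        result |= new


# Hypothetical schema plus one annotation on the model with metaid _000001.
triples = {
    ("bqs:PubMed_id", SUBPROP, "ex:citation"),
    ("ex:citation", SUBPROP, "ex:reference"),
    ("#_000001", "bqs:PubMed_id", "8983160"),
}
closed = infer_superproperties(triples)
print(("#_000001", "ex:reference", "8983160") in closed)  # True
```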

>
> Third, who decides what "adopted" and "community" mean? For instance we
> have been struggling with that in SBML for years. We are just starting to
> have a robust model of development, with a balance between democracy and
> technical soundness. I think we actually have a pretty good system. But it
> is not trivial. (In case there is a misunderstanding here: SBML and MIRIAM
> are separate entities. I am just using SBML as an example).

I think it is this natural process you identify. At first it must seem
a bit kaleidoscopic but over time the more consistently used and
agreed on semantics show themselves. I believe in capturing those
semantics (in property types and classes) to allow others to build on
them or to argue for or against them or to simply be able to track any
change in them over time.

>
> Finally, when was CellML metadata made public, how was the community
> consulted, and how was its feedback incorporated in the specification?

My understanding is that the metadata framework was developed within
the environment of a collaboration between the University of Auckland
and a couple of other groups. It was developed with extensibility in
mind - and I take my hat off to the group back then for choosing
something like RDF and RDF Schemas when the technology was really very
young.

My understanding is that because it is flexible you can start with a
recommendation and evolve as quickly as you need to once a wider group
of people begin using it too, but that you don't lose the semantic
value of any data encoded in the kaleidoscope phases.

> (I
> am not even talking about BQS. As I said this is NOT a community standard.
> It is a data-model developed by one person for a very specific project).
>
> >> > I think more I am misunderstanding the range of use isDescribedBy is
> >> ok for.
> >>
> >> isDescribedBy is a relationship.
> >
> > are you meaning is-a relation? as in Type?
>
> Yes,
>
> <model id="EPSP_Edelstein" metaid="_000001">
> [...]
> <rdf:Description rdf:about="#_000001">
> <bqmodel:isDescribedBy>
> <rdf:Bag>
> <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
> </rdf:Bag>
> </bqmodel:isDescribedBy>
>
> means:
>
> the model contained in the SBML model "EPSP_Edelstein" is described in the
> metadata "8983160" of the data-type "http://www.pubmed.gov/".
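For reference, this is roughly what a consumer has to do to read that annotation, sketched with Python's standard xml.etree.ElementTree. The fragment from the message is completed with namespace declarations and closing tags so it is well-formed; the bqmodel namespace URI is an assumption (the one used in SBML annotations), and the sketch ignores other RDF/XML serialisations of the same triples.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace URI for the bqmodel prefix (the one used in SBML annotations).
BQMODEL = "http://biomodels.net/model-qualifiers/"

# The annotation from the message, completed with namespace declarations
# and closing tags so that it is well-formed XML.
ANNOTATION = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:bqmodel="{BQMODEL}">
  <rdf:Description rdf:about="#_000001">
    <bqmodel:isDescribedBy>
      <rdf:Bag>
        <rdf:li rdf:resource="http://www.pubmed.gov/#8983160"/>
      </rdf:Bag>
    </bqmodel:isDescribedBy>
  </rdf:Description>
</rdf:RDF>
"""


def described_by(xml_text: str) -> List[Tuple[str, str]]:
    """Return (subject, resource URI) pairs for every bqmodel:isDescribedBy."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for qualifier in desc.findall(f"{{{BQMODEL}}}isDescribedBy"):
            for li in qualifier.iter(f"{{{RDF}}}li"):
                pairs.append((subject, li.get(f"{{{RDF}}}resource")))
    return pairs


print(described_by(ANNOTATION))
# [('#_000001', 'http://www.pubmed.gov/#8983160')]
```

Note that the parser only yields the resource URI; deciding that it denotes a publication still requires a separate lookup on the URI base.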

And rdf types or subproperties don't interest you?

>
> > I don't see this at all. isVersionOf and hasVersion is about
> > "versions" as in successors where both subject and object of the
> > relation are talking about the same thing in the same format but
> > differ somewhat in the content - e.g. a newer SBML file. Here is a
> > paste of what Dublin Core says:
>
> We are not using Dublin Core qualifiers exactly for that reason. Andrew Finney
> pointed out the difference of semantics, and that's why it was decided to
> develop biomodels qualifiers. The definition of biomodels qualifiers is
> described at:
>
> http://www.ebi.ac.uk/compneur-srv/miriam-main/mdb?section=qualifiers
>
> and
>
> http://biomodels.net/index.php?s=Qualifiers

Sure, this is definitely not Dublin Core's domain.

>
> > Why wouldn't the Dublin Core references and isReferencedBy be a useful
> > substitute for isDescribedBy when you are referring to a publication?
> > From the Dublin Core spec:
> >
> > "The isReferencedBy and References refinements enable the expression
> > of relationships that aid the user but are not necessarily tied to the
> > life cycle or necessary for the intended use of the resource. This
> > relationship might be used to link an article critical of a resource
> > to that resource, a satire of a speech to the original speech, etc."
> >
> > Surely that provides some semantic value?
>
> :-)
>
> This is what we used until the end of 2006 in fact! But then we decided to
> stop confusing people with DC qualifiers for some metadata and not for the
> others. Besides, isReferencedBy really links a document to another
> document. Here we want to describe the relationship between a document and
> a model. Finally, we will also use isDescribedBy to link parameters to
> the literature that describes how each parameter was measured.

I think overall we are trying to achieve the same thing in terms of
expressiveness and interpretability, and I guess I am just left
wondering why RDF schemas (or something built off them) didn't
interest you so much.

cheers
Matt


>
>
> --
> Nicolas LE NOVERE, Computational Neurobiology,
> EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
> Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074
> http://www.ebi.ac.uk/~lenov, AIM:nlenovere, MSN:nlenovere at hotmail.com
>
> _______________________________________________
> cellml-discussion mailing list
> cellml-discussion at cellml.org
> http://www.cellml.org/mailman/listinfo/cellml-discussion
>



