cellml-tools-developers - [cellml-dev] RDF libraries in CDA


cellml-tools-developers AT lists.cellml.org

A list for the developers of CellML tools


[cellml-dev] RDF libraries in CDA

From: m.taschuk at newcastle.ac.uk (Morgan Taschuk)
Subject: [cellml-dev] RDF libraries in CDA
Date: Wed, 9 Sep 2009 20:41:26 +0100

Hi Justin,

Allyson Lister and I have discussed what we would like to see from the CDA
regarding annotations, and we have come up with several situations, a few
problems, and a few solutions. Hopefully you will find our ideas useful.

In brief, we would like the CDA to be able to perform the following functions:

1) Retrieve and iterate over all CV terms (annotations) for a particular
variable, math, or group.
2) Retrieve a list of objects containing a particular CV Term.
3) Create a new CV Term.
4) Add a CV term to a variable, math or group. The API should determine
whether the annotation is unique or identical to another annotation, and
assign an identifier accordingly (see discussion below).
5) Remove a CV term from a variable, math or group.

I think each "CV term" should contain:
1) a qualifier from a standard list (like the qualifiers in libsbmlConstants,
but without divisions between biological or model qualifiers, etc.) WHY???
2) one or more URIs
3) a label, to be used as an identifier
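A minimal Java sketch of the three-part CV term described above. It assumes a single flat enum of BioModels qualifiers (one type, no separate biological/model lists), but the constant names still carry a `BIOL_`/`MODEL_` prefix because both vocabularies define an `is` qualifier. `Qualifier` and `CvTerm` are hypothetical names, not part of the CellML API or libSBML.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical flat list of BioModels qualifiers; each constant carries its full predicate URI. */
enum Qualifier {
    BIOL_IS("http://biomodels.net/biology-qualifiers/is"),
    BIOL_HAS_VERSION("http://biomodels.net/biology-qualifiers/hasVersion"),
    BIOL_IS_VERSION_OF("http://biomodels.net/biology-qualifiers/isVersionOf"),
    BIOL_HAS_PART("http://biomodels.net/biology-qualifiers/hasPart"),
    MODEL_IS("http://biomodels.net/model-qualifiers/is"),
    MODEL_IS_DESCRIBED_BY("http://biomodels.net/model-qualifiers/isDescribedBy");

    final String uri;

    Qualifier(String uri) {
        this.uri = uri;
    }

    /** Maps a predicate URI found in the RDF back to a constant, or returns null if unknown. */
    static Qualifier fromUri(String uri) {
        for (Qualifier q : values()) {
            if (q.uri.equals(uri)) {
                return q;
            }
        }
        return null;
    }
}

/** Hypothetical CV term: one qualifier, one or more resource URIs, and a label used as its identifier. */
record CvTerm(Qualifier qualifier, List<String> uris, String label) {
    CvTerm {
        Objects.requireNonNull(qualifier, "qualifier");
        Objects.requireNonNull(label, "label");
        if (uris == null || uris.isEmpty()) {
            throw new IllegalArgumentException("a CV term needs at least one URI");
        }
        uris = List.copyOf(uris); // immutable copy so callers cannot change the term afterwards
    }
}

class CvTermDemo {
    public static void main(String[] args) {
        CvTerm cyclin = new CvTerm(
                Qualifier.fromUri("http://biomodels.net/biology-qualifiers/isVersionOf"),
                List.of("urn:miriam:interpro:IPR006670"),
                "cyclin");
        System.out.println(cyclin);
    }
}
```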

For reference, we're looking at the model that Matt Halstead sent around in
August. I attach it for convenience's sake. As you can see, each component
contains RDF that annotates the variables and math that are part of that
component. The component RDF usually links back to model-wide (global) RDF,
so that multiple components can make use of the same RDF annotation. While
this approach is obviously more flexible and less prone to error, it also
poses some difficulties for the API.

We would prefer that the structure of the RDF be hidden from developers, with a
few exceptions that I list below. For example, consider the abridged CellML
model with RDF annotation below.



<rdf:RDF>
<rdf:Description rdf:about="#_M_">
<rdfs:label>Inactive cdc2 kinase</rdfs:label>
<bqbiol:hasVersion>
<rdf:Bag>
<rdf:li rdf:resource="urn:miriam:uniprot:P35567"/>
<rdf:li rdf:resource="urn:miriam:uniprot:P24033"/>
</rdf:Bag>
</bqbiol:hasVersion>
</rdf:Description>
</rdf:RDF>



<component name="C" cmeta:id="C">
<rdf:RDF>
<rdf:Description rdf:about="#C_M_">
<rdfs:label>Fraction of inactive cdc2 kinase</rdfs:label>
<cmeta:biomodels rdf:resource="#_M_"/>
</rdf:Description>
</rdf:RDF>
<variable units="dimensionless" public_interface="out" name="M_"
cmeta:id="C_M_"/>
</component>

When we query the API for the annotations available for the cmeta:id "C_M_",
we would like the API to report three annotations: two rdfs:labels and a
bqbiol:hasVersion (or whatever constant/enumeration value you decide will stand
for bqbiol:hasVersion). The bqbiol:hasVersion annotation should contain the two
MIRIAM URIs. In this way, both the "_M_" link and the rdf:Bag are hidden from
the developer.
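The flattening asked for here can be sketched in Java over a plain list of triples. This is a sketch under assumptions: triples use compact prefixed names as strings (the CDA's RDF service works with node objects and full URIs), `cmeta:biomodels` is treated as the component-to-global link exactly as in the example, and `Triple`, `FlatAnnotations` and `AnnotationFlattener` are hypothetical names. For `#C_M_` it follows the link to `#_M_` and unpacks the `rdf:Bag`, so the caller sees two labels and one qualifier holding both UniProt URIs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Stand-in triple with prefixed names; not the CellML API's triple type. */
record Triple(String subject, String predicate, String object) {}

/** What a developer would see: labels, plus qualifier -> URIs, with no bags or links. */
record FlatAnnotations(List<String> labels, Map<String, List<String>> terms) {}

class AnnotationFlattener {
    // Assumption: the predicate that links component RDF to model-wide RDF, as in the example.
    static final String LINK = "cmeta:biomodels";

    static FlatAnnotations flatten(List<Triple> triples, String about) {
        List<String> labels = new ArrayList<>();
        Map<String, List<String>> terms = new LinkedHashMap<>();
        collect(triples, about, labels, terms, new HashSet<>());
        return new FlatAnnotations(labels, terms);
    }

    private static void collect(List<Triple> triples, String subject, List<String> labels,
                                Map<String, List<String>> terms, Set<String> visited) {
        if (!visited.add(subject)) {
            return; // already merged; guards against cyclic links
        }
        for (Triple t : triples) {
            if (!t.subject().equals(subject)) {
                continue;
            }
            String p = t.predicate();
            if (p.equals("rdfs:label")) {
                labels.add(t.object());
            } else if (p.equals(LINK)) {
                collect(triples, t.object(), labels, terms, visited); // pull in the global annotation
            } else if (p.startsWith("bqbiol:") || p.startsWith("bqmodel:")) {
                List<String> members = bagMembers(triples, t.object());
                // A qualifier may point straight at a URI instead of at a bag.
                terms.computeIfAbsent(p, k -> new ArrayList<>())
                     .addAll(members.isEmpty() ? List.of(t.object()) : members);
            }
        }
    }

    /** Objects of rdf:_1, rdf:_2, ... on a container node, in index order. */
    static List<String> bagMembers(List<Triple> triples, String container) {
        TreeMap<Integer, String> byIndex = new TreeMap<>();
        for (Triple t : triples) {
            if (t.subject().equals(container) && t.predicate().startsWith("rdf:_")) {
                byIndex.put(Integer.parseInt(t.predicate().substring("rdf:_".length())), t.object());
            }
        }
        return new ArrayList<>(byIndex.values());
    }

    /** The triples for the abridged example, with the rdf:Bag as blank node "_:b0". */
    static List<Triple> exampleTriples() {
        return List.of(
                new Triple("#_M_", "rdfs:label", "Inactive cdc2 kinase"),
                new Triple("#_M_", "bqbiol:hasVersion", "_:b0"),
                new Triple("_:b0", "rdf:type", "rdf:Bag"),
                new Triple("_:b0", "rdf:_1", "urn:miriam:uniprot:P35567"),
                new Triple("_:b0", "rdf:_2", "urn:miriam:uniprot:P24033"),
                new Triple("#C_M_", "rdfs:label", "Fraction of inactive cdc2 kinase"),
                new Triple("#C_M_", "cmeta:biomodels", "#_M_"));
    }

    public static void main(String[] args) {
        FlatAnnotations flat = flatten(exampleTriples(), "#C_M_");
        System.out.println(flat.labels());
        System.out.println(flat.terms());
    }
}
```

Labels and URIs are both plain strings in this sketch; a real implementation would keep literals and resources apart.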

Incidentally, this raises another question: how do you deal with elements
like rdfs:label or rdfs:comment that appear both in the component annotation
and in the model-wide annotation? This could cause problems when comparing two
CV term sets for equivalence: what happens if the two CV term sets are
identical except for their labels/comments? We would suggest that
labels/comments be referenced separately from CV Terms, to make retrieval and
editing simpler.
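One way to keep labels and comments from affecting equivalence, as suggested above, is to compare only each term's qualifier and URI set. A minimal sketch, assuming hypothetical `CvTerm`/`TermSets` types and treating URI order as irrelevant (as it is for an rdf:Bag):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical CV term; the label is carried along but never used for comparison. */
record CvTerm(String qualifier, Set<String> uris, String label) {
    /** The meaning-bearing part of a term: qualifier plus an unordered set of URIs. */
    record Key(String qualifier, Set<String> uris) {}

    Key key() {
        return new Key(qualifier, Set.copyOf(uris));
    }
}

class TermSets {
    /** True when both sets carry the same qualifiers and URIs, regardless of labels or comments. */
    static boolean equivalent(List<CvTerm> a, List<CvTerm> b) {
        Set<CvTerm.Key> keysA = a.stream().map(CvTerm::key).collect(Collectors.toSet());
        Set<CvTerm.Key> keysB = b.stream().map(CvTerm::key).collect(Collectors.toSet());
        return keysA.equals(keysB);
    }

    public static void main(String[] args) {
        List<CvTerm> component = List.of(new CvTerm("bqbiol:hasVersion",
                Set.of("urn:miriam:uniprot:P35567", "urn:miriam:uniprot:P24033"),
                "Fraction of inactive cdc2 kinase"));
        List<CvTerm> global = List.of(new CvTerm("bqbiol:hasVersion",
                Set.of("urn:miriam:uniprot:P24033", "urn:miriam:uniprot:P35567"),
                "Inactive cdc2 kinase"));
        System.out.println(equivalent(component, global));
    }
}
```

Because the keys are collected into sets, duplicate terms within one list collapse; if multiplicity matters, counting keys instead would be needed.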

We can see two situations that may arise in our program.

1) A developer wishes to change or add the CV Term for a specific component
variable or math

2) A developer wishes to change the global CV Term which is used by many
different component variables or math

Regarding 1), he adds or removes CV Terms from the component variable or
math. The CV Term must change solely for that variable or math. The changes
may cause the CV Term to differ from the previously set global CV Term, in
which case a new CV Term must be created; or they may cause the variable's
CV Term to match an existing global CV Term, in which case the equivalence
must be identified.

When the developer performs this operation, the RDF structure should be
hidden from him.
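Situation 1 can be sketched as a store in which each variable (by cmeta:id) points at a global term: changing one variable's term either reuses an identical global term or creates a new one, and other variables are left alone. The `GlobalTermStore` class, its `termN` identifier scheme and the `D_M_` cmeta:id are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical store: global CV terms by id, and which global term each cmeta:id uses. */
class GlobalTermStore {
    /** Meaning of a CV term (labels are kept elsewhere). */
    record Term(String qualifier, Set<String> uris) {
        Term {
            uris = Set.copyOf(uris);
        }
    }

    private final Map<String, Term> globals = new LinkedHashMap<>();  // global id -> term
    private final Map<String, String> usedBy = new LinkedHashMap<>(); // cmeta:id -> global id
    private int nextId = 1;

    /**
     * Situation 1: change the term of one variable only. An identical global term is
     * reused if present; otherwise a new global term is created. Other variables that
     * shared the previous global term keep pointing at it.
     */
    String setVariableTerm(String cmetaId, Term term) {
        String id = findGlobal(term);
        if (id == null) {
            id = "term" + nextId++; // hypothetical identifier scheme
            globals.put(id, term);
        }
        usedBy.put(cmetaId, id);
        return id;
    }

    /** Returns the id of a global term equal to the given one, or null. */
    String findGlobal(Term term) {
        for (Map.Entry<String, Term> e : globals.entrySet()) {
            if (e.getValue().equals(term)) {
                return e.getKey();
            }
        }
        return null;
    }

    Term termOf(String cmetaId) {
        String id = usedBy.get(cmetaId);
        return id == null ? null : globals.get(id);
    }

    int globalCount() {
        return globals.size();
    }

    public static void main(String[] args) {
        GlobalTermStore store = new GlobalTermStore();
        Term shared = new Term("bqbiol:hasVersion", Set.of("urn:miriam:uniprot:P35567"));
        store.setVariableTerm("C_M_", shared);
        store.setVariableTerm("D_M_", shared); // identical term: reuses the same global id
        store.setVariableTerm("D_M_", new Term("bqbiol:hasVersion", Set.of("urn:miriam:uniprot:P24033")));
        System.out.println(store.globalCount() + " " + store.termOf("C_M_"));
    }
}
```

A global term that no variable references any more is simply left in place here; whether to clean it up is a separate design choice.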

In 2), he wants to change a global CV Term. For example, the user has found a
new MIRIAM term to represent the variable Kd. Rather than changing every
single reference to Kd as in 1), he just changes the global CV Term of Kd, and
the changes are reflected in all of the variable annotations that reference
the global Kd.
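Situation 2 works if variables hold a reference to a global term rather than a copy of it. A minimal sketch, assuming a hypothetical `SharedTerms` class keyed by global labels such as "Kd"; the cmeta:ids and URIs are placeholders, not real model identifiers or MIRIAM URIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of shared global CV terms referenced by variables. */
class SharedTerms {
    record Term(String qualifier, List<String> uris) {
        Term {
            uris = List.copyOf(uris);
        }
    }

    private final Map<String, Term> globals = new HashMap<>();      // global label -> term
    private final Map<String, String> references = new HashMap<>(); // cmeta:id -> global label

    void defineGlobal(String label, Term term) {
        globals.put(label, term);
    }

    void annotate(String cmetaId, String globalLabel) {
        if (!globals.containsKey(globalLabel)) {
            throw new IllegalArgumentException("no global term " + globalLabel);
        }
        references.put(cmetaId, globalLabel);
    }

    /** Situation 2: edit the global term once; every variable that references it sees the change. */
    void updateGlobal(String label, Term term) {
        if (!globals.containsKey(label)) {
            throw new IllegalArgumentException("no global term " + label);
        }
        globals.put(label, term);
    }

    /** Looks the term up through the reference, so a variable never holds a stale copy. */
    Term termOf(String cmetaId) {
        String label = references.get(cmetaId);
        return label == null ? null : globals.get(label);
    }

    public static void main(String[] args) {
        SharedTerms model = new SharedTerms();
        // Placeholder URIs, not real MIRIAM identifiers.
        model.defineGlobal("Kd", new Term("bqbiol:is", List.of("urn:example:kd-old")));
        model.annotate("A_Kd", "Kd");
        model.annotate("B_Kd", "Kd");
        model.updateGlobal("Kd", new Term("bqbiol:is", List.of("urn:example:kd-new")));
        System.out.println(model.termOf("A_Kd").uris() + " " + model.termOf("B_Kd").uris());
    }
}
```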

Having access to these global CV Terms may seem to contradict our desire to
have the RDF structure hidden from developers. However, without this
functionality, the API would forfeit the advantages of having global CV Terms
at all.

Accordingly, we think that there should be a slightly different method of
accessing the global CV terms that are unrelated to a particular component
variable, math, or group. It may also be useful for the global CV Terms to
have links back to the variables, math or groups that use them for
annotation. That way, the developer could query for all of the variables that
are defined by "Kd".
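The back-links suggested above amount to keeping a reverse index next to the forward one. In this hypothetical `GlobalTermIndex`, `link` updates both directions, so "which variables are defined by Kd?" becomes a single map lookup; the cmeta:ids are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical two-way index between variables (by cmeta:id) and global CV terms (by label). */
class GlobalTermIndex {
    private final Map<String, String> termOfVariable = new HashMap<>();
    private final Map<String, Set<String>> variablesOfTerm = new HashMap<>();

    /** Points a variable at a global term and keeps the back-link in step. */
    void link(String cmetaId, String globalLabel) {
        unlink(cmetaId);
        termOfVariable.put(cmetaId, globalLabel);
        variablesOfTerm.computeIfAbsent(globalLabel, k -> new TreeSet<>()).add(cmetaId);
    }

    /** Removes a variable's annotation and its back-link, if any. */
    void unlink(String cmetaId) {
        String old = termOfVariable.remove(cmetaId);
        if (old != null) {
            Set<String> vars = variablesOfTerm.get(old);
            vars.remove(cmetaId);
            if (vars.isEmpty()) {
                variablesOfTerm.remove(old);
            }
        }
    }

    /** "All variables defined by Kd", answered from the back-links without scanning every component. */
    Set<String> variablesDefinedBy(String globalLabel) {
        return Set.copyOf(variablesOfTerm.getOrDefault(globalLabel, Set.of()));
    }

    public static void main(String[] args) {
        GlobalTermIndex index = new GlobalTermIndex();
        index.link("A_Kd", "Kd"); // hypothetical cmeta:ids
        index.link("B_Kd", "Kd");
        index.link("C_M_", "M");
        System.out.println(index.variablesDefinedBy("Kd"));
    }
}
```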

We hope that this helps to clarify our desires for the CDA. Please do not
hesitate to contact us for clarification or additional information on any
point.

Sincerely,
Morgan Taschuk and Allyson Lister

________________________________________
From: cellml-tools-developers-bounces at cellml.org
[cellml-tools-developers-bounces at cellml.org] On Behalf Of Justin Marsh
[j.marsh at auckland.ac.nz]
Sent: 08 September 2009 23:09
To: A list for the developers of CellML tools
Subject: Re: [cellml-dev] RDF libraries in CDA

Hi Morgan,

We are looking into what libSBML provides with regard to annotation
services; it would help if we knew exactly which parts of this functionality
you are interested in. At the moment, my assumption is that you are
interested in extracting, editing and inserting the controlled vocabulary
terms that reference specific elements, working from those elements
themselves, with a possible specialisation of the interface for a few common
usages, such as model revision history and authorship.

However, providing a bridge between the RDF service in the CDA and some
other RDF library which can process SPARQL queries, for instance, may be
of more interest to you.

Best Regards,
Justin.

Morgan Taschuk wrote:
> Hi,
>
>> Personally, I prefer to use a more feature rich RDF library and use
>> SPARQL queries to find items of interest. Although I believe the
>> intention with the RDF service in the CDA is to provide wrappers for
>> the types of functionality typically required with CellML models -
>> which should include MIRIAM style annotations. Not sure when that will
>> happen though, I suspect it is currently quite high on the priority
>> list for the CDA developers.
>
> I'm not very familiar with RDF/XML, so I'd prefer not to have to learn an
> RDF library well enough to implement it for CellML annotation, especially
> if I will just be duplicating efforts with the CDA developers team.
> Unfortunately, we will have to wait to implement CellML support in Saint
> until the CDA supports the programmatic addition of annotation, for
> instance in a manner similar to libSBML.
>
> Sincerely,
> Morgan Taschuk
>
>
> ________________________________________
> From: cellml-tools-developers-bounces at cellml.org
> [cellml-tools-developers-bounces at cellml.org] On Behalf Of David
> Nickerson [david.nickerson at gmail.com]
> Sent: 10 August 2009 10:54
> To: A list for the developers of CellML tools
> Subject: Re: [cellml-dev] RDF libraries in CDA
> Hi Morgan,
>> However, I believe my first issue is still valid. Is the hierarchy of the
>> RDF somehow preserved by the parser? All of the child nodes seem to look
>> to the same BlankNode regardless of their hierarchical depth. Is it
>> implicit by the ordering that if a BlankNode has a Bag, the following URIs
>> are contained within the bag? I'm concerned primarily because I need to be
>> able to programmatically write the RDF.
>>
> I'm wondering if you are perhaps confusing RDF/XML serialization
> with the RDF triples provided by the RDF service of the CDA. For
> example, if you read a model into OpenCell and then write it out, I
> think all the metadata is serialized back into RDF/XML reflecting all
> the blank nodes explicitly rather than the hierarchical RDF/XML
> typically written by hand (i.e., the examples Matt sent through the
> other day).
> I can't recall the details, but I think for rdf:Bags you need to
> navigate through the rdf:li blank nodes to find the objects contained
> in the bag. Then for rdf:Collections you need to use the rdf:first
> and rdf:rest properties to navigate through the members of the
> collection (until you hit the rdf:nil). It's quite different from
> navigating through an XML DOM.
> Personally, I prefer to use a more feature rich RDF library and use
> SPARQL queries to find items of interest. Although I believe the
> intention with the RDF service in the CDA is to provide wrappers for
> the types of functionality typically required with CellML models -
> which should include MIRIAM style annotations. Not sure when that will
> happen though, I suspect it is currently quite high on the priority
> list for the CDA developers.
>
> Cheers,
> Andre.
> _______________________________________________
> cellml-tools-developers mailing list
> cellml-tools-developers at cellml.org
> http://www.cellml.org/mailman/listinfo/cellml-tools-developers
> ________________________________________
> From: cellml-tools-developers-bounces at cellml.org
> [cellml-tools-developers-bounces at cellml.org] On Behalf Of Andrew Miller
> [ak.miller at auckland.ac.nz]
> Sent: 07 August 2009 20:51
> To: A list for the developers of CellML tools
> Subject: Re: [cellml-dev] RDF libraries in CDA
> Morgan Taschuk wrote:
>> Hi everyone,
>>
>> Sorry for the second email, but I tweaked my code and now I have
>> different output.
>>
>> For issue 1 re: depth of RDF parsing, I tweaked things so that the
>> objects were no longer null, but BlankNodes for some reason.
>>
>> The RDF is still as follows.
>>
>> <rdf:Description rdf:about="#_cyclin">
>> <rdfs:label>cyclin</rdfs:label>
>> <bqbiol:isVersionOf>
>> <rdf:Bag>
>> <rdf:li rdf:resource="urn:miriam:interpro:IPR006670"/>
>> </rdf:Bag>
>> </bqbiol:isVersionOf>
>> </rdf:Description>
>>
>> But now I realize that the triples actually look like this:
>>
>> Triple
>> subject #_cyclin
>> predicate http://www.w3.org/TR/1999/PR-rdf-schema-19990303#label
>> object PlainLiteral:en cyclin
>>
>> Triple
>> subject #_cyclin
>> predicate http://biomodels.net/biology-qualifiers/isVersionOf
>> object Blank Node
>>
>> Triple
>> subject Blank Node+
>> predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>> object http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag
>>
>> Triple
>> subject Blank Node+
>> predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#_1
>> object urn:miriam:interpro:IPR006670
>>
>> Triple
>> subject Blank Node+
>> predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>> object http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag
>>
>> Triple
>> subject Blank Node+
>> predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#_1
>> object urn:miriam:interpro:IPR006670
>>
>>
>>
>> Where the object:BlankNode in the second subject:#_cyclin obviously
>> corresponds to the subject:BlankNode+ triples for the other four. I
>> found that I can link the two if I call getTriplesInto on the BlankNode+
>> subjects. However, the hierarchy doesn't seem to be preserved: BlankNode
>> links to both the URIReference to the Bag and to the InterPro reference.
>>
>> As for issue 2, I see now that (due to fixing my bug in my code)
>> attributes are returned as objects. But, related to the problem above,
>> the RDF parser doesn't always seem to notice the attributes. For
>> example, in the following RDF:
>>
>> <rdf:Description rdf:about="rdf:#$XwCaL2">
>> <rdf:first rdf:about="aboutFirst">
>> <rdfs:comment rdfs:about="aboutComment">
>> here is a test for a nested comment.
>> </rdfs:comment>
>> </rdf:first>
>> </rdf:Description>
>>
>>
>> The parser sees the rdfs:comment tag, and the rdfs:about attribute, but
>> not the rdf:about attribute or the comment itself.
>>
>> subject rdf:#$XwCaL2
>> predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#first
>> object Blank Node
>> >>>>>>>>>>>>>>>is linked to >>>>>>>>>>>>>
>> subject Blank Node
>> predicate http://www.w3.org/TR/1999/PR-rdf-schema-19990303#about
>> object PlainLiteral:en aboutComment
>>
>>
>>
>> subject rdf:#$XwCaL2
>> predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#first
>> object Blank Node
>> >>>>>>>>>>>>>>>is linked to >>>>>>>>>>>>>
>> subject Blank Node
>> predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>> object http://www.w3.org/TR/1999/PR-rdf-schema-19990303#comment
>>
>>
>> Why would it return the contents of the Bag in the first example, but
>> not the comment in the second example? Unfortunately, calling
>> getTriplesWhereSubject() on the URIReference comment object returns no
>> triples.
>
> Your second example is not valid RDF/XML. Try going to:
> http://www.w3.org/RDF/Validator/
> and entering your example in the box as follows:
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> xmlns:dc="http://purl.org/dc/elements/1.1/"
> xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
> <rdf:Description rdf:about="rdf:#$XwCaL2">
> <rdf:first rdf:about="aboutFirst">
> <rdfs:comment rdfs:about="aboutComment">
> here is a test for a nested comment.
> </rdfs:comment>
> </rdf:first>
> </rdf:Description>
> </rdf:RDF>
>
> The CellML API RDF/XML parser is written to closely follow the RDF/XML
> specification, and it does not have validation features, so it won't
> tell you if your RDF/XML is invalid. It merely aims to do the minimum
> work needed to parse a valid RDF/XML document into the correct triples,
> while not crashing on invalid ones. In this case, it sees the
> rdfs:about, so it doesn't even look for the text child, as it would not
> be valid for that to be there.
> Best wishes,
> Andrew
>> Thanks again.
>>
>> Sincerely,
>> Morgan Taschuk
>>
>>
>>
>>
>> Morgan Taschuk wrote:
>>> Hello all,
>>>
>>> I'm attempting to use the CDA library (in Java) to parse some RDF that
>>> Matt provided. I have two questions: one relating to the depth of the
>>> parsing, and one related to retrieving information from the triples.
>>>
>>>
>>> 1) How deep does the RDF parsing go? For example, the RDF looks like
>>> this:
>>>
>>> <rdf:Description rdf:about="#_cyclin">
>>> <rdfs:label>cyclin</rdfs:label>
>>> <bqbiol:isVersionOf>
>>> <rdf:Bag>
>>> <rdf:li rdf:resource="urn:miriam:interpro:IPR006670"/>
>>> </rdf:Bag>
>>> </bqbiol:isVersionOf>
>>> </rdf:Description>
>>>
>>>
>>> When I try to parse the triples, I get the following values for the
>>> RDF triples:
>>>
>>> Triple
>>> subject #_cyclin
>>> predicate http://biomodels.net/biology-qualifiers/isVersionOf
>>> object null
>>>
>>> Triple
>>> subject #_cyclin
>>> predicate http://www.w3.org/TR/1999/PR-rdf-schema-19990303#label
>>> object PlainLiteral: cyclin
>>>
>>> Nothing I try will give me a value for the first object that should be
>>> at least a URIResource to rdf:Bag. While the code theoretically
>>> indicates that this could be so, the object never appears to be
>>> anything but a Literal or null. Does the RDF parser not read the
>>> entire hierarchy of the RDF?
>>>
>>> 2) When I have a section of RDF such as the following:
>>>
>>> <rdf:Description rdf:about="rdf:#$XwCaL2">
>>> <rdf:first rdf:resource="rdf:#$YwCaL2"/>
>>> </rdf:Description>
>>>
>>>
>>> This is what the triple looks like:
>>> Triple
>>> subject rdf:#$XwCaL2
>>> predicate http://www.w3.org/1999/02/22-rdf-syntax#first
>>> object null
>>>
>>>
>>> How do I retrieve the attributes from the URLResource predicate, in
>>> this case, rdf:resource="rdf:#$YwCaL2"?
>>>
>>>
>>> Thanks very much in advance for your help.
>>>
>>> Sincerely,
>>> Morgan Taschuk
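Andre's description of navigating bags and collections in the quoted thread can be sketched over a flat list of triples. In the parsed triples each `rdf:li` becomes an ordinal property `rdf:_1`, `rdf:_2`, ... on the bag's blank node (as in Morgan's output), while an `rdf:parseType="Collection"` list is a chain of `rdf:first`/`rdf:rest` nodes ending in `rdf:nil`. The `Triple` record and string node identifiers (blank nodes written `_:name`) are assumptions of this sketch; the CDA's RDF service exposes node objects rather than strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** Stand-in triple; blank nodes are strings such as "_:bag" (an assumption, not the CDA's node types). */
record Triple(String subject, String predicate, String object) {}

class ContainerNavigation {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Members of an rdf:Bag, rdf:Seq or rdf:Alt: objects of rdf:_1, rdf:_2, ... in index order. */
    static List<String> containerMembers(List<Triple> triples, String container) {
        SortedMap<Integer, String> byIndex = new TreeMap<>();
        for (Triple t : triples) {
            if (t.subject().equals(container) && t.predicate().startsWith(RDF + "_")) {
                int index = Integer.parseInt(t.predicate().substring(RDF.length() + 1));
                byIndex.put(index, t.object());
            }
        }
        return new ArrayList<>(byIndex.values());
    }

    /** Members of an rdf:List (parseType="Collection"): follow rdf:first/rdf:rest until rdf:nil. */
    static List<String> collectionMembers(List<Triple> triples, String head) {
        List<String> members = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String node = head;
        while (!node.equals(RDF + "nil") && seen.add(node)) { // seen guards against a cyclic list
            members.add(objectOf(triples, node, RDF + "first"));
            node = objectOf(triples, node, RDF + "rest");
        }
        return members;
    }

    private static String objectOf(List<Triple> triples, String subject, String predicate) {
        for (Triple t : triples) {
            if (t.subject().equals(subject) && t.predicate().equals(predicate)) {
                return t.object();
            }
        }
        throw new IllegalStateException(subject + " has no " + predicate);
    }

    public static void main(String[] args) {
        List<Triple> cyclin = List.of(
                new Triple("#_cyclin", "http://biomodels.net/biology-qualifiers/isVersionOf", "_:bag"),
                new Triple("_:bag", RDF + "type", RDF + "Bag"),
                new Triple("_:bag", RDF + "_1", "urn:miriam:interpro:IPR006670"));
        System.out.println(containerMembers(cyclin, "_:bag"));
    }
}
```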

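Morgan's question in the thread about whether the parser preserves the RDF hierarchy can be illustrated with a small sketch: the triples themselves are flat, and the nesting is recovered by expanding any blank-node object into the triples that have it as their subject. The `Triple` record with compact prefixed names and `_:name` blank nodes is a hypothetical stand-in, not the CDA's own types.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Stand-in triple using prefixed names; blank nodes are written "_:name" (an assumption). */
record Triple(String subject, String predicate, String object) {}

class TripleTree {
    static boolean isBlank(String node) {
        return node.startsWith("_:");
    }

    /** Renders the nesting implied by the RDF/XML: each blank-node object is expanded in place. */
    static String render(List<Triple> triples, String root) {
        StringBuilder out = new StringBuilder(root).append('\n');
        Set<String> onPath = new HashSet<>();
        onPath.add(root);
        expand(triples, root, 1, out, onPath);
        return out.toString();
    }

    private static void expand(List<Triple> triples, String subject, int depth,
                               StringBuilder out, Set<String> onPath) {
        for (Triple t : triples) {
            if (!t.subject().equals(subject)) {
                continue;
            }
            out.append("  ".repeat(depth))
               .append(t.predicate()).append(" -> ").append(t.object()).append('\n');
            // Only blank nodes are expanded; onPath stops a cycle from recursing forever.
            if (isBlank(t.object()) && onPath.add(t.object())) {
                expand(triples, t.object(), depth + 1, out, onPath);
                onPath.remove(t.object());
            }
        }
    }

    public static void main(String[] args) {
        List<Triple> cyclin = List.of(
                new Triple("#_cyclin", "rdfs:label", "\"cyclin\""),
                new Triple("#_cyclin", "bqbiol:isVersionOf", "_:b0"),
                new Triple("_:b0", "rdf:type", "rdf:Bag"),
                new Triple("_:b0", "rdf:_1", "urn:miriam:interpro:IPR006670"));
        System.out.print(render(cyclin, "#_cyclin"));
    }
}
```

For the cyclin triples in `main`, the `rdf:type` and `rdf:_1` lines are printed indented under `bqbiol:isVersionOf -> _:b0`, which is the nesting the original RDF/XML expressed.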

[cellml-dev] RDF libraries in CDA, Justin Marsh, 09/09/2009
- [cellml-dev] RDF libraries in CDA, Morgan Taschuk, 09/10/2009
