cellml-discussion - [cellml-discussion] Time for a decent RDF library in the CellMLAPI?

cellml-discussion AT lists.cellml.org

CellML Discussion List

From: ak.miller at auckland.ac.nz (Andrew Miller)
Subject: [cellml-discussion] Time for a decent RDF library in the CellMLAPI?
Date: Mon, 11 Sep 2006 16:02:36 +1200

David Nickerson wrote:
> If I understand the discussion so far, I agree with Matt and we should
> be looking at a higher level API than providing methods to manipulate
> the RDF directly.
>
Would you be opposed to offering both a generic API and specific ones?
We currently allow for extension elements to be manipulated, and RDF is
a much better representation for data outside of our existing
specifications (because it allows for arbitrary additional information
to be added to any resource in the model, can tie in information about
elements from distinct parts of the model, allows for reification to
annotate an existing arc in a standard way, and works well for meta-data
which ties together information about more than one CellML document, or
which is defined as a model supplement in an external document). By
supporting extension elements, but not supporting extension RDF,
however, we would encourage people to use plain XML instead of RDF to
represent types of data we don't know how to represent.

> I also think that while such an API will be very useful we need to
> ensure that this discussion doesn't hold up the initial release of PCEnv.
>
PCEnv needs to manipulate RDF, so having a good solution here would be
beneficial. As a temporary measure, I have written application-side code
which uses the DOM API to create a new RDF/XML document containing the
contents of all rdf:RDF elements. This document is then serialised and put
into the Mozilla RDF library. The hassle comes when we want to change the
RDF, because the RDF/XML then has to be serialised back out of Mozilla,
the DOM has to be used to delete all rdf:RDF elements at the API side, and
the RDF/XML has to be re-parsed at the API side and added as a single
block as a child of the model element.

Contrast this with a simple get or change operation.

Of course, if we must do this through serialised RDF, I think the
current approach (which is to look for a child of an rdf:RDF element
child of the model, with an rdf:about attribute equal to the cmeta:id of
the element for which we request the RDF) is completely broken, and I
would like to drop this from the API, in favour of an operation on the
model, which returns a single complete RDF/XML document containing all
of the RDF from the CellML model (with no filtering attempted). We would
also need an operation to set the RDF, which would strip all RDF out of
the model, and add the result of parsing the serialised input string as
a single rdf:RDF child of the model. If we are not going to add full RDF
support to the CellML API, then this approach is better than the current
approach, because:

1) it can be implemented properly for all RDF/XML in the model, without
making any artificial assumptions about the way the RDF/XML is
structured, and without putting a full RDF/XML parser into the CellML
DOM API.

2) if we don't provide even basic RDF facilities, the only practical way
for the application to process the RDF (other than by assuming a certain
serialisation) is to provide a generic RDF/XML parser. If the
application is expected to do this, it might as well have access to the
entire RDF graph for the model, instead of just a fragment which, based
on the way the RDF/XML happened to be expressed before, was a child of
that element in the RDF/XML tree.

3) ontology software, and other applications which need to create an
RDF graph containing RDF spanning multiple models can simply go through
a list of models, ask the models for their RDF/XML, and aggregate all
the RDF documents into a single graph, rather than having to ask the API
for every single variable / component in each model.
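
A minimal sketch of the two model-level operations described above (get all
of the RDF, set all of the RDF). It uses a toy in-memory element tree rather
than the real CellML DOM, and the names el, getModelRdf, setModelRdf and
countRdfRoots are invented for illustration; none of them come from the
CellML API.

```javascript
// Toy element tree: { ns, name, children }. This stands in for the CellML
// DOM purely for illustration.
const RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
const CELLML_NS = "http://www.cellml.org/cellml/1.0#";

function el(ns, name, children = []) {
  return { ns, name, children };
}

function isRdfRoot(node) {
  return node.ns === RDF_NS && node.name === "RDF";
}

// Gather the contents of every rdf:RDF element, wherever it sits in the
// model, under one new rdf:RDF element. No filtering is attempted.
function getModelRdf(model) {
  const merged = el(RDF_NS, "RDF");
  (function walk(node) {
    for (const child of node.children) {
      if (isRdfRoot(child)) merged.children.push(...child.children);
      else walk(child);
    }
  })(model);
  return merged;
}

// Strip every rdf:RDF element out of the model, then attach the supplied
// one as a single child of the model element.
function setModelRdf(model, rdfRoot) {
  (function strip(node) {
    node.children = node.children.filter((c) => !isRdfRoot(c));
    node.children.forEach(strip);
  })(model);
  model.children.push(rdfRoot);
}

// Helper for checking the result: how many rdf:RDF elements remain?
function countRdfRoots(node) {
  return node.children.reduce(
    (n, c) => n + (isRdfRoot(c) ? 1 : 0) + countRdfRoots(c),
    0
  );
}

// A model with RDF scattered over two places.
const model = el(CELLML_NS, "model", [
  el(RDF_NS, "RDF", [el(RDF_NS, "Description")]),
  el(CELLML_NS, "component", [
    el(RDF_NS, "RDF", [el(RDF_NS, "Description"), el(RDF_NS, "Description")]),
  ]),
]);

const allRdf = getModelRdf(model);
setModelRdf(model, allRdf);
```

In a real implementation the get operation would return serialised RDF/XML
and the set operation would parse it; the tree manipulation is the part shown
here.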

Therefore, I think we need to support several use cases:
1) The application has its own RDF library, and wants to do all sorts of
complex queries on it (perhaps using a query language), or wants to
aggregate graphs across multiple models. In this case, the RDF/XML
serialisation approach is probably best.
2) The application only wants to use standards from the cmeta,
simulation, and / or graph specifications. In this case, a higher level
API should be available to them.
3) The application wants to access RDF data defined by a newer
specification, or by a specification which is not used commonly enough
to warrant inclusion in the CellML API (there is an enormous number of
things which users might want to annotate about a model, depending on
what type of research they are doing and so on, many of which will be
specific to a particular field, and we cannot possibly contemplate them
all). However, they are happy with a fairly simple RDF API.
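
For the third use case, a "fairly simple RDF API" could be as small as the
following sketch: assert, retract, pattern-match and merge on a set of
triples. This is only an illustration under simplifying assumptions (all terms
are plain strings, and the class and method names are hypothetical); it is
not the API proposed earlier in this thread.

```javascript
// Terms are plain strings here; a real RDF API would distinguish URIs,
// literals and blank nodes. All names in this sketch are hypothetical.
class TripleStore {
  constructor() {
    this.triples = [];
  }

  // Return all triples matching the pattern; null acts as a wildcard.
  match(s = null, p = null, o = null) {
    return this.triples.filter(
      (t) =>
        (s === null || t.s === s) &&
        (p === null || t.p === p) &&
        (o === null || t.o === o)
    );
  }

  // Assert a triple, ignoring exact duplicates (an RDF graph is a set).
  add(s, p, o) {
    if (this.match(s, p, o).length === 0) this.triples.push({ s, p, o });
  }

  // Retract every triple matching the pattern (null is a wildcard).
  remove(s = null, p = null, o = null) {
    const doomed = new Set(this.match(s, p, o));
    this.triples = this.triples.filter((t) => !doomed.has(t));
  }

  // Merge another graph into this one, e.g. to aggregate several models.
  merge(other) {
    for (const t of other.triples) this.add(t.s, t.p, t.o);
  }
}

const DC_CREATOR = "http://purl.org/dc/elements/1.1/creator";
// Made-up vocabulary standing in for a field-specific specification.
const EP_CELL_TYPE = "http://example.org/cardiac-ep#cellType";

const modelA = new TripleStore();
modelA.add("modelA.cellml#membrane", DC_CREATOR, "A. Author");
modelA.add("modelA.cellml#membrane", EP_CELL_TYPE, "ventricular myocyte");

const modelB = new TripleStore();
modelB.add("modelB.cellml#membrane", EP_CELL_TYPE, "atrial myocyte");

// Aggregate both graphs and query across them.
const combined = new TripleStore();
combined.merge(modelA);
combined.merge(modelB);
const cellTypes = combined.match(null, EP_CELL_TYPE, null).map((t) => t.o);
```

The same handful of operations also covers the aggregation case from use case
1, although an application that needs a query language would still want its
own RDF library.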

> Andre.
>
> Matt wrote:
>
>> On 6/09/2006, at 4:53 PM, Andrew Miller wrote:
>>
>>
>>> Matt wrote:
>>>
>>>> The fact there is no standardized API does not mean we invent our
>>>> own. There are plenty of RDF implementations around and a huge amount
>>>> of overlap between them. I suggest we find that subset that shows
>>>> reasonable intersection over the most popular rdf libraries and use
>>>> that.
>>>>
>>> I think you will find that my proposal meets these criteria, because I
>>> have specified all the very basic RDF operations (as well as some
>>> necessarily CellML specific ones).
>>>
>>>
>> Yep, I agree that it is a reasonable set. I'd be surprised if any
>> useful RDF library does not implement them. I don't see why we need to.
>>
Because lots of applications (websites, editors, and so on) don't need
complex query languages, and so don't need any additional RDF library.
It therefore makes sense to use the RDF library at the API side, rather
than burdening applications with this responsibility. Therefore, the
RDF support in the CellML API will be 'useful' for some, but not all,
applications.

For applications where this is not sufficient to be considered useful,
the RDF/XML serialisation approach would be a better option.
>>
>>
>>> Also note that the design of the CellML API means that methods for
>>> accessing RDF are identified by URIs,
>>>
>> What do you mean by this?
>>
From CellML_APISPEC.idl:

/**
 * The RDF metadata associated with this element. An element must have a
 * cmeta:id for any RDF to be able to refer to it.
 * @param type The URN describing the type of RDF metadata. Implementations
 *             are free to add new types by creating new type names at URNs
 *             under their jurisdiction. New URNs under http://www.cellml.org
 *             are reserved for future versions of this specification.
 * @return The object containing the RDF representation. If no arcs are
 *         defined, an empty RDF representation is returned. The object may
 *         be cast in an application defined manner depending on the type
 *         returned.
 * @raises CellMLException if type isn't supported.
 * All implementations must implement the following types:
 *   http://www.cellml.org/RDFXML/string
 *   http://www.cellml.org/RDFXML/DOM
 */
RDFRepresentation getRDFRepresentation(in wstring type)
  raises(CellMLException);

Note however, that I am proposing moving this from the elements to the
model, so that RDFRepresentation becomes a representation of the entire
RDF graph, and the addition of some new (mandatory?) types which are
more useful.
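
As a rough sketch of how representation types identified by URIs could be
dispatched at the model level: the two cellml.org type URIs are the ones from
the IDL comment, but ToyModel, register, the serialisedData field, and the
example.org type are assumptions made for illustration, not part of the
CellML API.

```javascript
// Thrown when the requested type URI has no registered representation,
// mirroring the "@raises CellMLException if type isn't supported" rule.
class UnsupportedTypeError extends Error {}

// Hypothetical model object holding its RDF as one serialised string.
class ToyModel {
  constructor(rdfXml) {
    this.rdfXml = rdfXml;
    this.factories = new Map();
    // A mandatory type: the whole graph as serialised RDF/XML.
    this.register("http://www.cellml.org/RDFXML/string", (m) => ({
      serialisedData: m.rdfXml,
    }));
  }

  // Implementations may add further types under URIs they control.
  register(typeUri, factory) {
    this.factories.set(typeUri, factory);
  }

  getRDFRepresentation(typeUri) {
    const factory = this.factories.get(typeUri);
    if (!factory) {
      throw new UnsupportedTypeError(
        "Unsupported RDF representation type: " + typeUri
      );
    }
    return factory(this);
  }
}

const toy = new ToyModel("<rdf:RDF/>");

// An optional, implementation-specific type (made-up URI).
toy.register("http://example.org/rdf/length", (m) => ({
  length: m.rdfXml.length,
}));

const asString = toy.getRDFRepresentation("http://www.cellml.org/RDFXML/string");

let unsupported = false;
try {
  // The DOM type is not registered in this toy, so the lookup fails.
  toy.getRDFRepresentation("http://www.cellml.org/RDFXML/DOM");
} catch (e) {
  unsupported = e instanceof UnsupportedTypeError;
}
```

Because callers select a representation by URI, richer optional types can be
added later without changing the signature of the operation.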

>>
>>> so you can have more than one
>>> (although we wouldn't want to burden this on implementors, so they
>>> would have to be optional. We could have a core, required API, and
>>> allow better RDF specifications, e.g. ones providing query language
>>> access, to be documented but not required).
>>>
>>>> But in saying that, I'm not sure you need to be exposing the
>>>> RDF through an RDF centric API. The developers of the metadata editor
>>>> found it more useful to offer an API that was centered around the
>>>> kinds of metadata that needed to be supplied - for instance to add a
>>>> series of authors, it was much nicer to be able to populate an
>>>> authors data structure, especially since in the cellml metadata
>>>> specification there is a strict interpretation of the underlying RDF
>>>> data structures - such as bags, lists etc.
>>>>
>>>>
>>> It is certainly worthwhile to offer convenience interfaces specific to
>>> certain specifications, such as cmeta, the simulation and graph
>>> specifications.
>>>
>> I see these as being the current use cases, and the most important
>> level at which to address any specific API (not the RDF level).
>>
Why can't we have both a specific and general API, if both are useful?

>>
>>
>>> However, the problem with this is that there is such a
>>> large (and continuously growing) set of RDF-based metadata that people
>>> might want to use, and so they need to be able to access this without
>>> updating the CellML API to support every specification ever invented.
>>>
>> RDF can always be processed by RDF libraries so long as the RDF/XML
>> fragments are available to load. The lesson learnt with the cellml
>> metadata editor was that the higher level API that addressed the
>> necessary and optional but useful metadata requirements were the most
>> relevant interfaces. Our metadata specification is quite strict about
>> the relationship of various RDF schemas for particular annotation
>> purposes, e.g. the combination of bqs:Person and vcard structures.
>> I think for the annotation structures that we say are necessary or
>> useful, that a predefined specification and interface is very useful,
>> especially for people trying to populate specific data structures out
>> of them. If for example we said feel free to use anything inside
>> bqs:reference, or actually any arbitrary reference schema, then we
>> run into an increasing number of permutations that one would need to
>> accommodate in RDF queries or RDF subgraph accessors to get at
>> the same information.
>>
I am not saying that people should represent the same information by
more than one RDF graph, but rather, people should be able to add new
information which cannot currently be represented in any supported
specifications. Our specifications do not contemplate every type of data
that people might find useful (for example, what if I wanted to
represent detailed, cardiac electrophysiology specific information about
a model? I would firstly look to see if anyone else has made an existing
specification capable of holding the information. If not, I could then
write a specification, and access it using a generic API).
>> RDF does not itself imply anything goes; I feel energy is better
>> spent specifying a strict RDF schema and an API that satisfies
>> interacting with data that conforms to this (and not at the triple
>> level).
>>
RDF is supposed to be an open-world framework. See
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-anyone:

"2.2.6 Anyone Can Make Statements About Any Resource

To facilitate operation at Internet scale, RDF is an open-world
framework that allows anyone to make statements about any resource.

In general, it is not assumed that complete information about any
resource is available. RDF does not prevent anyone from making
assertions that are nonsensical or inconsistent with other statements,
or the world as people see it. Designers of applications that use RDF
should be aware of this and may design their applications to tolerate
incomplete or inconsistent sources of information."

That said, I agree that it is very bad practice to invent a new
specification when there is an existing one capable of representing the
desired information (and if the existing one only represents part of the
information, it is better to extend it by adding new arcs, rather than
replacing it). However, this does not mean that we should not provide
the capability to use new types of information which we never contemplated.

>>> Providing an RDF API at the CellML API side is very useful, especially
>>> when there are multiple consumers of the implementation, because you
>>> are working with the real document, rather than a copy which was
>>> created at some earlier point.
>>>
>>>
>> I think you are assuming too much about the underlying framework
>> here. Perhaps I am wrong, but are you referring to the shared model
>> through the corba interface?
>>
Our API interface is the same whether or not we go through CORBA, so my
comments will apply any time we are using the API. I am assuming that
the application only accesses the objects obtained from the API
implementation through the interface, and doesn't go poking around in
API implementation private memory (obviously, if you are using CORBA to
access cross-process or cross-machine API implementations, the operating
system enforces this, but even if the API resides in the same address
space, it is incredibly bad practice to break this assumption).

If our API forces us to serialise and parse to work with RDF, we are
invariably copying data, regardless of how we are using the CellML API.
Of course, once we have made this copy, it may be easier in some use
cases than in others to pass the copy around (but if you have created a
way to pass RDF associated with CellML around internally, you have
essentially created an informal API, so why not make it official rather
than rewriting it in every application?).
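
To illustrate the copying problem, here is a deliberately tiny sketch in which
a graph is just a set of statement strings and serialise/parse are trivial
stand-ins (all names are hypothetical). Once a consumer has taken its copy
through a serialise and parse step, later changes to the live graph are not
visible in that copy.

```javascript
// The "real" graph held at the API side, reduced to a set of strings.
const liveGraph = new Set(['<#model> dc:title "Original title"']);

// Trivial stand-ins for RDF/XML serialisation and parsing.
function serialise(graph) {
  return [...graph].sort().join("\n");
}

function parse(text) {
  return new Set(text.split("\n").filter(Boolean));
}

// Consumer 1 takes a copy by round-tripping through the serialised form.
const applicationCopy = parse(serialise(liveGraph));

// Consumer 2 then changes the live graph through the API.
const newStatement = '<#model> dc:creator "A. Author"';
liveGraph.add(newStatement);

// The copy no longer reflects the model until it is fetched again.
const copyIsStale = !applicationCopy.has(newStatement);
```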

>>
>>
>>> As I have pointed out, if you try to get serialised RDF/XML out of an
>>> RDF/XML unaware implementation, you run into all sorts of problems
>>> with getting all the data.
>>>
>>> For example, I just had to write code like this in PCEnv:
>>> function getModelMetadata(model)
>>> {
>>>   // Get the DOM representation and find every rdf:RDF element in the
>>>   // owning document.
>>>   var el = model.getRDFRepresentation("http://www.cellml.org/RDFXML/DOM").
>>>     QueryInterface(Components.interfaces.
>>>                    cellml_api_IRDFXMLDOMRepresentation).data;
>>>   var od = el.ownerDocument;
>>>   var rnl =
>>>     od.getElementsByTagNameNS("http://www.w3.org/1999/02/22-rdf-syntax-ns#",
>>>                               "RDF");
>>>   var l = rnl.length;
>>>   var i;
>>>
>>>   // Copy each rdf:RDF element into a new document under one rdf:RDF
>>>   // root, then serialise that document.
>>>   var td = od.implementation.createDocument(
>>>     "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf:RDF",
>>>     od.doctype);
>>>   var de = td.documentElement;
>>>   for (i = 0; i < l; i++)
>>>   {
>>>     de.appendChild(td.importNode(rnl.item(i), true));
>>>   }
>>>   var rrs = window.context.cellmlBootstrap.serialiseNode(td);
>>>
>>>   // Put it into the Mozilla RDF implementation...
>>>   var p = Components.classes["@mozilla.org/rdf/xml-parser;1"].
>>>     createInstance(Components.interfaces.nsIRDFXMLParser);
>>>   var mds =
>>>     Components.classes["@mozilla.org/rdf/datasource;1?name=in-memory-datasource"].
>>>     createInstance(Components.interfaces.nsIRDFDataSource);
>>>   var modelURI = model.base_uri.asText;
>>>   var uri = Components.classes["@mozilla.org/network/standard-url;1"].
>>>     createInstance(Components.interfaces.nsIURI);
>>>   uri.spec = modelURI;
>>>   p.parseString(mds, uri, rrs);
>>>   return mds;
>>> }
>>>
>>> This is bad for several reasons:
>>> 1) I have to do a lot of work just to support a relatively common
>>> operation (getting the metadata) properly.
>>> 2) It requires a lot of communication between the CellML API and the
>>> user.
>>> 3) It uses the DOM core to traverse through nodes defined in CellML,
>>> in order to find all the RDF. The CellML API was designed to prevent
>>> this, so this is a violation of the design principles underlying the
>>> CellML API.
>>> 4) It makes a copy of the RDF from the model at the Mozilla side,
>>> which could potentially get out of sync.
>>> 5) Trying to change the model requires even more special logic (e.g.
>>> I would have to write code to explicitly strip out all the rdf:RDF
>>> elements, serialise the RDF into a document at the Mozilla-side, send
>>> it across to the API side as a string and parse into a document, then
>>> import the new document element into the model document, and append
>>> it to the model document element).
>>>
>>>
>> I tend to use an XPATH query and copy the fragments into a new
>> document. I don't find this particularly hard, and for any given
>> implementation of the CellML API, it's just a single call away for
>> the user of that API.
>>
You then need to get access to the CellML model with something which
supports XPath. The whole point of the CellML API is to prevent the need
for direct access to the DOM representation. Since DOM doesn't support
XPath directly, you would need to implement something which supported it
but didn't access data except through the DOM (or put the XPath
API-side, so it was allowed to poke into the internals of the
implementation). While this sounds like a common thing that there should
be code for, in practice I have found that everyone invents their own
mapping from the W3C specification to their language of choice
(ourselves included) rather than strictly following a mapping such as
the CORBA mapping (the only exception being in Javascript, where
everyone uses a fairly consistent IDL => Javascript mapping).

I think it would be far wiser to implement proper RDF support than to
implement XPath so we can write a better hack to allow us to access the
RDF. If we don't do this, then even putting this logic onto Model
instead would beat implementing XPath in terms of usability.

>>
>>
>>>> There is nothing stopping anyone adding arbitrary RDF using whatever
>>>> RDF tool they want.
>>>>
>>>>
>>> Except that RDF/XML is not a nice way to work with RDF, and as I
>>> showed above, serialise/parse creates problems (I could use your same
>>> argument to say that we should work on CellML documents directly from
>>> the DOM core API, but that doesn't mean that it would be productive).
>>>
>> I wasn't saying that we use RDF/XML to work with RDF. I am saying
>> anyone adding arbitrary RDF to a CellML model is free to use whatever
>> RDF library implementation they want to access it. They may even have
>> their own schema aware libraries - well, I'd hope so.
>>
I'm not saying we should block this use case, but it seems silly to make
everyone include an RDF parser for common functionality (even if they
can do it by including a library), especially if that means going
through an extra serialise => parse => serialise just to make a change
to the model.

>>
>>>> Specifying and implementing our own RDF API does not make sense to me
>>>> at all.
>>>>
>>>>
>>> It makes a lot of sense to me, because it is consistent with the main
>>> goal of the CellML API, which (according to me, at least) is to
>>> provide easier programmatic access to the contents of CellML documents.
>>>
>> Yes, but I am suggesting this is at a higher level than the RDF
>> level. You might want to check out what they ended up with in the
>> metadata editor code.
>>
I realise a higher level API will be useful for some applications.
However, I don't think that it is sufficient for all CellML processing
applications, and I don't believe that exposing a lower level interface
imposes a significant burden on implementors (on top of the burden
already imposed by the higher level API).

Best regards,
Andrew

