CellML Discussion List



[cellml-discussion] Time for a decent RDF library in the CellMLAPI?


  • From: matt.halstead at auckland.ac.nz (Matt)
  • Subject: [cellml-discussion] Time for a decent RDF library in the CellMLAPI?
  • Date: Tue, 12 Sep 2006 01:52:17 +1200


On 11/09/2006, at 4:02 PM, Andrew Miller wrote:

> David Nickerson wrote:
>> If I understand the discussion so far, I agree with Matt and we should
>> be looking at a higher level API than providing methods to manipulate
>> the RDF directly.
>>
> Would you be opposed to offering both a generic API and specific ones?
> We currently allow for extension elements to be manipulated, and RDF is
> a much better representation for data outside of our existing
> specifications (because it allows for arbitrary additional information
> to be added to any resource in the model, can tie in information about
> elements from distinct parts of the model, allows for reification to
> annotate an existing arc in a standard way, and works well for metadata
> which ties together information about more than one CellML document, or
> which is defined as a model supplement in an external document). By
> supporting extension elements, but not supporting extension RDF,
> however, we would encourage people to use plain XML instead of RDF to
> represent types of data we don't know how to represent.

I can't quite see the argument here. I would not want to encourage
people to add RDF data where they see CellML fails, as opposed to adding
RDF data where they see a need to complement the CellML model with
data that could not be argued to be better supported by extending
CellML. I also don't see that not defining and implementing an RDF API
would make more people resort to plain XML. I would presume most of
the extensions would be well structured, and as such have an
RDF-Schema. I don't see that the RDF-Schema model would differ
significantly from any plain XML based model. The decision would come
down to the developer's understanding of RDF and what they may or may
not benefit from using it. Personally, if I had the chance, I would
redesign CellML to be RDFS based. But that's another discussion.

You do make some statements about RDF features. But I don't see them
pointing to a need to specify and implement an API.


>
>> I also think that while such an API will be very useful we need to
>> ensure that this discussion doesn't hold up the initial release of
>> PCEnv.
>>
> PCEnv needs to manipulate RDF, so having a good solution here would be
> beneficial. As a temporary measure, I have written application-side code
> which uses the DOM API to create a new RDF/XML document containing the
> contents of all rdf:RDF elements. This document is then serialised and
> put into the Mozilla RDF library. The hassle then comes when we want to
> change RDF, because then the RDF/XML has to be serialised back out of
> Mozilla, then the DOM has to be used to delete all rdf:RDF elements at
> the API side, and then the RDF/XML has to be re-parsed at the API side,
> and put as a single block as a child of the model element.
>
> Contrast this with a simple get or change operation.

But this is just a matter of levels of abstraction, not part of an RDF
API. It's part of a CellML API that handles the getting and setting of
RDF elements. The way you implement it is one such method; why would
users of the CellML API care? They can ask for the RDF element and
they can set it. This is not an RDF API though. Also, if there was a
higher level API based on the schemas of the RDF we currently accept
(and including any further restrictions we make), then why would you
even bother thinking about the RDF? All you want is the cmeta:id and
the things you want to say.

Also, I'm not yet convinced that RDF elements need to be preserved in
their original placements. Their position within the document is
irrelevant to their interpretation.

>
> Of course, if we must do this through serialised RDF, I think the
> current approach (which is to look for a child of an rdf:RDF element
> child of the model, with an rdf:about attribute equal to the cmeta:id
> of the element for which we request the RDF) is completely broken, and
> I would like to drop this from the API,

You're treating RDF as XML? My approach is to load all triples from
all RDF elements in a document into a single model graph (this doesn't
require reassembling them into a single RDF/XML document), and then do
a query (in whatever query system is available in the library) to get
all triples associated with a particular cmeta:id (mainly as subject).
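
A minimal sketch of that approach, in JavaScript to match the PCEnv code
quoted further down. No real RDF library is used: triples are assumed
to be plain {s, p, o} objects already parsed from each rdf:RDF element,
mergeGraphs and triplesForCmetaId are hypothetical helper names, and the
subject URI is assumed to be the model's base URI plus "#" plus the
cmeta:id (as rdf:about="#id" would resolve). The model URI is made up.

```javascript
// Merge the triples parsed from each rdf:RDF element into one graph,
// dropping exact duplicates. Each triple is a plain {s, p, o} object.
function mergeGraphs(tripleLists) {
  const seen = new Set();
  const graph = [];
  for (const triples of tripleLists) {
    for (const t of triples) {
      const key = JSON.stringify([t.s, t.p, t.o]);
      if (!seen.has(key)) {
        seen.add(key);
        graph.push(t);
      }
    }
  }
  return graph;
}

// Return every triple whose subject is the element with this cmeta:id.
// Assumption: rdf:about="#id" resolves to baseURI + "#" + id.
function triplesForCmetaId(graph, baseURI, cmetaId) {
  const subject = baseURI + "#" + cmetaId;
  return graph.filter((t) => t.s === subject);
}

// Two rdf:RDF blocks found in different places in the same document.
const base = "http://example.org/model.cellml"; // hypothetical URI
const DC = "http://purl.org/dc/elements/1.1/";
const block1 = [{ s: base + "#membrane", p: DC + "title", o: "Membrane" }];
const block2 = [
  { s: base + "#membrane", p: DC + "creator", o: "A. Author" },
  { s: base + "#i_Na", p: DC + "title", o: "Sodium current" },
];

const graph = mergeGraphs([block1, block2]);
console.log(triplesForCmetaId(graph, base, "membrane").length); // 2
```

Each fragment is parsed on its own and only the resulting triples are
combined, so no single RDF/XML document ever has to be assembled.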

> in favour of an operation on the model, which returns a single
> complete RDF/XML document containing all of the RDF from the CellML
> model (with no filtering attempted). We would also need an operation to
> set the RDF, which would strip all RDF out of the model, and add the
> result of parsing the serialised input string as a single rdf:RDF child
> of the model. If we are not going to add full RDF support to the CellML
> API, then this approach is better than the current approach, because:
>
> 1) it can be implemented properly for all RDF/XML in the model, without
> making any artificial assumptions about the way the RDF/XML is
> structured, and without putting a full RDF/XML parser into the CellML
> DOM API.
>
> 2) if we don't provide even basic RDF facilities, the only practical way
> for the application to process the RDF (other than by assuming a certain
> serialisation) is to provide a generic RDF/XML parser. If the
> application is expected to do this, it might as well have access to the
> entire RDF graph for the model, instead of just a fragment which, based
> on the way the RDF/XML happened to be expressed before, was a child in
> that element in the RDF tree.
>

This doesn't imply it needs to come from a single RDF/XML uber-document.

> 3) ontology software, and other applications which need to create an
> RDF graph containing RDF spanning multiple models can simply go through
> a list of models, ask the models for their RDF/XML, and aggregate all
> the RDF documents into a single graph, rather than having to ask the API
> for every single variable / component in each model.
>

I don't follow the variable / component bit; but yes, being able to ask
for a container of RDF elements, or one big uber-document, could make
things simpler for a lot of people. At present, the container of RDF
elements is just an XPath query or a getElementsByTagNameNS call away.
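
For illustration, the getElementsByTagNameNS route looks roughly like
this in JavaScript. It is a sketch: collectRdfElements is a hypothetical
helper name, and it needs a W3C DOM Document (for example one produced
by DOMParser in a browser; plain Node has no built-in DOM, so a DOM
library would be needed there).

```javascript
// Namespace URI of rdf:RDF, from the RDF/XML syntax specification.
const RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

// Return every rdf:RDF element below a DOM Document or Element as a
// plain array, wherever it sits in the CellML document.
function collectRdfElements(root) {
  // getElementsByTagNameNS returns a live collection; Array.from takes a
  // snapshot so later DOM edits don't shift what we are iterating over.
  return Array.from(root.getElementsByTagNameNS(RDF_NS, "RDF"));
}
```

In a browser, collectRdfElements(new DOMParser().parseFromString(text,
"application/xml")) gives one element per rdf:RDF block, and each can be
handed to an RDF/XML parser separately.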


> Therefore, I think we need to support several use cases:
> 1) The application has its own RDF library, and wants to do all sorts of
> complex queries on it (perhaps using a query language), or wants to
> aggregate graphs across multiple models. In this case, the RDF/XML
> serialisation approach is probably best.

This would be the typical searching use case, which will seldom need
to add new data.

> 2) The application only wants to use standards from the cmeta,
> simulation, and / or graph specifications. In this case, a higher level
> API should be available to them.

Yep.

> 3) The application wants to access RDF data defined by a newer
> specification, or by a specification which is not used commonly enough
> to warrant inclusion in the CellML API (there is an enormous number of
> things which users might want to annotate about a model, depending on
> what type of research they are doing and so on, many of which will be
> specific to a particular field, and we cannot possibly contemplate them
> all). However, they are happy with a fairly simple RDF API.

I agree with the first part of this, and for that case 1) would
suffice. But the statement "however, they are happy with a fairly
simple RDF API" has absolutely no basis. I understand that perhaps a
simple one may satisfy your current requirements, but I don't see any
evidence that anyone else needs or wants one. I'm more than happy
choosing an RDF library based on specific features, such as querying
or storage performance, when addressing different needs. Sure, each
offers some basic RDF operations on triples, but these usually serve as
additional filters, or for populating specific data structures once I
have found them and confirmed they have the sub-graph structure I
thought they had.

One of our largest problems with metadata will be (since we are the
only ones really using it at the moment) making sure people conform
to the schemas and rules we supply in the cmeta specification for the
particular metadata attributes they cover. We should be encouraging
people to use these well instead of adding all sorts of home-grown
flavours of annotation. Just because RDF can say anything, it doesn't
mean we can easily interpret it back into meaningful data structures
in our applications. Our models are supposed to be transportable to
other environments; if these environments know which RDF schemas and
rules to support, then we don't have a problem. A high level API would
address this.
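
A rough sketch of what such a high level accessor could look like, in
the spirit of the authors data structure the metadata editor exposes.
Everything here is assumed for illustration: the graph is a plain list
of {s, p, o} triples, getAuthors is a hypothetical name, and the shape
it expects (dc:creator pointing at an rdf:Seq whose members carry a
vCard FN) is a simplification, not the structure the cmeta
specification actually prescribes.

```javascript
// Namespace and property URIs used by this sketch.
const RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
const DC_CREATOR = "http://purl.org/dc/elements/1.1/creator";
const VCARD_FN = "http://www.w3.org/2001/vcard-rdf/3.0#FN";

// All objects of triples matching (subject, predicate, ?).
function objectsOf(graph, subject, predicate) {
  return graph
    .filter((t) => t.s === subject && t.p === predicate)
    .map((t) => t.o);
}

// Hypothetical accessor returning author names in Seq order.
// If dc:creator is not exactly one rdf:Seq, an empty list is returned;
// Seq members without an FN are skipped.
function getAuthors(graph, subject) {
  const seqs = objectsOf(graph, subject, DC_CREATOR);
  if (seqs.length !== 1) return [];
  const seq = seqs[0];
  if (!objectsOf(graph, seq, RDF + "type").includes(RDF + "Seq")) return [];
  const authors = [];
  // Walk rdf:_1, rdf:_2, ... until a membership arc is missing.
  for (let i = 1; ; i++) {
    const members = objectsOf(graph, seq, RDF + "_" + i);
    if (members.length === 0) break;
    const names = objectsOf(graph, members[0], VCARD_FN);
    if (names.length > 0) authors.push(names[0]);
  }
  return authors;
}

const model = "http://example.org/model.cellml#model"; // hypothetical
const graph = [
  { s: model, p: DC_CREATOR, o: "_:authors" },
  { s: "_:authors", p: RDF + "type", o: RDF + "Seq" },
  { s: "_:authors", p: RDF + "_1", o: "_:a1" },
  { s: "_:authors", p: RDF + "_2", o: "_:a2" },
  { s: "_:a1", p: VCARD_FN, o: "First Author" },
  { s: "_:a2", p: VCARD_FN, o: "Second Author" },
];
console.log(getAuthors(graph, model)); // [ 'First Author', 'Second Author' ]
```

The caller gets an ordered array of names and never touches the Seq or
its rdf:_n arcs; annotations that don't follow the expected shape are
simply not returned, which is one way of holding applications to a
known schema.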

cheers
Matt



>
>> Andre.
>>
>> Matt wrote:
>>
>>> On 6/09/2006, at 4:53 PM, Andrew Miller wrote:
>>>
>>>
>>>> Matt wrote:
>>>>
>>>>> The fact there is no standardized API does not mean we invent our
>>>>> own. There are plenty of RDF implementations around and a huge amount
>>>>> of overlap between them. I suggest we find that subset that shows
>>>>> reasonable intersection over the most popular rdf libraries and use
>>>>> that.
>>>>>
>>>> I think you will find that my proposal meets these criteria, because I
>>>> have specified all the very basic RDF operations (as well as some
>>>> necessarily CellML specific ones).
>>>>
>>>>
>>> Yep, I agree that it is a reasonable set. I'd be surprised if any
>>> useful RDF library does not implement them. I don't see why we need to.
>>>
> Because lots of applications (website, editors, and so on) don't need
> complex query languages, so they don't need any additional RDF library.
> It therefore makes sense to use the RDF library at the API side, rather
> than burdening applications with this responsibility. Therefore, the
> RDF support in the CellML API will be 'useful' for some, but not all,
> applications.
>
> For applications where this is not sufficient to be considered useful,
> the RDF/XML serialisation approach would be a better option.
>>>
>>>
>>>> Also note that the design of the CellML API means that methods for
>>>> accessing RDF are identified by URIs,
>>>>
>>> What do you mean by this?
>>>
> From CellML_APISPEC.idl:
>
>   /**
>    * The RDF metadata associated with this element. An element must have a
>    * cmeta:id for any RDF to be able to refer to it.
>    * @param type The URN describing the type of RDF metadata. Implementations
>    *             are free to add new types by creating new type names at URNs
>    *             under their jurisdiction. New URNs under http://www.cellml.org
>    *             are reserved for future versions of this specification.
>    * @return The object containing the RDF representation. If no arcs are
>    *         defined, an empty RDF representation is returned. The object may
>    *         be cast in an application defined manner depending on the type
>    *         returned.
>    * @raises CellMLException if type isn't supported.
>    * All implementations must implement the following types:
>    *   http://www.cellml.org/RDFXML/string
>    *   http://www.cellml.org/RDFXML/DOM
>    */
>   RDFRepresentation getRDFRepresentation(in wstring type)
>     raises(CellMLException);
>
> Note, however, that I am proposing moving this from the elements to the
> model, so that RDFRepresentation becomes a representation of the entire
> RDF graph, and adding some new (mandatory?) types which are more useful.
>
>>>
>>>> so you can have more than one
>>>> (although we wouldn't want to impose this burden on implementors, so
>>>> they would have to be optional. We could have a core, required API,
>>>> and allow more advanced RDF specifications, e.g. providing query
>>>> language access, to be documented but not required).
>>>>
>>>>> But in saying that, I'm not sure you need to be exposing the
>>>>> RDF through an RDF centric API. The developers of the metadata editor
>>>>> found it more useful to offer an API that was centered around the
>>>>> kinds of metadata that needed to be supplied - for instance to add a
>>>>> series of authors, it was much nicer to be able to populate an
>>>>> authors data structure, especially since in the cellml metadata
>>>>> specification there is a strict interpretation of the underlying RDF
>>>>> data structures - such as bags, lists etc.
>>>>>
>>>>>
>>>> It is certainly worthwhile to offer convenience interfaces specific to
>>>> certain specifications, such as cmeta, the simulation and graph
>>>> specifications.
>>>>
>>> I see these as being the current use cases, and the most important
>>> level at which to address any specific API (not the RDF level).
>>>
> Why can't we have both a specific and general API, if both are useful?
>
>>>
>>>
>>>> However, the problem with this is that there is such a
>>>> large (and continuously growing) set of RDF-based metadata that people
>>>> might want to use, and so they need to be able to access this without
>>>> updating the CellML API to support every specification ever invented.
>>>>
>>> RDF can always be processed by RDF libraries so long as the RDF/XML
>>> fragments are available to load. The lesson learnt with the cellml
>>> metadata editor was that the higher level API that addressed the
>>> necessary, and the optional but useful, metadata requirements provided
>>> the most relevant interfaces. Our metadata specification is quite
>>> strict about the relationship of various RDF schemas for particular
>>> annotation purposes, e.g. the combination of bqs:Person and vcard
>>> structures. I think for the annotation structures that we say are
>>> necessary or useful, a predefined specification and interface is very
>>> useful, especially for people trying to populate specific data
>>> structures out of them. If, for example, we said feel free to use
>>> anything inside bqs:reference, or actually any arbitrary reference
>>> schema, then we run into an increasing number of permutations that one
>>> would need to accommodate in RDF queries or RDF subgraph accessors to
>>> get at the same information.
>>>
> I am not saying that people should represent the same information by
> more than one RDF graph, but rather, people should be able to add new
> information which cannot currently be represented in any supported
> specifications. Our specifications do not contemplate every type of data
> that people might find useful (for example, what if I wanted to
> represent detailed, cardiac electrophysiology specific information about
> a model? I would firstly look to see if anyone else has made an existing
> specification capable of holding the information. If not, I could then
> write a specification, and access it using a generic API).
>>> RDF does not itself imply anything goes; I feel energy is better
>>> spent specifying a strict RDF schema and an API that satisfies
>>> interacting with data that conforms to this (and not at the triple
>>> level).
>>>
> RDF is supposed to be an open-world framework. See
> http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-anyone:
>
>
> "2.2.6 Anyone Can Make Statements About Any Resource
>
> To facilitate operation at Internet scale, RDF is an open-world
> framework that allows anyone to make statements about any resource.
>
> In general, it is not assumed that complete information about any
> resource is available. RDF does not prevent anyone from making
> assertions that are nonsensical or inconsistent with other statements,
> or the world as people see it. Designers of applications that use RDF
> should be aware of this and may design their applications to tolerate
> incomplete or inconsistent sources of information."
>
> That said, I agree that it is very bad practice to invent a new
> specification when there is an existing one capable of representing the
> desired information (and if the existing one only represents part of the
> information, it is better to extend it by adding new arcs, rather than
> replacing it). However, this does not mean that we should not provide
> the capability to use new types of information which we never
> contemplated.
>
>>>> Providing an RDF API at the CellML API side is very useful, especially
>>>> when there are multiple consumers of the implementation, because you
>>>> are working with the real document, rather than a copy which was
>>>> created at some earlier point.
>>>>
>>>>
>>> I think you are assuming too much about the underlying framework
>>> here. Perhaps I am wrong, but are you referring to the shared model
>>> through the CORBA interface?
>>>
> Our API interface is the same whether or not we go through CORBA, so my
> comments will apply any time we are using the API. I am assuming that
> the application only accesses the objects obtained from the API
> implementation through the interface, and doesn't go poking around in
> API implementation private memory (obviously, if you are using CORBA to
> access cross-process or cross-machine API implementations, the operating
> system enforces this, but even if the API resides in the same address
> space, it is incredibly bad practice to break this assumption).
>
> If our API forces us to serialise and parse to work with RDF, we are
> invariably copying data, regardless of how we are using the CellML API.
> Of course, once we have made this copy, it may be easier in some use
> cases than in others to pass the copy around (but if you have created a
> way to pass RDF associated with CellML around internally, you have
> essentially created an informal API, so why not make it official rather
> than rewriting it in every application?).
>
>>>
>>>
>>>> As I have pointed out, if you try to get serialised RDF/XML out of an
>>>> RDF/XML unaware implementation, you run into all sorts of problems with
>>>> getting all the data.
>>>>
>>>> For example, I just had to write code like this in PCEnv:
>>>> function getModelMetadata(model)
>>>> {
>>>>   var RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
>>>>   var el = model.getRDFRepresentation("http://www.cellml.org/RDFXML/DOM").
>>>>     QueryInterface(Components.interfaces.
>>>>                    cellml_api_IRDFXMLDOMRepresentation).data;
>>>>   var od = el.ownerDocument;
>>>>   // Find every rdf:RDF element anywhere in the CellML document.
>>>>   var rnl = od.getElementsByTagNameNS(RDF_NS, "RDF");
>>>>   var l = rnl.length;
>>>>   var i;
>>>>   // Gather them under the root of a new rdf:RDF document. A doctype
>>>>   // already owned by od can't be reused by createDocument, so pass null.
>>>>   var td = od.implementation.createDocument(RDF_NS, "rdf:RDF", null);
>>>>   var de = td.documentElement;
>>>>   for (i = 0; i < l; i++)
>>>>   {
>>>>     de.appendChild(td.importNode(rnl.item(i), true));
>>>>   }
>>>>   var rrs = window.context.cellmlBootstrap.serialiseNode(td);
>>>>
>>>>   // Put it into the Mozilla RDF implementation...
>>>>   var p = Components.classes["@mozilla.org/rdf/xml-parser;1"].
>>>>     createInstance(Components.interfaces.nsIRDFXMLParser);
>>>>   var mds = Components.classes[
>>>>     "@mozilla.org/rdf/datasource;1?name=in-memory-datasource"].
>>>>     createInstance(Components.interfaces.nsIRDFDataSource);
>>>>   var modelURI = model.base_uri.asText;
>>>>   var uri = Components.classes["@mozilla.org/network/standard-url;1"].
>>>>     createInstance(Components.interfaces.nsIURI);
>>>>   uri.spec = modelURI;
>>>>   // Parse the serialised RDF/XML, resolving relative URIs against the
>>>>   // model's base URI.
>>>>   p.parseString(mds, uri, rrs);
>>>>   return mds;
>>>> }
>>>>
>>>> This is bad for several reasons:
>>>> 1) I have to do a lot of work just to support a relatively common
>>>> operation (getting the metadata) properly.
>>>> 2) It requires a lot of communication between the CellML API and the
>>>> user.
>>>> 3) It uses the DOM core to traverse through nodes defined in CellML, in
>>>> order to find all the RDF. The CellML API was designed to prevent this,
>>>> so this is a violation of the design principles underlying the CellML
>>>> API.
>>>> 4) It makes a copy of the RDF from the model at the Mozilla side, which
>>>> could potentially get out of sync.
>>>> 5) Trying to change the model requires even more special logic (e.g. I
>>>> would have to write code to explicitly strip out all the rdf:RDF
>>>> elements, serialise the RDF into a document at the Mozilla side, send
>>>> it across to the API side as a string and parse it into a document,
>>>> then import the new document element into the model document, and
>>>> append it to the model document element).
>>>>
>>>>
>>> I tend to use an XPATH query and copy the fragments into a new
>>> document. I don't find this particularly hard, and for any given
>>> implementation of the CellML API, it's just a single call away for
>>> the user of that API.
>>>
> You then need to get access to the CellML model with something which
> supports XPath. The whole point of the CellML API is to prevent the need
> for direct access to the DOM representation. Since DOM doesn't support
> XPath directly, you would need to implement something which supported it
> but didn't access data except through the DOM (or put the XPath
> API-side, so it was allowed to poke into the internals of the
> implementation). While this sounds like a common thing that there should
> be code for, in practice I have found that everyone invents their own
> mapping from the W3C specification to their language of choice
> (ourselves included) rather than strictly following a mapping such as
> the CORBA mapping (the only exception being in Javascript, where
> everyone uses a fairly consistent IDL => Javascript mapping).
>
> I think it would be far wiser to implement proper RDF support than to
> implement XPath so we can write a better hack to allow us to access the
> RDF. If we don't do this, then even putting this logic onto Model
> instead would beat implementing XPath in terms of usability.
>
>>>
>>>
>>>>> There is nothing stopping anyone adding arbitrary RDF using whatever
>>>>> RDF tool they want.
>>>>>
>>>>>
>>>> Except that RDF/XML is not a nice way to work with RDF, and as I showed
>>>> above, serialise/parse creates problems (I could use your same argument
>>>> to say that we should work on CellML documents directly from the DOM
>>>> core API, but that doesn't mean that it would be productive).
>>>>
>>> I wasn't saying that we use RDF/XML to work with RDF. I am saying
>>> anyone adding arbitrary RDF to a CellML model is free to use whatever
>>> RDF library implementation they want to access it. They may even have
>>> their own schema aware libraries - well, I'd hope so.
>>>
> I'm not saying we should block this use case, but it seems silly to make
> everyone include an RDF parser for common functionality (even if they
> can do it by including a library), especially if that means going
> through an extra serialise => parse => serialise just to make a change
> to the model.
>
>>>
>>>>> Specifying and implementing our own RDF API does not make sense to me
>>>>> at all.
>>>>>
>>>>>
>>>> It makes a lot of sense to me, because it is consistent with the main
>>>> goal of the CellML API, which (according to me, at least) is to provide
>>>> easier programmatic access to the contents of CellML documents.
>>>>
>>> Yes, but I am suggesting this is at a higher level than the RDF
>>> level. You might want to check out what they ended up with in the
>>> metadata editor code.
>>>
> I realise a higher level API will be useful for some applications.
> However, I don't think that it is sufficient for all CellML processing
> applications, and I don't believe that exposing a lower level interface
> imposes a significant burden on implementors (on top of the burden
> already imposed by the higher level API).
>
> Best regards,
> Andrew
>
> _______________________________________________
> cellml-discussion mailing list
> cellml-discussion at cellml.org
> http://www.cellml.org/mailman/listinfo/cellml-discussion




