Built into the PyWrapper software is the BioCASE Protocol, which defines how the wrapper is queried and how an answer is delivered. This hard-coded protocol is responsible for successful communication between the Provider Software and a client application. It knows nothing about the actual biological (or other) data that is queried and transmitted; it only defines the activities, or methods, that the wrapper and the client applications understand.
The protocol is based on the DiGIR protocol but incorporates some BioCASE-specific changes that unfortunately make the two incompatible. A more recent development, the TDWG Access Protocol for Information Retrieval (TAPIR), tries to bring the two back together.
The following clickable diagram shows the central role of the protocol and how the main components interact.
Because the crucial task of the PyWrapper software is to deliver data in a predefined standard format, this format needs to be defined somewhere. In this document it is called the data-bearing “concept schema”. The PyWrapper was designed to handle any kind of schema, as long as a configuration file exists for every database and for every schema that should be supported. These configuration files are called concept mapping files, or CMFs for short, because they map database attributes to concepts defined within an XML concept schema.
As just mentioned, there may be many different CMFs, but for the BioCASE project only the ABCD schema is relevant, and its CMF is the only one required. As a provider you can feel free to add additional CMFs, for example to support the Darwin Core standard.
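The idea of a concept mapping can be pictured as a simple lookup from concept paths to database attributes. The following is a minimal sketch only: a real CMF is a configuration file whose layout is not shown here, and the table and column names (`specimen.full_name`, `locality.country`) as well as the helper functions are hypothetical.

```python
# Hypothetical illustration of what a CMF expresses; not the real CMF format.
ABCD_UNIT = "/DataSets/DataSet/Units/Unit"

# Concept path (taken from the ABCD examples in this document) -> assumed "table.column".
concept_mapping = {
    ABCD_UNIT + "/Identifications/Identification/TaxonIdentified/NameAuthorYearString": "specimen.full_name",
    ABCD_UNIT + "/Gathering/GatheringSite/Country/CountryName": "locality.country",
}


def column_for(concept_path):
    """Return the mapped database attribute, or None if the concept is unmapped."""
    return concept_mapping.get(concept_path)


def mapped_concepts():
    """Return all mapped concept paths in sorted order."""
    return sorted(concept_mapping)
```

A provider supporting a second schema would, in this picture, simply keep a second mapping of the same shape for the same database.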
The protocol defines several different activities in the <type> element of its header section:
<?xml version="1.0" encoding="UTF-8" ?>
<request xmlns="http://www.biocase.org/schemas/protocol/1.3"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.biocase.org/schemas/protocol/1.3 http://www.bgbm.org/biodivinf/Schema/protocol_1_3.xsd">
  <header>
    <version>0.98</version>
    <sendTime>2002-09-11T09:30:47-05:00</sendTime>
    <source>192.168.1.105</source>
    <destination>http://www.collection.org/myCollection.py</destination>
    <type>search</type>
  </header>
  ...
</request>

Up to now there are three methods defined: a search, a scan and a capabilities request.
Search is the main method. It defines how to search a provider's database and essentially wraps simple SQL SELECT statements in an XML format (defined by the protocol, of course). Right now the select part of the statement is fixed and all matching data is returned.
<request>
  <header>
    ...
    <type>search</type>
  </header>
  <search>
    <requestFormat>http://www.tdwg.org/schemas/abcd/1.2</requestFormat>
    <responseFormat start="0" limit="50">http://www.tdwg.org/schemas/abcd/1.2</responseFormat>
    <filter>
      <like path="/DataSets/DataSet/Units/Unit/Identifications/Identification/TaxonIdentified/NameAuthorYearString">Ast*</like>
    </filter>
    <count>false</count>
  </search>
</request>
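A client could assemble such a search request programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` and mirrors the abbreviated example above; the function name `build_search_request` is hypothetical, and the namespace declarations and the remaining header fields are left out for brevity, so the result is not a complete, schema-valid request.

```python
import xml.etree.ElementTree as ET

ABCD_NS = "http://www.tdwg.org/schemas/abcd/1.2"


def build_search_request(path, pattern, start=0, limit=50):
    """Assemble a search request shaped like the abbreviated example (hypothetical helper)."""
    request = ET.Element("request")
    header = ET.SubElement(request, "header")
    ET.SubElement(header, "type").text = "search"

    search = ET.SubElement(request, "search")
    ET.SubElement(search, "requestFormat").text = ABCD_NS
    # start and limit are attributes of <responseFormat>, as in the example.
    response_format = ET.SubElement(
        search, "responseFormat", start=str(start), limit=str(limit)
    )
    response_format.text = ABCD_NS

    # A single <like> condition on one concept path.
    filter_element = ET.SubElement(search, "filter")
    ET.SubElement(filter_element, "like", path=path).text = pattern
    ET.SubElement(search, "count").text = "false"
    return request


request = build_search_request(
    "/DataSets/DataSet/Units/Unit/Identifications/Identification/"
    "TaxonIdentified/NameAuthorYearString",
    "Ast*",
)
request_xml = ET.tostring(request, encoding="unicode")
```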
The concept schema that the search filter is based on is defined by the element <requestFormat>. The concept schema used to return the result is defined by <responseFormat> and does not have to be the same as the request format. For example, you can query using the Darwin Core standard but return a full ABCD document.
The search is stateful, so it is possible to limit the number of returned records and to define a record number to start from. In the example above this is done with the limit and start attributes of <responseFormat>.
The <filter> wraps the WHERE clause of an SQL statement. It is a nested structure using different combinations of logical and comparison operators. The following operators are supported (XML is case sensitive!):
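To page through a large result, a client can issue successive requests with increasing start values. A minimal sketch of the arithmetic, assuming the client already knows how many records match the query; the helper name `page_windows` is hypothetical:

```python
def page_windows(total_matches, limit=50):
    """Yield (start, limit) pairs that together cover total_matches records."""
    for start in range(0, total_matches, limit):
        yield start, limit


# Windows needed to fetch 120 matching records, 50 at a time.
windows = list(page_windows(120, limit=50))
```

Each pair would be written into the start and limit attributes of one request.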
binary comparison operators
unary comparison operators
comparison operators for multiple arguments
unary logical operators
binary logical operators
Here is a more complex filter example illustrating several conditions based on ABCD:
<filter>
  <and>
    <like path="/DataSets/DataSet/Units/Unit/Identifications/Identification/TaxonIdentified/NameAuthorYearString">Abies*</like>
    <or>
      <like path="/DataSets/DataSet/Units/Unit/Identifications/Identification/TaxonIdentified/HigherTaxa/HigherTaxon">Pinace*</like>
      <and>
        <like path="/DataSets/DataSet/Units/Unit/Gathering/GatheringSite/Country/CountryName">*Russia*</like>
        <greaterThan path="/DataSets/DataSet/Units/Unit/Gathering/GatheringDateTime/ISODateTimeBegin">2002-04</greaterThan>
      </and>
    </or>
  </and>
</filter>
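To illustrate how a nested filter corresponds to a WHERE clause, the sketch below walks a filter tree recursively and produces a parameterized SQL fragment. It is not the PyWrapper implementation. It handles only the operators that appear in this document's examples (`and`, `or`, `like`, `greaterThan`), the column names are hypothetical stand-ins for what a CMF would supply, and translating `*` to the SQL wildcard `%` is an assumption.

```python
import xml.etree.ElementTree as ET

UNIT = "/DataSets/DataSet/Units/Unit"
# Hypothetical concept-to-column mapping; a real provider would take this from its CMF.
COLUMNS = {
    UNIT + "/Identifications/Identification/TaxonIdentified/NameAuthorYearString": "full_name",
    UNIT + "/Gathering/GatheringSite/Country/CountryName": "country",
    UNIT + "/Gathering/GatheringDateTime/ISODateTimeBegin": "date_begin",
}
LOGICAL = {"and": " AND ", "or": " OR "}
COMPARISON = {"like": "LIKE", "greaterThan": ">"}


def to_where(node, params):
    """Recursively turn one filter node into SQL text, appending values to params."""
    if node.tag in LOGICAL:
        parts = [to_where(child, params) for child in node]
        return "(" + LOGICAL[node.tag].join(parts) + ")"
    if node.tag in COMPARISON:
        column = COLUMNS[node.get("path")]
        value = node.text or ""
        if node.tag == "like":
            value = value.replace("*", "%")  # assumed wildcard translation
        params.append(value)
        return f"{column} {COMPARISON[node.tag]} ?"
    raise ValueError(f"unsupported operator: {node.tag}")


def filter_to_sql(filter_xml):
    """Return (where_clause, params) for a <filter> document with one top-level operator."""
    root = ET.fromstring(filter_xml)
    params = []
    return to_where(root[0], params), params
```

Passing the values as parameters rather than pasting them into the SQL text keeps client-supplied strings out of the statement itself.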
The response is an XML document based on the protocol that envelops an XML document based on the concept schema, which contains the records (see diagram above). It also reports the number of returned records, the number of dropped records (records that matched the request filter but did not contain sufficient data to produce a valid XML document based on the concept schema) and the number of records in the database that matched the query.
A scan request concentrates on one concept, referenced by an XPath to the element of the respective concept schema. It is essentially a SELECT DISTINCT in SQL and returns all unique values for this concept. Here is an abbreviated scan request example:
<header>
  ...
  <type>scan</type>
</header>
<scan>
  <requestFormat>http://www.tdwg.org/schemas/abcd/1.2</requestFormat>
  <concept>/DataSets/DataSet/Units/Unit/Identifications/Identification/TaxonIdentified/ScientificNameAtomized/Botanical/Genus</concept>
</scan>
Take a look at the examples to see a scan response document.
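The SELECT DISTINCT analogy for a scan can be tried out directly with Python's built-in `sqlite3` module. This is only an illustration of the SQL idea: the table `specimen`, the column `genus` and the helper `scan_values` are hypothetical, and the sorting is added here for readability rather than taken from the protocol.

```python
import sqlite3


def scan_values(connection, table, column):
    """Return the unique values of one column, sorted."""
    # The identifiers are assumed to come from trusted configuration, not from the client.
    query = f"SELECT DISTINCT {column} FROM {table} ORDER BY {column}"
    return [row[0] for row in connection.execute(query)]


# Small in-memory database with a duplicate genus.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (genus TEXT)")
conn.executemany(
    "INSERT INTO specimen VALUES (?)",
    [("Abies",), ("Pinus",), ("Abies",)],
)
genera = scan_values(conn, "specimen", "genus")
```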
A capabilities request allows a client to find out which concepts are mapped (defined) in a provider database. This request type returns a list of XPaths identifying all mapped concepts.
<request>
  <header>
    ...
    <type>capabilities</type>
  </header>
</request>
Take a look at the examples to see a capabilities response document.
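All three request types share the same header, so a receiver can decide what to do by reading the <type> element. The sketch below shows one way to extract it with `xml.etree.ElementTree`, ignoring XML namespaces so that both the full and the abbreviated examples in this document are accepted. The helper names are hypothetical, and this is not how the PyWrapper itself necessarily does it.

```python
import xml.etree.ElementTree as ET

KNOWN_TYPES = {"search", "scan", "capabilities"}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def request_type(request_xml):
    """Return the text of header/type, raising ValueError if it is missing or unknown."""
    root = ET.fromstring(request_xml)
    for header in root:
        if local_name(header.tag) != "header":
            continue
        for child in header:
            if local_name(child.tag) == "type":
                value = (child.text or "").strip()
                if value not in KNOWN_TYPES:
                    raise ValueError(f"unknown request type: {value!r}")
                return value
    raise ValueError("request has no header/type element")
```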
Download the latest and older versions of the BioCASE protocol schema.