Filtering, Querying & Transforming Data
Data querying, filtering and transformation.
Data Bundling
Since the PDA stores any JSON-formatted data provided to it, it was important for us to design mechanisms for suitable data retrieval. Data bundling in the PDA allows for extremely flexible data transformation and filtering when retrieving data:
Picking specific parts (fields) of interest out of all the data available to avoid exposing data that is not required for the specific application. If you visualise data in a table, this would look like vertically slicing the table.
Filtering only the data that is required, based on values of the data stored. Using the table analogy, this would look like horizontally slicing the table.
Interleaving data from different, potentially heterogeneous endpoints – think about location data coming in from a range of different sources, when an application is only concerned with having the most recent longitude and latitude, no matter which application it has come from.
Restructuring the data to the desired JSON format on the fly, for example to unify the structure of data from different endpoints being interleaved or to reformat to something more convenient for the developer.
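You can try these operations locally on plain Python dictionaries before touching the API. The sketch below only illustrates them and is not how the PDA implements them. The record fields (`source`, `timestamp`, `lat`, `lon`, `note`) and the two feeds are hypothetical.

```python
from typing import Any, Callable

Record = dict[str, Any]

def pick_fields(rows: list[Record], fields: list[str]) -> list[Record]:
    """Vertical slice: keep only the named fields (table columns) of every record."""
    return [{f: row[f] for f in fields if f in row} for row in rows]

def filter_rows(rows: list[Record], keep: Callable[[Record], bool]) -> list[Record]:
    """Horizontal slice: keep only the records (table rows) whose values pass the test."""
    return [row for row in rows if keep(row)]

def latest_across_sources(*feeds: list[Record]) -> Record:
    """Interleave several feeds and return the most recent record, whichever source it came from."""
    merged = [row for feed in feeds for row in feed]
    return max(merged, key=lambda row: row["timestamp"])

def restructure(row: Record) -> Record:
    """Reshape a flat record into a nested structure chosen by the developer."""
    return {"position": {"latitude": row["lat"], "longitude": row["lon"]}}

# Hypothetical location feeds from two different sources.
phone = [{"source": "phone", "timestamp": 100, "lat": 51.5, "lon": -0.1, "note": "home"}]
watch = [{"source": "watch", "timestamp": 200, "lat": 52.2, "lon": 0.1, "note": "work"}]

print(pick_fields(phone, ["lat", "lon"]))                          # drops "note" and "source"
print(filter_rows(phone + watch, lambda r: r["timestamp"] > 150))  # only the watch record
print(restructure(latest_across_sources(phone, watch)))            # newest fix, reshaped
```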
The first step in the process is to understand Data Combinators.
Data Combinators
The API supports a notion of custom data "combinators", with the key feature being data transformation. It allows for:
remapping JSON data from different streams into structures chosen by the developer, to provide consistent structures across unrelated sources
combining data from multiple feeds into a single response stream
ordering of data according to underlying JSON structure fields
filtering of data according to underlying JSON values (including text-based search)
registering a datapoint with a data-mapping specification and retrieving data from the registered endpoint with a GET request.
Creating a simple combinator
One of the simplest types of data transformation is remapping the data structure. You can do this by creating a combinator:
Request: POST /api/v2.6/combinator/$COMBINATOR_NAME with header x-auth-token, where $COMBINATOR_NAME is a name you choose for your data combinator. The combinator name can be any valid URL path, but it must be unique; otherwise the request fails with an error.
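You can prepare this request with Python's standard library. The following is a minimal sketch that builds the request object without sending it, so nothing here talks to a real PDA. The combinator name `locations-simple`, the token placeholder, the placeholder body and the JSON `Content-Type` header are assumptions. The host is the one used in the Data Bundles example on this page.

```python
import json
import urllib.request

def build_create_combinator_request(
    base_url: str, name: str, token: str, body: list[dict]
) -> urllib.request.Request:
    """Prepare, but do not send, the POST that registers a combinator called `name`."""
    url = f"{base_url.rstrip('/')}/api/v2.6/combinator/{name}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # The JSON content type is an assumption about what the API expects.
        headers={"x-auth-token": token, "Content-Type": "application/json"},
        method="POST",
    )

# "YOUR_TOKEN" and the body are placeholders; the body layout is an assumption.
req = build_create_combinator_request(
    "https://postman.hubat.net", "locations-simple", "YOUR_TOKEN",
    [{"endpoint": "rumpel/locations"}],
)
print(req.get_method(), req.full_url)
# POST https://postman.hubat.net/api/v2.6/combinator/locations-simple
# To send it: urllib.request.urlopen(req)
```

Combinator names must be unique, so sending the same request a second time should fail with an error.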
Here's a simple example extracting two fields, longitude and latitude, from a Rumpel locations endpoint and unwrapping them into a top-level object:
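The following is a minimal sketch of such a remapping. It assumes the request body is a JSON list with one object per data source. Each object holds the source `endpoint` and a `mapping` from output field names to dot-separated paths in the stored record. The endpoint name `rumpel/locations`, the key name `mapping` and the paths `locations.longitude`/`locations.latitude` are all assumptions for illustration. The `remap` function emulates locally what the combinator does on the server.

```python
from typing import Any

# Hypothetical combinator body: endpoint name, "mapping" key and paths are assumptions.
combinator_body = [
    {
        "endpoint": "rumpel/locations",
        "mapping": {
            "longitude": "locations.longitude",
            "latitude": "locations.latitude",
        },
    }
]

def get_path(data: Any, path: str) -> Any:
    """Follow a dot-separated path through nested dicts; None if any step is missing."""
    for key in path.split("."):
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data

def remap(data: dict, mapping: dict[str, str]) -> dict:
    """Build a new top-level object whose fields are pulled from the given paths."""
    return {out_name: get_path(data, path) for out_name, path in mapping.items()}

# Hypothetical stored record; "accuracy" is not mapped, so it is left out.
stored = {"locations": {"longitude": -0.1276, "latitude": 51.5072, "accuracy": 12}}
print(remap(stored, combinator_body[0]["mapping"]))
# {'longitude': -0.1276, 'latitude': 51.5072}
```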
Fetching data from a Data Combinator
You can use the created combinator by sending a GET request to /api/v2.6/combinator/$COMBINATOR_NAME with header x-auth-token.
The response uses the same data structure as the plain data APIs: a list of data records, each wrapped with the basic record details. The data itself is remapped according to the registered combinator.
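The following is a sketch of fetching from a combinator and unpacking the response with the standard library. It assumes each returned record is a JSON object whose remapped payload sits under a `data` key, next to the record details (shown here as `endpoint` and `recordId`). Those key names are assumptions, and the sample response is made up for illustration.

```python
import json
import urllib.request

def build_fetch_request(base_url: str, name: str, token: str) -> urllib.request.Request:
    """Prepare, but do not send, the GET that reads data through combinator `name`."""
    url = f"{base_url.rstrip('/')}/api/v2.6/combinator/{name}"
    return urllib.request.Request(url, headers={"x-auth-token": token}, method="GET")

def payloads(response_body: bytes) -> list[dict]:
    """Drop the record wrappers and keep only the remapped data of each record."""
    return [record["data"] for record in json.loads(response_body)]

# Hypothetical response body, for illustration only.
sample = (
    b'[{"endpoint": "locations-simple", "recordId": "r1",'
    b' "data": {"longitude": -0.1276, "latitude": 51.5072}}]'
)
print(payloads(sample))

# Against a live PDA (not run here):
# body = urllib.request.urlopen(build_fetch_request(base, name, token)).read()
```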
Data Filtering
The combinator API allows for powerful filtering of data according to the recorded values. You create the combinator by POSTing a request to /api/v2.6/combinator/$COMBINATOR_NAME, as before. However, for each source of data you may also define one or more filters, in addition to the endpoint and the transformation used to remap the data:
The above example extracts the hour part of the location timestamp and filters for records with the hour between 7 and 9. Multiple filters act like a logical AND operator: a data record has to match all filters to be included in the result. Every filter consists of three fields:
| Parameter | Type | Meaning |
|---|---|---|
| field | String | The JSON path of the field to use for filtering. It can be a simple JSON value, an array or an object. |
| transformation | Transformation Object | Optional. Transforms the field in question before the filter is applied. The supported transformations are listed below. |
| operator | Operator Object | The filtering operator. The supported operators are listed below. |
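The hour filter described above can be combined with a second filter and emulated locally, to show how filters compose with logical AND. This is a sketch, not the server's implementation, and it rests on several assumptions. The timestamp is taken to be a UNIX timestamp in seconds, read in UTC. The transformation and operator names sit under `transformation` and `operator` keys inside their objects. The field names `timestamp` and `accuracy` are hypothetical. `between` uses strict bounds, following the operator list on this page.

```python
from datetime import datetime, timezone
from typing import Any, Optional

def get_path(data: Any, path: str) -> Any:
    """Follow a dot-separated path through nested dicts; None if any step is missing."""
    for key in path.split("."):
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data

def apply_transformation(value: Any, spec: Optional[dict]) -> Any:
    """Only identity and timestampExtract are emulated in this sketch."""
    if spec is None or spec["transformation"] == "identity":
        return value
    if spec["transformation"] == "timestampExtract":
        # Assumes seconds since the epoch, read in UTC; `part` must be a datetime attribute.
        return getattr(datetime.fromtimestamp(value, tz=timezone.utc), spec["part"])
    raise ValueError(f"not emulated here: {spec['transformation']}")

def check(value: Any, spec: dict) -> bool:
    """Only between is emulated, with strict bounds: lower < value < upper."""
    if spec["operator"] == "between":
        return spec["lower"] < value < spec["upper"]
    raise ValueError(f"not emulated here: {spec['operator']}")

def passes(record: dict, flt: dict) -> bool:
    value = get_path(record, flt["field"])
    if value is None:
        return False  # a missing field cannot satisfy the filter
    return check(apply_transformation(value, flt.get("transformation")), flt["operator"])

def matches_all(record: dict, filters: list[dict]) -> bool:
    """Multiple filters behave like logical AND: every one must pass."""
    return all(passes(record, flt) for flt in filters)

filters = [
    {   # hour of the timestamp between 7 and 9
        "field": "timestamp",
        "transformation": {"transformation": "timestampExtract", "part": "hour"},
        "operator": {"operator": "between", "lower": 7, "upper": 9},
    },
    {   # hypothetical second condition on location accuracy
        "field": "accuracy",
        "operator": {"operator": "between", "lower": 0, "upper": 50},
    },
]

at_8am_precise = {"timestamp": 8 * 3600, "accuracy": 10}  # 1970-01-01 08:00 UTC
at_8am_vague = {"timestamp": 8 * 3600, "accuracy": 200}
print(matches_all(at_8am_precise, filters), matches_all(at_8am_vague, filters))  # True False
```

With strict bounds, only records whose hour is exactly 8 pass the first filter in this sketch.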
transformation – currently supported transformations:
identity – keep the value as-is; the effect is the same as if transformation was not defined
datetimeExtract with part – extract part of a date from an ISO 8601 formatted date field
timestampExtract with part – extract part of a date from a UNIX timestamp date field
searchable – convert the field to searchable text; must be used together with the find operator below
operator – supported operator types:
in with a value field – checks if field is in (is contained by) value
contains with a value field – checks if field contains value
between with lower and upper values – checks if lower < field < upper
find with a search field set to the search string – performs a text-based search; must be used together with the searchable transformation above
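You can emulate the transformations and operators listed above locally, to check what a filter will match. The sketch below makes several assumptions:

* The name of each transformation or operator sits under a `transformation` or `operator` key inside its object, alongside `part`, `value`, `lower`, `upper` or `search`.
* UNIX timestamps are in seconds and read in UTC.
* `part` is limited to Python `datetime` attribute names such as `hour`.
* `searchable`/`find` are approximated by simple word matching, not the PDA's actual text search.

```python
from datetime import datetime, timezone
from typing import Any, Optional

def transform(value: Any, spec: Optional[dict]) -> Any:
    """Emulate the listed transformations; a missing spec behaves like identity."""
    kind = "identity" if spec is None else spec["transformation"]
    if kind == "identity":
        return value
    if kind == "datetimeExtract":
        # ISO 8601 string; "Z" is rewritten because older Pythons' fromisoformat rejects it.
        return getattr(datetime.fromisoformat(value.replace("Z", "+00:00")), spec["part"])
    if kind == "timestampExtract":
        # Assumed: UNIX timestamp in seconds, read in UTC.
        return getattr(datetime.fromtimestamp(value, tz=timezone.utc), spec["part"])
    if kind == "searchable":
        # Naive stand-in for a full-text index: the set of lower-cased words.
        return set(str(value).lower().split())
    raise ValueError(f"unknown transformation: {kind}")

def operate(value: Any, spec: dict) -> bool:
    """Emulate the listed operators on an already-transformed value."""
    kind = spec["operator"]
    if kind == "in":
        return value in spec["value"]      # field is contained by value
    if kind == "contains":
        return spec["value"] in value      # field contains value
    if kind == "between":
        return spec["lower"] < value < spec["upper"]
    if kind == "find":
        if not isinstance(value, set):
            raise ValueError("find needs the searchable transformation")
        return set(spec["search"].lower().split()) <= value  # every search word present
    raise ValueError(f"unknown operator: {kind}")

print(transform("2024-03-01T08:15:00Z", {"transformation": "datetimeExtract", "part": "hour"}))
print(operate("watch", {"operator": "in", "value": ["phone", "watch"]}))
print(operate(["home", "gym"], {"operator": "contains", "value": "gym"}))
text = transform("Coffee at the Station", {"transformation": "searchable"})
print(operate(text, {"operator": "find", "search": "station coffee"}))
```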
The ways of creating data combinators illustrated above should give you a comprehensive toolset for extracting data in whatever form you need. The next step is to build a layer of bundles on top of combinators, so that you can retrieve a wider variety of data in a single bundle.
Data Bundles
Data Bundles add a thin layer around combinators, which is useful in two ways:
Retrieving data from different combinators into explicitly named properties
Accepting orderBy and limit parameters to control how many data points are returned for a specific bundle property
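The following local sketch shows how the two bundle features fit together. Records from different sources end up under explicitly named properties, and `orderBy`/`limit` trim each property independently. The newest-first (descending) order is an assumption, since no sort direction is given here. The property names and records are hypothetical.

```python
from typing import Optional

def order_and_limit(records: list[dict], order_by: Optional[str], limit: Optional[int]) -> list[dict]:
    """Sort by `order_by` (newest first, an assumption) and keep at most `limit` records."""
    if order_by is not None:
        records = sorted(records, key=lambda r: r[order_by], reverse=True)
    if limit is not None:
        records = records[:limit]
    return records

def assemble_bundle(sources: dict[str, list[dict]], settings: dict[str, dict]) -> dict[str, list[dict]]:
    """Place each source's records under its own named property, applying orderBy/limit."""
    result = {}
    for prop, records in sources.items():
        opts = settings.get(prop, {})
        result[prop] = order_and_limit(records, opts.get("orderBy"), opts.get("limit"))
    return result

# Hypothetical records already produced by two combinators.
sources = {
    "profile": [{"name": "Ada", "updated": 1}, {"name": "Ada L.", "updated": 2}],
    "locations": [{"lat": 51.0 + i / 10, "timestamp": i} for i in range(1, 9)],
}
settings = {
    "profile": {"orderBy": "updated", "limit": 1},
    "locations": {"orderBy": "timestamp", "limit": 5},
}
bundle = assemble_bundle(sources, settings)
print(bundle["profile"])                                # only the latest profile entry
print([loc["timestamp"] for loc in bundle["locations"]])  # [8, 7, 6, 5, 4]
```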
Take the previously covered examples of profile and location data. They are clearly very distinct, but an application may still benefit from having both at the same time. For instance, it may only care about the most recent information in the user's profile and their 5 most recent locations. You can achieve this with a POST request to https://postman.hubat.net/api/v2.6/data-bundle/localprofile with header x-auth-token and body:
The response includes the specific data requested:
To keep the example simple, it does not include the complex data combinators covered in the previous step. However, you will notice that the endpoints property has exactly the same format as the body of a request for creating a new combinator.
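Because the `endpoints` property reuses the combinator body format, you can assemble a bundle body from an existing combinator definition. The following is a sketch of a `localprofile` body along those lines. Apart from `endpoints`, `orderBy` and `limit`, every name in it is an assumption for illustration: the endpoint names, the `mapping` key and the `dateCreated` ordering field.

```python
import json

# Hypothetical combinator body for locations; endpoint name, "mapping" key and paths are assumptions.
location_sources = [
    {
        "endpoint": "rumpel/locations",
        "mapping": {"longitude": "locations.longitude", "latitude": "locations.latitude"},
    }
]

# Each bundle property reuses a combinator-style list under "endpoints".
bundle_body = {
    "profile": {
        "endpoints": [{"endpoint": "rumpel/profile"}],  # hypothetical endpoint name
        "orderBy": "dateCreated",                        # hypothetical ordering field
        "limit": 1,
    },
    "locations": {
        "endpoints": location_sources,
        "orderBy": "dateCreated",
        "limit": 5,
    },
}

def looks_like_combinator_body(value) -> bool:
    """The shape assumed in this sketch: a list of objects that each name an endpoint."""
    return isinstance(value, list) and all(isinstance(s, dict) and "endpoint" in s for s in value)

assert all(looks_like_combinator_body(p["endpoints"]) for p in bundle_body.values())
print(json.dumps(bundle_body, indent=2))  # body for POST .../api/v2.6/data-bundle/localprofile
```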
Like Data Combinators, Data Bundles can only be used directly by privileged applications such as the personal data dashboard. However, this leads us to Data Debits for consented data sharing, since Bundles are the format used to specify the data requested from the user.