Filtering, Querying & Transforming Data
Data querying, filtering and transformation.
Because the PDA stores any JSON-formatted data provided to it, it was important for us to design suitable mechanisms for data retrieval. Data bundling in the PDA allows for extremely flexible data transformation and filtering when retrieving data:
Picking specific parts (fields) of interest out of all the data available to avoid exposing data that is not required for the specific application. If you visualise data in a table, this would look like vertically slicing the table.
Filtering only the data that is required, based on values of the data stored. Using the table analogy, this would look like horizontally slicing the table.
Interleaving data from different, potentially heterogeneous endpoints – think about location data coming in from a range of different sources, when an application is only concerned with having the most recent longitude and latitude, no matter which application it has come from.
Restructuring the data to the desired JSON format on the fly, for example to unify the structure of data from different endpoints being interleaved or to reformat to something more convenient for the developer.
The first step in the process is to understand Data Combinators.
The API supports a notion of custom data "combinators", with the key feature being data transformation. It allows for:
remapping JSON data from different streams into structures chosen by the developer, to keep structures consistent across unrelated sources
combining data from multiple feeds into a single response stream
ordering of data according to underlying JSON structure fields
filtering of data according to underlying JSON values (including text-based search)
registering a datapoint with a data-mapping specification and GETing data from the registered endpoint.
One of the simplest types of data transformation is the remapping of the data structure. This can be done by creating a combinator:
Request: POST /api/v2.6/combinator/$COMBINATOR_NAME with header x-auth-token, where $COMBINATOR_NAME is a chosen name for your data combinator. The combinator name can be any valid URL path, but must be unique; otherwise the request will fail with an error.
Here's a simple example extracting two fields, longitude and latitude, from the rumpel/locations endpoint and unwrapping them to a top-level object. The same combinator also extracts firstName and lastName from the rumpel/profile endpoint:
[
  {
    "endpoint": "rumpel/locations",
    "mapping": {
      "longitude": "data.locations.longitude",
      "latitude": "data.locations.latitude"
    }
  },
  {
    "endpoint": "rumpel/profile",
    "mapping": {
      "firstName": "data.firstName",
      "lastName": "data.lastName"
    }
  }
]
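As a rough illustration of the request described above, the following Python sketch builds the POST request without sending it. The host is borrowed from the bundle example later on this page; the combinator name `locationsprofile`, the token value, and the `Content-Type` header are assumptions made for the example, not values confirmed by the API.

```python
import json
import urllib.request


def build_create_combinator_request(base_url, name, token, sources):
    """Build (but do not send) the POST request that registers a combinator."""
    url = f"{base_url}/api/v2.6/combinator/{name}"
    body = json.dumps(sources).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Content-Type is an assumption; x-auth-token comes from the text above.
        headers={"x-auth-token": token, "Content-Type": "application/json"},
    )


# The same two sources as in the JSON body above.
sources = [
    {
        "endpoint": "rumpel/locations",
        "mapping": {
            "longitude": "data.locations.longitude",
            "latitude": "data.locations.latitude",
        },
    },
    {
        "endpoint": "rumpel/profile",
        "mapping": {
            "firstName": "data.firstName",
            "lastName": "data.lastName",
        },
    },
]

# "locationsprofile" and "YOUR_TOKEN" are hypothetical placeholders.
request = build_create_combinator_request(
    "https://postman.hubat.net", "locationsprofile", "YOUR_TOKEN", sources
)
```

Passing `request` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected offline.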
The created combinator can be used by simply sending GET to /api/v2.6/combinator/$COMBINATOR_NAME with header x-auth-token.
It responds with the same data structure as the plain data APIs: a list of data records, each wrapped with the basic record details, with the data itself remapped according to the registered combinator.
[
  {
    "endpoint": "rumpel/locations",
    "recordId": "e965e022-6613-476a-a0cd-1f587a41b148",
    "data": {
      "longitude": "0.101014673709963",
      "latitude": "51.671358277138"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "fcf1a26b-e49f-4457-915b-156e14140f38",
    "data": {
      "longitude": "0.100905202634514",
      "latitude": "51.674001392439"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "8f7afa92-39e2-48ab-8028-f5aebaa9918e",
    "data": {
      "longitude": "0.080477950927866",
      "latitude": "51.6658257133844"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "d3a6f04b-4df6-4888-a7b0-c1d5ca272de9",
    "data": {
      "longitude": "0.0641066288762133",
      "latitude": "51.6641215101037"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "6a858d87-899e-4961-b722-0738d07c755e",
    "data": {
      "longitude": "0.0961801595986785",
      "latitude": "51.6712232446779"
    }
  }
]
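To make the remapping concrete, here is a small local sketch of how a mapping of dot-separated paths could be applied to one stored record. It only illustrates the idea and is not the PDA's implementation; the shape of `raw` (including the extra `accuracy` field) is a hypothetical input inferred from the paths used in the mapping.

```python
def resolve_path(document, path):
    """Follow a dot-separated path through nested dicts; None if a step is missing."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def apply_mapping(document, mapping):
    """Build a flat object whose keys are the mapping targets."""
    return {target: resolve_path(document, source) for target, source in mapping.items()}


# Hypothetical stored record; only the paths named in the mapping are taken from the page.
raw = {
    "data": {
        "locations": {
            "longitude": "0.101014673709963",
            "latitude": "51.671358277138",
            "accuracy": "10",
        }
    }
}

mapping = {
    "longitude": "data.locations.longitude",
    "latitude": "data.locations.latitude",
}

remapped = apply_mapping(raw, mapping)
```

Fields that the mapping does not name, such as `accuracy` here, are simply left out of the result, which is the "vertical slicing" described at the top of this page.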
The combinator's API allows for powerful filtering of data according to the recorded values. The combinator is created by POSTing a request to /api/v2.6/combinator/$COMBINATOR_NAME as before. However, for each source of data you may also define one or more filters in addition to the endpoint and the mapping used to remap the data:
[
  {
    "endpoint": "rumpel/locations",
    "filters": [
      {
        "field": "data.locations.timestamp",
        "transformation": {
          "transformation": "datetimeExtract",
          "part": "hour"
        },
        "operator": {
          "operator": "between",
          "lower": 7,
          "upper": 9
        }
      }
    ]
  }
]
The above example extracts the hour part of the location timestamp and filters for records with the hour between 7 and 9. If you add multiple filters, they act as a logical AND: a data record has to match all filters to be included in the result. Every filter consists of three fields:
| Parameter | Type | Meaning |
| --- | --- | --- |
| field | String | The JSON path of the field to use for filtering – it can be a simple JSON value, an array or an object. |
| transformation | Transformation Object | Optionally transforms the field in question before applying a filter. You can find the supported transformations below. |
| operator | Operator Object | The filtering operator. You can find the supported operators below. |
transformation – currently supported transformations:
identity – keep the value as-is; the effect is the same as if transformation was not defined
datetimeExtract with part – extract part of a date from an ISO 8601 formatted date field
timestampExtract with part – extract part of a date from a UNIX timestamp date field
searchable – convert the field to searchable text. Must be used together with the find operator below
operator – different operator types:
in together with the value field, set to check if field is in (is contained by) value
contains together with the value field, set to check if field contains value
between together with lower and upper values, checks if lower < field < upper
find together with the search field, set to the search string to perform text-based search on. Must be used together with the searchable transformation above.
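The filter semantics described above can be sketched locally as follows. This is an illustration under stated assumptions rather than the PDA's own logic: `between` uses strict bounds because the operator list writes lower < field < upper, `part` names are assumed to match Python `datetime` attributes such as `hour`, UNIX timestamps are assumed to be in seconds, and the `searchable`/`find` pair is left out. The sample records and their timestamps are hypothetical.

```python
from datetime import datetime, timezone


def resolve_path(document, path):
    """Follow a dot-separated path through nested dicts; None if a step is missing."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def transform(value, spec):
    """Apply the optional transformation object to a field value."""
    if spec is None or spec["transformation"] == "identity":
        return value
    if spec["transformation"] == "datetimeExtract":
        moment = datetime.fromisoformat(value)  # ISO 8601 string
    elif spec["transformation"] == "timestampExtract":
        moment = datetime.fromtimestamp(value, tz=timezone.utc)  # assumed UNIX seconds
    else:
        raise ValueError(f"unsupported transformation: {spec['transformation']}")
    # Assumption: part names match datetime attributes ("hour", "day", ...).
    return getattr(moment, spec["part"])


def apply_operator(value, spec):
    """Evaluate one operator object against an already transformed value."""
    kind = spec["operator"]
    if kind == "between":
        return spec["lower"] < value < spec["upper"]  # strict, as written in the list above
    if kind == "in":
        return value in spec["value"]
    if kind == "contains":
        return spec["value"] in value
    raise ValueError(f"unsupported operator: {kind}")


def matches(record, filters):
    """A record is kept only if every filter matches (logical AND)."""
    for item in filters:
        value = resolve_path(record, item["field"])
        if value is None:
            return False
        if not apply_operator(transform(value, item.get("transformation")), item["operator"]):
            return False
    return True


filters = [
    {
        "field": "data.locations.timestamp",
        "transformation": {"transformation": "datetimeExtract", "part": "hour"},
        "operator": {"operator": "between", "lower": 7, "upper": 9},
    }
]

# Hypothetical records: one recorded at 08:15, one at 12:30.
records = [
    {"recordId": "a", "data": {"locations": {"timestamp": "2017-11-05T08:15:00+00:00"}}},
    {"recordId": "b", "data": {"locations": {"timestamp": "2017-11-05T12:30:00+00:00"}}},
]

kept = [record for record in records if matches(record, filters)]
```

Only the 08:15 record survives; this is the "horizontal slicing" of the table analogy.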
The illustrated ways of creating data combinators hopefully provide you with a comprehensive tool to extract data in any way you like. The next step is to build up a layer of bundles on top of them to allow for retrieving a bigger variety of data in one big bundle.
Data Bundles add a thin layer around combinators, useful in two ways:
Retrieving data into explicitly named properties from different combinators
Accepting orderBy and limit parameters to control how many data points are returned for a specific bundle property
Take the previously covered examples of profile and location data: they are clearly very distinct, but an application may still benefit from having both at the same time. For instance, it may only care about the most recent information on the user's profile and their 5 most recent locations. This can be achieved with a POST request to https://postman.hubat.net/api/v2.6/data-bundle/localprofile with header x-auth-token and body:
{
  "profile": {
    "endpoints": [
      {
        "endpoint": "rumpel/profile"
      }
    ],
    "limit": 1
  },
  "location": {
    "endpoints": [
      {
        "endpoint": "rumpel/locations",
        "mapping": {
          "longitude": "data.locations.longitude",
          "latitude": "data.locations.latitude"
        }
      }
    ],
    "limit": 5
  }
}
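One way to assemble such a body in code is to reuse combinator-style source lists and wrap each in a named property. The helper `bundle_property` below is a hypothetical convenience for this sketch, not part of any SDK, and it covers only the `limit` parameter.

```python
import json


def bundle_property(endpoints, limit=None):
    """Wrap a combinator-style list of sources as one bundle property."""
    prop = {"endpoints": endpoints}
    if limit is not None:
        prop["limit"] = limit
    return prop


# Source lists in the same shape as a combinator request body.
profile_sources = [{"endpoint": "rumpel/profile"}]
location_sources = [
    {
        "endpoint": "rumpel/locations",
        "mapping": {
            "longitude": "data.locations.longitude",
            "latitude": "data.locations.latitude",
        },
    }
]

bundle = {
    "profile": bundle_property(profile_sources, limit=1),
    "location": bundle_property(location_sources, limit=5),
}

# Serialised form, ready to be used as the POST body.
body = json.dumps(bundle, indent=2)
```

Keeping the source lists as separate variables makes it easy to register the same definitions as standalone combinators as well.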
The response includes the specific data requested:
{
  "profile": [
    {
      "endpoint": "rumpel/profile",
      "recordId": "9b136020-372a-4777-81f9-2c4ce6925aea",
      "data": {
        "profile": {
          "website": {
            "link": "https://example.com",
            "private": "false"
          },
          "nick": {
            "private": "true",
            "name": ""
          },
          "primary_email": {
            "value": "[email protected]",
            "private": "false"
          },
          "private": "false",
          "youtube": {
            "link": "",
            "private": "true"
          },
          "address_global": {
            "city": "London",
            "county": "",
            "country": "UK",
            "private": "true"
          },
          "age": {
            "group": "",
            "private": "true"
          },
          "personal": {
            "first_name": "",
            "private": "false",
            "preferred_name": "Test",
            "last_name": "User",
            "middle_name": "",
            "title": ""
          },
          "blog": {
            "link": "",
            "private": "false"
          },
          "facebook": {
            "link": "",
            "private": "false"
          },
          "address_details": {
            "no": "",
            "street": "",
            "private": "false",
            "postcode": ""
          },
          "emergency_contact": {
            "first_name": "",
            "private": "true",
            "relationship": "",
            "last_name": "",
            "mobile": ""
          },
          "alternative_email": {
            "private": "true",
            "value": ""
          },
          "fb_profile_photo": {
            "private": "false"
          },
          "twitter": {
            "link": "",
            "private": "false"
          },
          "about": {
            "body": "A short bio about me shown on my PHATA",
            "private": "false",
            "title": "Me the Test User"
          },
          "mobile": {
            "no": "",
            "private": "true"
          },
          "gender": {
            "type": "",
            "private": "true"
          }
        }
      }
    }
  ],
  "location": [
    {
      "endpoint": "rumpel/locations",
      "recordId": "e965e022-6613-476a-a0cd-1f587a41b148",
      "data": {
        "longitude": "0.101014673709963",
        "latitude": "51.671358277138"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "fcf1a26b-e49f-4457-915b-156e14140f38",
      "data": {
        "longitude": "0.100905202634514",
        "latitude": "51.674001392439"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "8f7afa92-39e2-48ab-8028-f5aebaa9918e",
      "data": {
        "longitude": "0.080477950927866",
        "latitude": "51.6658257133844"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "d3a6f04b-4df6-4888-a7b0-c1d5ca272de9",
      "data": {
        "longitude": "0.0641066288762133",
        "latitude": "51.6641215101037"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "6a858d87-899e-4961-b722-0738d07c755e",
      "data": {
        "longitude": "0.0961801595986785",
        "latitude": "51.6712232446779"
      }
    }
  ]
}
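A client consuming this response might read each named property separately. The sketch below parses an abbreviated copy of the response above (two location records, one profile field) and converts the coordinate strings to numbers; the function name `coordinates` is an assumption made for the example.

```python
import json

# Abbreviated copy of the bundle response shown above.
response_text = """
{
  "profile": [
    {
      "endpoint": "rumpel/profile",
      "recordId": "9b136020-372a-4777-81f9-2c4ce6925aea",
      "data": {"profile": {"personal": {"preferred_name": "Test", "last_name": "User"}}}
    }
  ],
  "location": [
    {
      "endpoint": "rumpel/locations",
      "recordId": "e965e022-6613-476a-a0cd-1f587a41b148",
      "data": {"longitude": "0.101014673709963", "latitude": "51.671358277138"}
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "fcf1a26b-e49f-4457-915b-156e14140f38",
      "data": {"longitude": "0.100905202634514", "latitude": "51.674001392439"}
    }
  ]
}
"""


def coordinates(bundle):
    """Return (latitude, longitude) float pairs from the "location" property."""
    return [
        (float(record["data"]["latitude"]), float(record["data"]["longitude"]))
        for record in bundle["location"]
    ]


bundle = json.loads(response_text)
points = coordinates(bundle)
preferred_name = bundle["profile"][0]["data"]["profile"]["personal"]["preferred_name"]
```

Note that the coordinates arrive as JSON strings, so the conversion to `float` is left to the client.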
To keep the example simple, it does not include the complex data combinators covered in the previous step. However, you will notice that the endpoints property has exactly the same format as the body of a request for creating a new combinator.
Like Data Combinators, Data Bundles can only be used directly by privileged applications such as the personal data dashboard. However, this leads us to Data Debits for consented data sharing, as Bundles are the format used to specify the data requested from the user.