# Filtering, Querying & Transforming Data

## Data Bundling

Because the PDA stores any JSON-formatted data given to it, it was important for us to design suitable mechanisms for retrieving that data. Data bundling in the PDA allows for flexible data transformation and filtering when retrieving data:

* Picking specific parts (fields) of interest out of all the data available to avoid exposing data that is not required for the specific application. If you visualise data in a table, this would look like vertically slicing the table.
* Filtering only the data that is required, based on values of the data stored. Using the table analogy, this would look like horizontally slicing the table.
* Interleaving data from different, potentially heterogeneous endpoints – think about location data coming in from a range of different sources, when an application is only concerned with having the most recent longitude and latitude, no matter which application it has come from.
* Restructuring the data to the desired JSON format on the fly, for example to unify the structure of data from different endpoints being interleaved or to reformat to something more convenient for the developer.

The first step in the process is to understand Data Combinators.

### Data Combinators

The API supports a notion of custom data "combinators", with the key feature being data transformation. It allows for:

* remapping JSON data from different streams into structures chosen by the developer, so that unrelated sources share a consistent structure
* combining data from multiple feeds into a single response stream
* ordering of data according to underlying JSON structure fields
* filtering of data according to underlying JSON values (including text-based search)
* registering an endpoint with a data-mapping specification and then `GET`ing data from the registered endpoint.

### Creating a simple combinator

One of the simplest types of data transformation is remapping the data structure. This can be done by creating a *combinator*:

Request: `POST /api/v2.6/combinator/$COMBINATOR_NAME` with header `x-auth-token`, where `$COMBINATOR_NAME` is a name you choose for your data combinator. The combinator name can be any valid URL path but must be unique; otherwise the request fails with an error.

Here's a simple example that extracts two fields, `longitude` and `latitude`, from a [Rumpel](https://github.com/Hub-of-all-Things/Rumpel) locations endpoint and unwraps them into a top-level object. The second entry does the same for `firstName` and `lastName` from the profile endpoint:

```javascript
[
    {
        "endpoint": "rumpel/locations",
        "mapping": {
            "longitude": "data.locations.longitude",
            "latitude": "data.locations.latitude"
        }
    },
    {
        "endpoint": "rumpel/profile",
        "mapping": {
            "firstName": "data.firstName",
            "lastName": "data.lastName"
        }
    }
]
```
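As a minimal sketch of how such a request could be assembled from Python's standard library: the path and the `x-auth-token` header come from the text above, while the host (borrowed from the bundle example on this page), the `Content-Type` header, the helper name `build_create_combinator_request` and the combinator name `my-locations` are illustrative assumptions. The request is only prepared here, not sent.

```python
import json
import urllib.request

# Assumption: the PDA host; substitute the address of your own PDA.
PDA_BASE_URL = "https://postman.hubat.net"


def build_create_combinator_request(name, token, spec, base_url=PDA_BASE_URL):
    """Prepare (but do not send) the POST request that registers a combinator."""
    return urllib.request.Request(
        url=f"{base_url}/api/v2.6/combinator/{name}",
        data=json.dumps(spec).encode("utf-8"),
        # Assumption: the body is declared as JSON via Content-Type.
        headers={"x-auth-token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Combinator body with a single source, taken from the example above.
spec = [
    {
        "endpoint": "rumpel/locations",
        "mapping": {
            "longitude": "data.locations.longitude",
            "latitude": "data.locations.latitude",
        },
    }
]

# "my-locations" and the token are placeholders.
request = build_create_combinator_request("my-locations", "<your-token>", spec)
print(request.get_method(), request.full_url)
```

Passing the prepared object to `urllib.request.urlopen` would send it; error handling for a duplicate combinator name is left out of this sketch.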

### Fetching data from a Data Combinator

The created combinator can be used by sending a `GET` request to `/api/v2.6/combinator/$COMBINATOR_NAME` with header `x-auth-token`.

It responds with the same data structure as the plain data APIs: a list of data records, each wrapped with the basic record details, with the data itself remapped according to the registered combinator. The sample below shows location records only:

```javascript
[
  {
    "endpoint": "rumpel/locations",
    "recordId": "e965e022-6613-476a-a0cd-1f587a41b148",
    "data": {
      "longitude": "0.101014673709963",
      "latitude": "51.671358277138"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "fcf1a26b-e49f-4457-915b-156e14140f38",
    "data": {
      "longitude": "0.100905202634514",
      "latitude": "51.674001392439"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "8f7afa92-39e2-48ab-8028-f5aebaa9918e",
    "data": {
      "longitude": "0.080477950927866",
      "latitude": "51.6658257133844"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "d3a6f04b-4df6-4888-a7b0-c1d5ca272de9",
    "data": {
      "longitude": "0.0641066288762133",
      "latitude": "51.6641215101037"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "6a858d87-899e-4961-b722-0738d07c755e",
    "data": {
      "longitude": "0.0961801595986785",
      "latitude": "51.6712232446779"
    }
  }
]
```
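To make the remapping step concrete, here is a small local model of it in Python. It is a sketch of the idea only, not the PDA's implementation: the helper names are hypothetical, and the shape of the stored record (fields nested under `data.locations`, plus a made-up `timestamp` value) is inferred from the mapping paths used above.

```python
def resolve_path(record, path):
    """Follow a dotted path such as 'data.locations.longitude' through nested dicts."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value


def apply_mapping(record, mapping):
    """Return the record with its 'data' rebuilt from the mapping's source paths."""
    remapped = {
        target: resolve_path(record, source) for target, source in mapping.items()
    }
    return {
        "endpoint": record["endpoint"],
        "recordId": record["recordId"],
        "data": remapped,
    }


# Hypothetical stored record; the nesting is an assumption based on the paths.
raw_record = {
    "endpoint": "rumpel/locations",
    "recordId": "e965e022-6613-476a-a0cd-1f587a41b148",
    "data": {
        "locations": {
            "longitude": "0.101014673709963",
            "latitude": "51.671358277138",
            "timestamp": "2017-05-01T08:15:00+00:00",  # illustrative value
        }
    },
}
mapping = {
    "longitude": "data.locations.longitude",
    "latitude": "data.locations.latitude",
}

remapped_record = apply_mapping(raw_record, mapping)
print(remapped_record["data"])
```

Fields that the mapping does not name (here `timestamp`) are simply absent from the remapped `data`, which is the "vertical slicing" described at the top of this page.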

### Data Filtering

The combinator API allows for powerful filtering of data according to the recorded values. The combinator is created by `POST`ing a request to `/api/v2.6/combinator/$COMBINATOR_NAME` as before. However, for each source of data you may also define one or more `filters` in addition to the `endpoint` and the `mapping` used to remap the data:

```javascript
[
  {
    "endpoint": "rumpel/locations",
    "filters": [
      {
        "field": "data.locations.timestamp",
        "transformation": {
          "transformation": "datetimeExtract",
          "part": "hour"
        },
        "operator": {
          "operator": "between",
          "lower": 7,
          "upper": 9
        }
      }
    ]
  }
]
```

The above example extracts the hour part of the location timestamp and keeps only records with the hour between 7 and 9. Multiple filters are combined with a logical `AND`: a data record has to match all filters to be included in the result. Every `filter` consists of three fields:

| Parameter      | Type                  | Meaning                                                                                                                 |
| -------------- | --------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| field          | String                | The JSON path of the field to use for filtering – it can be a simple JSON value, an array or an object.                 |
| transformation | Transformation Object | Optionally transforms the field in question before applying a filter. You can find the supported transformations below. |
| operator       | Operator Object       | The filtering Operator. You can find the supported operators below.                                                     |

* `transformation` – currently supported transformations:
  * `identity` – keep the value as-is, effect is the same as if `transformation` was not defined
  * `datetimeExtract` with `part` – extract part of a date from an ISO 8601 formatted date field
  * `timestampExtract` with `part` – extract part of a date from a UNIX timestamp date field
  * `searchable` – convert the field to searchable text. Must be used together with the `find` operator below
* `operator` – different operator types:
  * `in` with a `value` field – checks whether `field` is in (is contained by) `value`
  * `contains` with a `value` field – checks whether `field` contains `value`
  * `between` with `lower` and `upper` values – checks whether `lower` < `field` < `upper`
  * `find` with a `search` field set to the search string – performs a text-based search. Must be used together with the `searchable` transformation above.
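
The filter semantics listed above can be modelled locally with a few lines of Python. This is a sketch under stated assumptions rather than the PDA's implementation: `part` names are assumed to match Python `datetime` attributes such as `hour`, UNIX timestamps are assumed to be in seconds and read as UTC, `between` uses the strict inequalities given in the list, and `searchable`/`find` are left out because their matching rules are not described here. All helper names and the sample timestamps are hypothetical.

```python
from datetime import datetime, timezone


def resolve_path(record, path):
    """Follow a dotted path such as 'data.locations.timestamp'."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value


def transform(value, spec):
    """Apply a transformation object; a missing one behaves like 'identity'."""
    kind = (spec or {}).get("transformation", "identity")
    if kind == "identity":
        return value
    if kind == "datetimeExtract":
        # Assumption: 'part' is a datetime attribute name, e.g. 'hour'.
        return getattr(datetime.fromisoformat(value), spec["part"])
    if kind == "timestampExtract":
        # Assumption: timestamps are seconds since the epoch, read as UTC.
        return getattr(datetime.fromtimestamp(value, tz=timezone.utc), spec["part"])
    raise ValueError(f"unsupported transformation: {kind}")


def matches(value, operator):
    """Evaluate one operator object against an already transformed value."""
    kind = operator["operator"]
    if kind == "in":
        return value in operator["value"]
    if kind == "contains":
        return operator["value"] in value
    if kind == "between":
        return operator["lower"] < value < operator["upper"]
    raise ValueError(f"unsupported operator: {kind}")


def passes(record, filters):
    """A record is kept only if every filter matches (logical AND)."""
    return all(
        matches(
            transform(resolve_path(record, f["field"]), f.get("transformation")),
            f["operator"],
        )
        for f in filters
    )


filters = [
    {
        "field": "data.locations.timestamp",
        "transformation": {"transformation": "datetimeExtract", "part": "hour"},
        "operator": {"operator": "between", "lower": 7, "upper": 9},
    }
]
morning = {"data": {"locations": {"timestamp": "2017-05-01T08:15:00+00:00"}}}
midday = {"data": {"locations": {"timestamp": "2017-05-01T12:40:00+00:00"}}}
print(passes(morning, filters), passes(midday, filters))  # True False
```

With strict bounds only hour 8 passes this particular filter; whether the server treats the bounds as inclusive is worth checking against a live PDA.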

The combinators illustrated above give you a flexible tool for extracting data in the shape you need. The next step is to build a layer of bundles on top of them so that a bigger variety of data can be retrieved in one bundle.

### Data Bundles

Data Bundles add a thin layer around `combinators`, useful in two ways:

1. Retrieving data from different `combinators` into explicitly named properties
2. Accepting `orderBy` and `limit` parameters to control the order and number of data points returned for a specific bundle property

Take the previously covered examples of profile and location data: they are clearly very distinct, but an application may still benefit from having both at the same time. For instance, it may only care about the most recent information on the user's profile and their 5 most recent locations. This can be achieved with a `POST` request to `https://postman.hubat.net/api/v2.6/data-bundle/localprofile` with header `x-auth-token` and body:

```javascript
{
  "profile": {
    "endpoints": [
      {
        "endpoint": "rumpel/profile"
      }
    ],
    "limit": 1
  },
  "location": {
    "endpoints": [
      {
        "endpoint": "rumpel/locations",
        "mapping": {
          "longitude": "data.locations.longitude",
          "latitude": "data.locations.latitude"
        }
      }
    ],
    "limit": 5
  }
}
```

The response includes the specific data requested:

```javascript
{
  "profile": [
    {
      "endpoint": "rumpel/profile",
      "recordId": "9b136020-372a-4777-81f9-2c4ce6925aea",
      "data": {
        "profile": {
          "website": {
            "link": "https://example.com",
            "private": "false"
          },
          "nick": {
            "private": "true",
            "name": ""
          },
          "primary_email": {
            "value": "testuser@example.com",
            "private": "false"
          },
          "private": "false",
          "youtube": {
            "link": "",
            "private": "true"
          },
          "address_global": {
            "city": "London",
            "county": "",
            "country": "UK",
            "private": "true"
          },
          "age": {
            "group": "",
            "private": "true"
          },
          "personal": {
            "first_name": "",
            "private": "false",
            "preferred_name": "Test",
            "last_name": "User",
            "middle_name": "",
            "title": ""
          },
          "blog": {
            "link": "",
            "private": "false"
          },
          "facebook": {
            "link": "",
            "private": "false"
          },
          "address_details": {
            "no": "",
            "street": "",
            "private": "false",
            "postcode": ""
          },
          "emergency_contact": {
            "first_name": "",
            "private": "true",
            "relationship": "",
            "last_name": "",
            "mobile": ""
          },
          "alternative_email": {
            "private": "true",
            "value": ""
          },
          "fb_profile_photo": {
            "private": "false"
          },
          "twitter": {
            "link": "",
            "private": "false"
          },
          "about": {
            "body": "A short bio about me shown on my PHATA",
            "private": "false",
            "title": "Me the Test User"
          },
          "mobile": {
            "no": "",
            "private": "true"
          },
          "gender": {
            "type": "",
            "private": "true"
          }
        }
      }
    }
  ],
  "location": [
    {
      "endpoint": "rumpel/locations",
      "recordId": "e965e022-6613-476a-a0cd-1f587a41b148",
      "data": {
        "longitude": "0.101014673709963",
        "latitude": "51.671358277138"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "fcf1a26b-e49f-4457-915b-156e14140f38",
      "data": {
        "longitude": "0.100905202634514",
        "latitude": "51.674001392439"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "8f7afa92-39e2-48ab-8028-f5aebaa9918e",
      "data": {
        "longitude": "0.080477950927866",
        "latitude": "51.6658257133844"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "d3a6f04b-4df6-4888-a7b0-c1d5ca272de9",
      "data": {
        "longitude": "0.0641066288762133",
        "latitude": "51.6641215101037"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "6a858d87-899e-4961-b722-0738d07c755e",
      "data": {
        "longitude": "0.0961801595986785",
        "latitude": "51.6712232446779"
      }
    }
  ]
}
```
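
A client consuming such a response might unpack it along these lines. The sketch assumes the response has already been parsed from JSON into Python dicts and lists, uses a trimmed copy of the sample above, and converts the coordinates to floats because the sample returns them as strings; `summarise_bundle` is a hypothetical helper name.

```python
def summarise_bundle(bundle):
    """Return the profile object and a list of (latitude, longitude) floats."""
    profile_records = bundle.get("profile", [])
    # "limit": 1 means at most one profile record is expected.
    profile = profile_records[0]["data"]["profile"] if profile_records else None
    coordinates = [
        (float(record["data"]["latitude"]), float(record["data"]["longitude"]))
        for record in bundle.get("location", [])
    ]
    return profile, coordinates


# Trimmed copy of the response shown above.
bundle_response = {
    "profile": [
        {
            "endpoint": "rumpel/profile",
            "recordId": "9b136020-372a-4777-81f9-2c4ce6925aea",
            "data": {"profile": {"personal": {"preferred_name": "Test", "last_name": "User"}}},
        }
    ],
    "location": [
        {
            "endpoint": "rumpel/locations",
            "recordId": "e965e022-6613-476a-a0cd-1f587a41b148",
            "data": {"longitude": "0.101014673709963", "latitude": "51.671358277138"},
        }
    ],
}

profile, coordinates = summarise_bundle(bundle_response)
print(profile["personal"]["preferred_name"], coordinates)
```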

To keep the example simple, it does not include the more complex data `combinators` covered in the previous step. However, you will notice that the `endpoints` property has exactly the same format as the body of a request for creating a new `combinator`.
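
Because each `endpoints` value is just a combinator body, a bundle can be assembled from combinator definitions you already have. The helper below is a hypothetical sketch of that composition: `bundle_property` is not part of any PDA client library, and since the value format of `orderBy` is not shown on this page, the helper passes it through unchanged and the example leaves it unset.

```python
def bundle_property(endpoints, limit=None, order_by=None):
    """Wrap a combinator body in a bundle property, adding optional controls."""
    prop = {"endpoints": endpoints}
    if order_by is not None:
        prop["orderBy"] = order_by  # format not covered here; passed through as-is
    if limit is not None:
        prop["limit"] = limit
    return prop


# Combinator bodies, in the same format used when creating a combinator.
profile_combinator = [{"endpoint": "rumpel/profile"}]
location_combinator = [
    {
        "endpoint": "rumpel/locations",
        "mapping": {
            "longitude": "data.locations.longitude",
            "latitude": "data.locations.latitude",
        },
    }
]

bundle = {
    "profile": bundle_property(profile_combinator, limit=1),
    "location": bundle_property(location_combinator, limit=5),
}
print(sorted(bundle))
```

The resulting `bundle` dictionary matches the request body shown earlier in this section.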

Like Data Combinators, Data Bundles can only be used directly by *privileged* applications such as the personal data dashboard. However, this leads us to [Data Debits](/build/dataswyft-one-apis/data-api/data-debits.md) for consented data sharing, as Bundles are the format used to specify the data requested from the user.

