Computations API

Private Serverless Computation is the DDS’s capability to run arbitrary algorithms on data within the DDS without leaking that data anywhere else. It extends the core DDS functionality to include algorithms ranging from simple summaries of DDS data to personal AI. Think of them as private serverless functions.

DDS functions are sometimes also called "Smart Hat Engine" (SHE) functions, "Tools" or "PDA Functions".

Key goals for DDS functions are:

  • Supporting algorithms written in a wide range of languages, providing flexibility in choosing the best tool for the job and limiting the need to reimplement any existing algorithms.

  • Not forcing open-sourcing of the algorithms. As many organisations consider their algorithms to be the "secret sauce", they should be able to maintain their competitive advantage when operating within the PDA ecosystem.

  • Generating more data for the person – while an organisation owns their algorithm, the person still owns all of their data.

  • Providing a trusted environment that is sufficiently isolated from the core DDS to eliminate the risks of unauthorised data access and to respect the legal data rights of the DDS owner.

  • Preventing any personal data leakage while running potentially untrusted, closed-source algorithms.

  • Elasticity in scaling – supporting large numbers of users as well as minimising resource cost when inactive, without burdening the algorithm developer.

Containerised applications appear to be the obvious choice, since they can be written in any language and come with isolation guarantees. The rest can be controlled through a well-defined interface between the DDS itself and the AWS Lambda runtime. Algorithms run in an isolated environment with no ability to communicate with the outside world, enforced through firewalls and security policies. An algorithm only runs reactively in response to a request from a DDS, processes the received data and returns results in the response. The downsides of this approach are that an algorithm cannot accumulate data over longer periods of time (the DDS does this itself), it cannot aggregate data across multiple users, and the algorithms that can be executed are limited to ones that are fixed ahead of deployment, whether traditional code or pre-trained Machine Learning models. Serverless environments (such as AWS Lambda) deliver the remaining goals of elasticity, on-demand use and ease of deployment.

A current limitation of the AWS Lambda environment is that it provides little detail and no guarantees on how a specific container instance gets reused, which leaves room for timing-related attacks. Specifically, a common optimisation is to retain some state in a given container (more in the sense of caching than storage, as there are no guarantees that the same container will be used again); however, that state can also contain data previously received from a DDS. Although interactions with a given function are driven by DDSs and not the functions themselves, and functions are unable to communicate with the outside world, a malicious function could return custom responses containing such data to a specific DDS controlled by the perpetrator. This risk is mitigated through metadata logging, but additional controls around function scheduling and execution could eliminate it.

Building a function

DDS functions are currently standard AWS Lambda functions and benefit from a wealth of information on how to build such functions.

While it is an over-simplification, it is not inaccurate to say that you can just drop in an algorithm you have already written, or write one in any of the following major languages and frameworks:

  • Node.js

  • Java (Java 8 and other languages supported by the runtime; we ❤️ Scala!)

  • Python

  • .NET Core

  • Go

Furthermore, the DDS uses the industry-standard JSON format for handling data, so what your algorithm receives is simply a bundle of JSON records (sometimes called documents) matching your specific Data Bundle query (check the docs on Data Bundles for more details).

Your function needs to do 3 things:

  1. Publish its configuration to simplify editing the details.

  2. Return a Data Bundle definition specifying what data it wants to receive, parametrised by the date range (fromDate and untilDate query parameters in ISO 8601 format).

  3. Accept a data processing request, which includes the current known configuration from the DDS and the bundle of data itself, generated using the bundle definition returned in (2).
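The three responsibilities above could be sketched as three Lambda handlers behind an API Gateway proxy integration. This is only an illustrative sketch: the handler names, the configuration fields, the bundle shape and the `data` key in the request body are assumptions made for the example, not the DDS's actual contract (see the function information reference and the Data Bundles docs for that).

```python
import json
from datetime import datetime

# Hypothetical configuration: the real set of fields is defined by the DDS.
CONFIGURATION = {"id": "data-feed-counter", "version": "1.0.0"}


def _response(payload, status=200):
    """Wrap a payload in the API Gateway proxy-integration response shape."""
    return {"statusCode": status, "body": json.dumps(payload)}


def configuration_handler(event, context):
    """Step 1: publish the function's configuration."""
    return _response(CONFIGURATION)


def bundle_handler(event, context):
    """Step 2: return a Data Bundle definition for the requested date range."""
    params = event.get("queryStringParameters") or {}
    try:
        from_date = params["fromDate"]
        until_date = params["untilDate"]
        # Checks the ISO 8601 shape only. Python versions before 3.11
        # reject a trailing "Z", so a stricter parser may be needed.
        datetime.fromisoformat(from_date)
        datetime.fromisoformat(until_date)
    except (KeyError, ValueError):
        return _response({"error": "fromDate and untilDate must be ISO 8601"}, 400)
    # Placeholder bundle shape, assumed for illustration only.
    bundle = {"name": CONFIGURATION["id"], "fromDate": from_date, "untilDate": until_date}
    return _response(bundle)


def process_handler(event, context):
    """Step 3: accept the configuration plus data and return new records."""
    request = json.loads(event.get("body") or "{}")
    records = request.get("data", [])  # assumed key for the bundle data
    return _response([{"recordCount": len(records)}])
```

Keeping each step in its own handler makes it straightforward to expose them as separate endpoints during development.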

A common recommendation is to separate your algorithm from the Lambda function handling code, as this makes testing and debugging a lot simpler. Try to develop your entire algorithm outside the DDS (the Serverless Framework includes a helpful set of tools for that), exposing the 3 steps above as separate API Gateway endpoints. You should be able to feed the generated Data Bundle definition into the DDS you use for development, and feed the data extracted from the DDS using that bundle into your algorithm for processing.
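As a rough illustration of that separation, the sketch below keeps the algorithm as a pure function that knows nothing about Lambda, with a thin handler that only translates between the HTTP request and the algorithm. The record shape (an `endpoint` key per record) and the `data` key in the request body are hypothetical and chosen only for the example.

```python
import json
from collections import Counter


def count_by_endpoint(records):
    """Pure algorithm: count records per source endpoint.

    It has no Lambda or DDS dependencies, so it can be unit-tested directly.
    """
    return dict(Counter(record["endpoint"] for record in records))


def handler(event, context):
    """Thin Lambda adapter: decode the request, call the algorithm, encode the result."""
    request = json.loads(event["body"])
    counts = count_by_endpoint(request["data"])  # "data" is an assumed key
    return {"statusCode": 200, "body": json.dumps(counts)}


# Exercising the algorithm locally, with no Lambda runtime involved.
sample = [
    {"endpoint": "facebook/feed"},
    {"endpoint": "twitter/tweets"},
    {"endpoint": "facebook/feed"},
]
local_result = count_by_endpoint(sample)
```

With this split, most tests target `count_by_endpoint` directly, and only a handful need to build API Gateway-style events.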

Everything else is the details of your own implementation!

Limitations

AWS Lambda functions, and by extension DDS functions, also have some limitations worth noting:

  • You can allocate between 128MB and 3008MB of memory to the function.

  • It has 512MB of "ephemeral" disk storage for temporary files. Do not rely on it persisting between runs.

  • Running time is limited to 5 minutes, so you will need to manage efficiency and the amount of data processed in a run.

  • Maximum request size is 6MB. You will not be able to process a huge amount of an individual PDA's data in one go, but 6MB can fit a lot of JSON.

  • The deployment package can be no bigger than 250MB (though you can load e.g. your models externally, taking care that execution, including loading of the model, does not time out).

  • DDS-specific: you cannot communicate with any remote networked resources, even if you have a great use case. This limits the possibilities of leaking users' data.

  • DDS-specific: function execution is driven by the DDS itself, and you cannot subscribe to other sources of Lambda events. This preserves the control and autonomy of the DDS owner.
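One way to stay within the running-time and payload limits above is to watch the remaining time that the Lambda context reports and to check the serialised size of a response before returning it. The sketch below assumes the standard Python Lambda context method `get_remaining_time_in_millis()`; the helper names, the 10-second safety margin and the idea of returning unprocessed records are illustrative choices, not DDS requirements.

```python
import json

# The 6MB payload limit quoted above, taken here as 6 MiB.
MAX_PAYLOAD_BYTES = 6 * 1024 * 1024


def fits_payload_limit(payload, limit=MAX_PAYLOAD_BYTES):
    """Return True if the JSON-serialised payload is within the size limit."""
    return len(json.dumps(payload).encode("utf-8")) <= limit


def process_within_budget(records, context, process_one, safety_margin_ms=10_000):
    """Process records until the remaining execution time gets too low.

    Returns (results, unprocessed) so the caller can decide what to do
    with any records that did not fit into this run.
    """
    results = []
    for index, record in enumerate(records):
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            return results, records[index:]
        results.append(process_one(record))
    return results, []
```

Stopping early with a partial result is usually preferable to being terminated at the timeout and returning nothing.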

When DDS functions are executed

Each function available in a particular DDS cluster is registered in the DDS’s static configuration, which provides the ID of the function along with the version to be used, the namespace and endpoint the function is allowed to publish data to, and the details the DDS needs in order to invoke it.

The DDS internally tracks data "events" and, as data events come in, determines which functions may need to be invoked on the data. The current approach is rather straightforward: the DDS accumulates a batch of events and checks which endpoints they were for. It then compares that set of endpoints against the functions enabled for the DDS and, if there is an overlap, checks the trigger details for the function. A new function execution, covering all data since the last execution that matches the bundle, is started when the trigger is either individual (the function should run for every individual data record) or period and at least the specified period of time has passed since the last invocation.
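The decision described above can be summarised in a few lines. This is a sketch of the described logic rather than the DDS's actual implementation: the dictionary layout, the field names and the choice to run a period-triggered function that has never run before are all assumptions.

```python
from datetime import datetime, timedelta


def should_run(function, event_endpoints, now):
    """Decide whether a batch of data events should trigger a function."""
    if not function["enabled"]:
        return False
    # The function is only a candidate if the events touched its endpoints.
    if not set(function["endpoints"]) & set(event_endpoints):
        return False
    trigger = function["trigger"]
    if trigger["type"] == "individual":
        return True  # run for every individual data record
    if trigger["type"] == "period":
        last_run = function.get("last_execution")
        # Assumption: a function that has never run is due immediately.
        return last_run is None or now - last_run >= trigger["period"]
    return False


# Hypothetical function record with a 24-hour period trigger.
counter = {
    "enabled": True,
    "endpoints": ["facebook/feed"],
    "trigger": {"type": "period", "period": timedelta(hours=24)},
    "last_execution": datetime(2019, 1, 1, 12, 0),
}
```

For example, `should_run(counter, ["facebook/feed"], datetime(2019, 1, 2, 13, 0))` is true because more than 24 hours have passed, whereas the same call one hour after the last execution is false.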

It is important to note that, unless triggered manually via an API endpoint, functions for a DDS will not run if no new data is coming in, because it is incoming data that generates the data events which in turn trigger functions. In a completely inactive DDS, such functions would never be executed.

How DDS functions are executed

Every time the DDS decides it needs to execute a function, it performs 3 steps:

  1. Asks the function to provide a Data Bundle definition for the timeframe between the most recent execution and now.

  2. Sends the current known function configuration (including the last execution time), together with the data retrieved for the bundle definition, to the function.

  3. Saves the returned data and the time of execution.
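Seen from the DDS side, the three steps above amount to a small orchestration loop. The sketch below models each external interaction as a callable passed in by the caller, so the flow can be followed (and tested) without any real infrastructure. All names, including the `lastExecution` key, are hypothetical and do not reflect the DDS's internal code.

```python
from datetime import datetime, timezone


def execute_function(get_bundle, query_data, invoke, save, configuration, now=None):
    """Run one function execution following the three steps described above.

    get_bundle(from_date, until_date) -> bundle definition   (step 1)
    query_data(bundle)                -> data matching the bundle
    invoke(configuration, data)       -> records produced    (step 2)
    save(records, executed_at)        -> persists the output (step 3)
    """
    now = now or datetime.now(timezone.utc)
    last_execution = configuration.get("lastExecution")

    bundle = get_bundle(last_execution, now)
    data = query_data(bundle)
    records = invoke(configuration, data)
    save(records, now)

    # The updated configuration carries the new last execution time forward.
    return {**configuration, "lastExecution": now}
```

Passing the collaborators in as callables is a design choice for the sketch: it keeps the sequencing visible and lets each step be replaced with a stub.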

This results in the generated data becoming available for the DDS owner and other applications the same way as any other data, with no need to deal with the complexities of running algorithms, managing dependencies between components or running dedicated infrastructure.

Function information reference

Each function publishes its configuration through a Lambda function handler; this section provides the details on what information is included. Note that publishing of the function is managed by the PDS Service Provider and the information will always be reviewed.

FunctionConfiguration

FunctionInfo

FunctionTrigger

FunctionStatus

Testing your function

Use the Function Testing Postman collection we have prepared for this purpose!

You will need to configure the function environment with your own function's API Gateway details, choose the PDA you want to use for testing and update the PDA credentials for the whole collection to run successfully. Once you are satisfied that it works correctly, please contact us to have it reviewed and integrated.

Function management

All function management is performed through endpoints to list, set up and disable functions and, in certain cases, to get the application token for the frontend to use in authenticating with a remote service.

Listing functions

Functions are listed at /api/v2.6/she/function, which returns the full list of available functions.

This is the only call needed to get a comprehensive list of functions along with their status on the DDS (available, enabled, execution time).

Information on an individual function is accessible at /api/v2.6/she/function/:function-id, but this shouldn’t be needed in most cases. It has exactly the same information and format as a single item in the list returned by /api/v2.6/she/function.

Setting up

A function is set up by calling GET /api/v2.6/she/function/:function-id/enable.

The steps of setting up a function with a DDS happen transparently after calling the enable endpoint.

Similarly, a function gets disabled by calling /api/v2.6/she/function/:function-id/disable. This takes care of recording the fact on the DDS, disabling any DDS access and suspending future function invocations.

Executing

Most functions are expected to be executed automatically by the DDS; however, it is still possible (with "owner" DDS permissions) to execute a function manually by calling GET /api/v2.6/she/function/:function-id/trigger.
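The management endpoints described in this section can be wrapped in a small helper. The sketch below uses only the Python standard library and the paths given above. The DDS base URL is a placeholder, authentication headers are passed in by the caller because they are not covered here, and using GET for listing and for disable is an assumption (the text only states GET for enable and trigger).

```python
import json
import urllib.parse
import urllib.request

API_PREFIX = "/api/v2.6/she/function"
ACTIONS = ("enable", "disable", "trigger")


def function_url(dds_base_url, function_id=None, action=None):
    """Build the URL for listing, inspecting or managing a function."""
    if action is not None:
        if function_id is None:
            raise ValueError("an action requires a function id")
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
    url = dds_base_url.rstrip("/") + API_PREFIX
    if function_id is not None:
        url += "/" + urllib.parse.quote(function_id, safe="")
    if action is not None:
        url += "/" + action
    return url


def call(dds_base_url, headers, function_id=None, action=None):
    """Send the request and decode the JSON response.

    `headers` must carry whatever credentials the DDS expects.
    """
    request = urllib.request.Request(
        function_url(dds_base_url, function_id, action),
        headers=headers,
        method="GET",  # assumed for list and disable as well
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For instance, `function_url("https://example-dds.test", "data-feed-counter", "trigger")` yields the manual trigger URL for that function on a hypothetical DDS host.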

Function availability

It is important to note that only DDS functions that have been registered with a DDS cluster will be available to use.

Registration is done in the DDS’s configuration (application.conf), so new functions currently require the DDS to be redeployed with them included in the configuration:

she {
  functions = [
    {
      id = "data-feed-counter"
      version = "1.0.0"
      baseUrl = "https://ociflwukh1.execute-api.eu-west-1.amazonaws.com/dev"
      namespace = "she"
      endpoint = "insights/activity-records"
    }
    {
      id = "sentiment-tracker"
      version = "1.0.0"
      baseUrl = "https://ociflwukh1.execute-api.eu-west-1.amazonaws.com/dev"
      namespace = "she"
      endpoint = "insights/emotions"
    }
  ]
}

The format of the configuration should be self-explanatory: it provides a list of functions, each identified by an ID, with the version to be used, baseUrl as the address of the API Gateway, and finally the namespace and endpoint it is allowed to create data in. The rest is done automatically by the DDS, including loading the full configuration, issuing the calls, saving the data, etc.
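Before redeploying, it can be useful to sanity-check new entries. The sketch below assumes the entries have already been loaded into plain Python dictionaries (HOCON parsing itself is not shown) and checks only what the description above implies: each entry has the five fields, and IDs are not repeated. Treating duplicate IDs as an error is an assumption, not a documented DDS rule.

```python
REQUIRED_KEYS = ("id", "version", "baseUrl", "namespace", "endpoint")


def validate_functions(functions):
    """Return a list of human-readable problems found in the function entries."""
    problems = []
    seen_ids = set()
    for index, entry in enumerate(functions):
        missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
        function_id = entry.get("id")
        if function_id:
            if function_id in seen_ids:
                problems.append(f"entry {index}: duplicate id {function_id!r}")
            seen_ids.add(function_id)
    return problems


# The two entries from the configuration above, as dictionaries.
entries = [
    {
        "id": "data-feed-counter",
        "version": "1.0.0",
        "baseUrl": "https://ociflwukh1.execute-api.eu-west-1.amazonaws.com/dev",
        "namespace": "she",
        "endpoint": "insights/activity-records",
    },
    {
        "id": "sentiment-tracker",
        "version": "1.0.0",
        "baseUrl": "https://ociflwukh1.execute-api.eu-west-1.amazonaws.com/dev",
        "namespace": "she",
        "endpoint": "insights/emotions",
    },
]
```

An empty list from `validate_functions(entries)` means no problems were found by these checks.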
