Delta Sharing

Delta Sharing is an open protocol for securely exchanging large amounts of data. It uses a simple REST API to share access to part of a cloud dataset. When transferring large amounts of data, it lets you download only the changes rather than the whole dataset every time, which saves time and bandwidth. Voyado Engage uses Delta Sharing extensively, for example to transfer the client data needed when compiling BI reports.

Basic concepts

  • Share: A share is a logical grouping of data intended to be shared with recipients. A share can be shared with one or multiple recipients. A recipient can access all resources in a share.

  • Table: A table is a dataset consisting of one or several files.

  • Schema: A schema is a logical grouping of tables. One schema may contain multiple tables.

  • Recipient: A recipient is a principal (such as a person) with a bearer token to access shared tables.

Shared data characteristics

The shared data is stored in Parquet format, a columnar storage format designed for efficient data storage and retrieval. In addition to the Parquet files, a metadata layer is added to fulfill the Delta format requirements. Among other things, this metadata layer makes it possible to fetch all the data in a table that was added after a certain timestamp.

Each time a share is updated, new files are added to it, using append mode. This means there can be several versions of a record in the shared dataset. For example, if a contact updates their address, both the old and the updated record will be available in the shared dataset. To determine if a record is of a later version, the column “import_date_time” in the data is used. This contains timestamps showing when a record was added.
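Because of append mode, a consumer usually wants only the newest version of each record. The following is a minimal sketch of that step, assuming the rows have already been loaded as dictionaries. The key column contact_id and the sample rows are hypothetical, and import_date_time is assumed to be an ISO-8601 string; adjust both to the actual table.

```python
from datetime import datetime

def latest_versions(records, key="contact_id"):
    """Keep only the newest row per key, judged by import_date_time."""
    latest = {}
    for row in records:
        # Assumption: timestamps are ISO-8601 strings without a zone suffix.
        ts = datetime.fromisoformat(row["import_date_time"])
        current = latest.get(row[key])
        if current is None or ts > current[0]:
            latest[row[key]] = (ts, row)
    return [row for _, row in latest.values()]

# Made-up rows: contact c1 appears twice because the address changed.
rows = [
    {"contact_id": "c1", "address": "Old Street 1", "import_date_time": "2023-11-01T08:00:00"},
    {"contact_id": "c1", "address": "New Street 2", "import_date_time": "2023-11-02T08:00:00"},
    {"contact_id": "c2", "address": "Side Road 3", "import_date_time": "2023-11-01T09:00:00"},
]
result = latest_versions(rows)
```

Here result holds one row per contact, and for c1 it is the row with the later import_date_time.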

Delta share REST APIs

The Delta Sharing Protocol has a number of REST APIs that can be used to programmatically integrate with the shared data. This article gives an overview of the most useful ones. For more details, see the Delta Sharing Protocol on GitHub.

All APIs require the bearer token <token> and endpoint URL <prefix> that you can get from the downloaded Delta Share config file.
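As a rough illustration, the token and prefix can be read from the config file like this. The sketch assumes the downloaded file follows the standard Delta Sharing profile format, a JSON document with the fields endpoint and bearerToken; the sample values are made up.

```python
import json

def load_profile(text):
    """Parse the config file contents into (prefix, headers)."""
    profile = json.loads(text)
    # Assumption: standard profile fields "endpoint" and "bearerToken".
    prefix = profile["endpoint"].rstrip("/")
    headers = {"Authorization": f"Bearer {profile['bearerToken']}"}
    return prefix, headers

# Made-up example contents; a real file comes from the download.
sample = (
    '{"shareCredentialsVersion": 1, '
    '"endpoint": "https://sharing.example.com/delta-sharing/", '
    '"bearerToken": "abc123"}'
)
prefix, headers = load_profile(sample)
```

The trailing slash is stripped from the endpoint so that paths such as /shares can be appended directly.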

Delta shared connectors

There are several community-developed connectors, in a range of programming languages, that implement the Delta Share REST APIs. These can be used directly to integrate with Delta Share, or as examples when implementing your own connector using the REST protocol.

List Shares

This API endpoint gives a response containing a list of all shares that your recipient has access to. In the Engage implementation, a recipient has access to only one share, meaning the response will always contain just one share.

Method

GET

Header

Authorization: Bearer <token>

Endpoint URL

<prefix>/shares
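A minimal sketch of this call using only the Python standard library. The helper names build_get and fetch_json are hypothetical, and the prefix and token are placeholder values, so the line that sends the request is left commented out.

```python
import json
import urllib.request

def build_get(prefix, path, token):
    """Build (but do not send) an authenticated GET request."""
    url = prefix.rstrip("/") + path
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )

def fetch_json(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Placeholder prefix and token; take the real ones from the config file.
req = build_get("https://sharing.example.com/delta-sharing", "/shares", "abc123")
# shares = fetch_json(req)  # uncomment once real values are in place
```

The same helper can serve the other listing endpoints by changing the path argument.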

List Schemas in a Share

This API endpoint gives a response containing a list of all schemas in the share that your recipient has access to. In the Engage implementation, a share has only one schema, meaning the response will always contain just one schema.

Method

GET

Header

Authorization: Bearer <token>

Endpoint URL

<prefix>/shares/<share>/schemas

List Tables in a Schema

This API endpoint gives a response containing a list of all tables in the shared schema that your recipient has access to.

Method

GET

Header

Authorization: Bearer <token>

Endpoint URL

<prefix>/shares/<share>/schemas/<schema>/tables

List all Tables in a Share

This API endpoint gives a response containing a list of all tables in the share that your recipient has access to. Because, in the Engage implementation, a share only contains one schema, this request responds with the same list of tables as the “List Tables in a Schema” API.

Method

GET

Header

Authorization: Bearer <token>

Endpoint URL

<prefix>/shares/<share>/all-tables

Query Table Metadata

This API endpoint gives a response with some metadata for a shared table. The most useful part is probably the table schema.

Method

GET

Header

Authorization: Bearer <token>

Endpoint URL

<prefix>/shares/<share>/schemas/<schema>/tables/<table>/metadata
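To get at the table schema, the response body has to be unpacked. According to the Delta Sharing Protocol, the body is newline-delimited JSON in which one line wraps a metaData object, and its schemaString field holds the schema as a JSON string. The sketch below extracts column names and types under that assumption; the sample body is made up and the function name is hypothetical.

```python
import json

def table_columns(body):
    """Return {column name: type} from a metadata response body."""
    for line in body.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if "metaData" in entry:
            # schemaString is itself JSON, so it is decoded a second time.
            schema = json.loads(entry["metaData"]["schemaString"])
            return {field["name"]: field["type"] for field in schema["fields"]}
    raise ValueError("no metaData line found in response")

# Made-up response body with two columns.
schema_string = json.dumps({"type": "struct", "fields": [
    {"name": "contact_id", "type": "string", "nullable": True, "metadata": {}},
    {"name": "import_date_time", "type": "timestamp", "nullable": True, "metadata": {}},
]})
sample_body = "\n".join([
    json.dumps({"protocol": {"minReaderVersion": 1}}),
    json.dumps({"metaData": {"id": "example", "format": {"provider": "parquet"},
                             "schemaString": schema_string, "partitionColumns": []}}),
])
columns = table_columns(sample_body)
```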

Read Change Data Feed from a Table

This API endpoint gives a response containing paths to the files in a shared table that your recipient has access to. Its parameters let you filter the response by when a file was added to the table. Each time a table is updated, it gets a new version, and every version has a version number and a timestamp in the metadata. You can filter the response by either version number or timestamp. For example, you can request all files that changed from version X to Y, or all versions after version X, where X and Y are each either a version number or a timestamp.

If you filter by timestamp, it must be in ISO-8601 format in the UTC time zone, for example 2023-11-01T00:00:00Z. The timestamp must be equal to or later than the first time data was added to the table. If you provide an earlier timestamp, you will receive a message stating the first available timestamp.
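A timestamp in the expected form can be produced from a timezone-aware datetime, as in this small sketch. The function name is hypothetical, and the example input uses a fixed UTC+1 offset purely for illustration.

```python
from datetime import datetime, timedelta, timezone

def to_protocol_timestamp(moment):
    """Format an aware datetime as ISO-8601 in UTC with a trailing Z."""
    if moment.tzinfo is None:
        # A naive datetime has no defined UTC equivalent, so reject it.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# 01:00 at UTC+1 is midnight UTC on the same date.
local = datetime(2023, 11, 1, 1, 0, 0, tzinfo=timezone(timedelta(hours=1)))
stamp = to_protocol_timestamp(local)
```

Converting to UTC before formatting avoids sending a local time that the server would interpret as UTC.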

Method

GET

Header

Authorization: Bearer <token>

Endpoint URL

<prefix>/shares/<share>/schemas/<schema>/tables/<table>/changes

Parameters

startingVersion (optional): Long

endingVersion (optional): Long

startingTimestamp (optional): String

endingTimestamp (optional): String
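The parameters are passed in the query string. The sketch below builds such a URL, using the /changes path that the Delta Sharing Protocol specification defines for the change data feed. The function name and the share, schema, and table names are hypothetical.

```python
from urllib.parse import quote, urlencode

ALLOWED = {"startingVersion", "endingVersion",
           "startingTimestamp", "endingTimestamp"}

def changes_url(prefix, share, schema, table, **filters):
    """Build the change data feed URL with optional filter parameters."""
    unknown = set(filters) - ALLOWED
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    path = "/shares/{}/schemas/{}/tables/{}/changes".format(
        quote(share, safe=""), quote(schema, safe=""), quote(table, safe=""))
    # Skip parameters left as None; urlencode percent-encodes the values.
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return prefix.rstrip("/") + path + ("?" + query if query else "")

# Made-up names; everything added since 1 November 2023, midnight UTC.
url = changes_url("https://sharing.example.com/delta-sharing",
                  "my_share", "my_schema", "contacts",
                  startingTimestamp="2023-11-01T00:00:00Z")
```

The resulting URL is sent as a GET request with the same Authorization header as the other endpoints.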