Delta Sharing
Delta Sharing is an open protocol for the secure exchange of large datasets. It uses a simple REST API to give a recipient secure access to part of a cloud dataset. Because a recipient can download only the data that has changed, rather than the whole dataset every time, it saves time and bandwidth. Voyado Engage uses Delta Sharing extensively, for example to transfer the client data needed when compiling BI reports.
Caution
Delta Share has a rate limit of 10 requests per second. Requests that exceed this limit receive an HTTP 429 response. If a client requests all tables at once, some of those requests may therefore fail.
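One way to cope with the rate limit is to pause and retry whenever a 429 response arrives. The following is a minimal sketch, not an official client: `send` is a hypothetical callable that performs one request and returns a `(status, body)` tuple, and the retry count and delays are illustrative assumptions rather than values recommended by Engage.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_retries: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call send() and retry with exponential backoff while it returns 429.

    send is a hypothetical callable that performs one HTTP request and
    returns (status_code, body). The delays are illustrative assumptions.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            if status >= 400:
                raise RuntimeError(f"request failed with status {status}")
            return body
        if attempt < max_retries:
            # Wait base_delay, then twice that, then four times that, ...
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("rate limit still exceeded after retries")
```

Fetching tables one after another through a wrapper like this, instead of all at once, reduces the risk of hitting the limit in the first place.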
Basic concepts
Share: A share is a logical grouping of data intended to be shared with recipients. A share can be shared with one or multiple recipients. A recipient can access all resources in a share.
Table: A table is a dataset consisting of one or several files.
Schema: A schema is a logical grouping of tables. One schema may contain multiple tables.
Recipient: A recipient is a principal (such as a person) with a bearer token to access shared tables.
Shared data characteristics
The shared data is stored in Parquet format, which is a column storage format designed for efficient data storage and retrieval. In addition to the Parquet files, a metadata layer is added to fulfill the Delta format requirements. Among other things, this metadata layer makes it possible to fetch all the data in a table that was added after a certain timestamp.
Each time a share is updated, new files are added to it, using append mode. This means there can be several versions of a record in the shared dataset. For example, if a contact updates their address, both the old and the updated record will be available in the shared dataset. To determine which version of a record is the latest, use the column “import_date_time” in the data. This contains timestamps showing when a record was added.
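Because several versions of a record can exist, a consumer usually wants to keep only the newest one. The sketch below shows the idea on plain Python dictionaries. The identifying column `contact_id` is a hypothetical example, and the code assumes that “import_date_time” values are comparable, for instance datetime objects or ISO-8601 strings that all use the same format.

```python
from typing import Dict, Iterable, List


def latest_records(rows: Iterable[dict], key: str) -> List[dict]:
    """Keep only the newest version of each record.

    key names the column that identifies a record (hypothetical here).
    The newest version is the one with the greatest import_date_time.
    """
    newest: Dict[object, dict] = {}
    for row in rows:
        current = newest.get(row[key])
        if current is None or row["import_date_time"] > current["import_date_time"]:
            newest[row[key]] = row
    return list(newest.values())


rows = [
    {"contact_id": 1, "city": "Oslo", "import_date_time": "2023-11-01T00:00:00Z"},
    {"contact_id": 1, "city": "Bergen", "import_date_time": "2023-11-05T00:00:00Z"},
    {"contact_id": 2, "city": "Malmo", "import_date_time": "2023-11-02T00:00:00Z"},
]
current_rows = latest_records(rows, key="contact_id")
```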
Delta Share REST APIs
The Delta Sharing Protocol has a number of REST APIs that can be used to programmatically integrate with the shared data. An overview of the most useful ones is given in this article. For more details, see the Delta Sharing Protocol on GitHub.
All APIs require the bearer token <token> and the endpoint URL <prefix>, both of which are found in the downloaded Delta Share config file.
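As an illustration, the token and prefix can be read from the config file with a few lines of Python. This sketch assumes the file follows the profile file format of the Delta Sharing Protocol, a JSON document with `endpoint` and `bearerToken` fields. The endpoint and token values shown are placeholders.

```python
import json
from typing import Dict, Tuple


def read_profile(text: str) -> Tuple[str, Dict[str, str]]:
    """Return (prefix, headers) from the contents of a Delta Share config file.

    Assumes the JSON profile format with "endpoint" and "bearerToken" fields.
    """
    profile = json.loads(text)
    prefix = profile["endpoint"].rstrip("/")
    headers = {"Authorization": f"Bearer {profile['bearerToken']}"}
    return prefix, headers


# Placeholder content; a real file is downloaded from Engage.
example = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://example.com/delta-sharing/",
  "bearerToken": "my-secret-token"
}
"""
prefix, headers = read_profile(example)
```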
Delta Share connectors
There are several community-developed connectors, in several programming languages, that implement the Delta Share REST APIs. These can be used directly to integrate with Delta Share, or as examples when implementing your own connector using the REST protocol.
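As one example, the open-source `delta-sharing` Python connector addresses a table as `<config file>#<share>.<schema>.<table>` and can load it into a pandas DataFrame. The sketch below is a hedged illustration: the file, share, schema and table names are hypothetical, and `load_table` only works when the `delta-sharing` package is installed.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the table address used by the delta-sharing Python connector."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame.

    Requires the delta-sharing package (pip install delta-sharing).
    """
    # Imported here so that table_url() works without the package installed.
    import delta_sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )


# Hypothetical names, for illustration only.
url = table_url("config.share", "my_share", "my_schema", "contacts")
```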
List Shares
This API endpoint gives a response containing a list of all shares that your recipient has access to. In the Engage implementation, a recipient has access to only one share, so the response always contains exactly one share.
Method | GET |
---|---|
Header | Authorization: Bearer <token> |
Endpoint URL | <prefix>/shares |
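A minimal sketch of this call using only the Python standard library. It assumes the response body is the JSON object defined by the Delta Sharing Protocol, with the shares listed under `items`. The function names are illustrative and not part of any official client.

```python
import json
import urllib.request
from typing import List


def build_list_shares_request(prefix: str, token: str) -> urllib.request.Request:
    """Build the GET request for <prefix>/shares without sending it."""
    return urllib.request.Request(
        f"{prefix.rstrip('/')}/shares",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def share_names(response_body: bytes) -> List[str]:
    """Extract the share names from a List Shares response body."""
    return [item["name"] for item in json.loads(response_body).get("items", [])]


def list_shares(prefix: str, token: str) -> List[str]:
    """Send the request and return the names of the available shares."""
    with urllib.request.urlopen(build_list_shares_request(prefix, token)) as resp:
        return share_names(resp.read())
```

The other list endpoints follow the same pattern with a different URL path.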
List Schemas in a Share
This API endpoint gives a response containing a list of all schemas in the share that your recipient has access to. In the Engage implementation a share has only one schema, meaning the response will always contain just one schema.
Method | GET |
---|---|
Header | Authorization: Bearer <token> |
Endpoint URL | <prefix>/shares/<share>/schemas |
List Tables in a Schema
This API endpoint gives a response containing a list of all tables in the shared schema that your recipient has access to.
Method | GET |
---|---|
Header | Authorization: Bearer <token> |
Endpoint URL | <prefix>/shares/<share>/schemas/<schema>/tables |
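The Delta Sharing Protocol allows list responses to be paginated: a response may include a `nextPageToken`, which is passed back as the `pageToken` query parameter to get the next page. Whether the Engage implementation ever paginates is not stated here, so treat the following as a defensive sketch. `fetch_page` is a hypothetical callable that performs the request for a given page token and returns the parsed JSON.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_tables(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield every table item, following nextPageToken until it is absent.

    fetch_page is a hypothetical callable: given a page token (or None for
    the first page) it requests the tables endpoint, adding
    ?pageToken=<token> when a token is given, and returns the parsed JSON.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        if not token:
            break
```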
List all Tables in a Share
This API endpoint gives a response containing a list of all tables in the share that your recipient has access to. Because, in the Engage implementation, a share only contains one schema, this request responds with the same list of tables as the “List Tables in a Schema” API.
Method | GET |
---|---|
Header | Authorization: Bearer <token> |
Endpoint URL | <prefix>/shares/<share>/all-tables |
Query Table Metadata
This API endpoint gives a response with some metadata for a shared table. The most useful part is probably the table schema.
Method | GET |
---|---|
Header | Authorization: Bearer <token> |
Endpoint URL | <prefix>/shares/<share>/schemas/<schema>/tables/<table>/metadata |
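To illustrate how the table schema can be pulled out of the response, the sketch below assumes the default response format of the Delta Sharing Protocol: line-delimited JSON in which one line holds a `metaData` object, whose `schemaString` field is itself a JSON-encoded schema. The sample columns are made up for the example.

```python
import json
from typing import Dict


def column_types(response_body: str) -> Dict[str, object]:
    """Map column name to type from a Query Table Metadata response.

    Assumes each line of the body is a JSON object and that the line with
    a "metaData" key carries the schema as a JSON string in "schemaString".
    """
    for line in response_body.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if "metaData" in entry:
            schema = json.loads(entry["metaData"]["schemaString"])
            return {field["name"]: field["type"] for field in schema["fields"]}
    raise ValueError("no metaData line found in response")


# Made-up sample response with two columns.
sample_schema = {
    "type": "struct",
    "fields": [
        {"name": "email", "type": "string", "nullable": True, "metadata": {}},
        {"name": "import_date_time", "type": "timestamp", "nullable": True, "metadata": {}},
    ],
}
sample_body = "\n".join(
    [
        json.dumps({"protocol": {"minReaderVersion": 1}}),
        json.dumps({"metaData": {"schemaString": json.dumps(sample_schema)}}),
    ]
)
columns = column_types(sample_body)
```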
Read Change Data Feed from a Table
This API endpoint gives a response containing paths to the files in a shared table that your recipient has access to. Its parameters let you filter the response by when a file was added to the table. Each time a table is updated it gets a new version, and each version has a version number and a timestamp in the metadata. You can filter the response by either version number or timestamp. For example, you can request all files that changed from version X to version Y, or all versions after version X, where X and Y are each either a version number or a timestamp.
If you filter by timestamp, it must be in ISO-8601 format in the UTC timezone, for example 2023-11-01T00:00:00Z. The timestamp must be equal to or later than the first time that data was added to the table. If you provide an earlier timestamp, the response contains a message stating the first available timestamp.
Method | GET |
---|---|
Header | Authorization: Bearer <token> |
Endpoint URL | <prefix>/shares/<share>/schemas/<schema>/tables/<table>/changes |
Parameters | startingVersion (optional): Long, endingVersion (optional): Long, startingTimestamp (optional): String, endingTimestamp (optional): String |
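The sketch below shows how a timestamp-filtered request URL could be put together and how file paths could be collected from the response. It assumes the response is line-delimited JSON as defined by the Delta Sharing Protocol, with file entries under `add`, `cdf` or `remove` keys that each carry a `url`. The function names and sample values are illustrative.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional
from urllib.parse import quote, urlencode


def iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO-8601 in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def changes_url(
    prefix: str,
    share: str,
    schema: str,
    table: str,
    starting_timestamp: Optional[str] = None,
    ending_timestamp: Optional[str] = None,
) -> str:
    """Build the Read Change Data Feed URL with optional timestamp filters."""
    path = "/".join(
        [
            prefix.rstrip("/"),
            "shares", quote(share, safe=""),
            "schemas", quote(schema, safe=""),
            "tables", quote(table, safe=""),
            "changes",
        ]
    )
    params = {}
    if starting_timestamp is not None:
        params["startingTimestamp"] = starting_timestamp
    if ending_timestamp is not None:
        params["endingTimestamp"] = ending_timestamp
    return f"{path}?{urlencode(params)}" if params else path


def file_urls(response_body: str) -> List[str]:
    """Collect file URLs from the line-delimited JSON response.

    Assumes file entries appear under "add", "cdf" or "remove" keys.
    """
    urls = []
    for line in response_body.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        for action in ("add", "cdf", "remove"):
            if action in entry:
                urls.append(entry[action]["url"])
    return urls
```

Storing the timestamp of the last successful run and passing it as `startingTimestamp` on the next run is one way to fetch only the files added since then.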