Datamart replication
The datamart replication feature allows you to replicate the data ingested by mediarithmics in an external solution of your choice. For now, we are integrated with:
Google Cloud Platform - Pub/Sub
Microsoft Azure - Event Hubs (Alpha)
We replicate the update and delete operations from your datamart for the following objects:
UserActivity
UserProfile
UserSegment
UserAccount
UserEmail
UserPoint
UserAgent (for datamart user_point_system_version before v202205)
UserDevicePoint (for datamart user_point_system_version v202205)
UserDeviceTechnicalId (for datamart user_point_system_version v202205)
Parent user points (when user points are merged)
You can select which documents to replicate when creating the Datamart replication using the API (this capability will soon be added in the UI).
Note that you cannot update the document type filter selection once set. You will need to create a new datamart replication in that case.
There are currently 2 versions of Datamart replications:
Version 1: (Legacy) JSON format
Version 2: Avro binary format
Your replication can be in one of the following statuses:
ACTIVE: All data processed by your datamart will be replicated to your external solution.
PAUSED: No data processed by your datamart will be replicated to your external solution.
ERROR: The system is no longer able to replicate messages. In this case, check your external solution (expired instance, invalid credentials, etc). If you can't find anything wrong, please contact your Account manager.
You can run an initial synchronization so that already existing data within the mediarithmics platform can be replicated.
All your active replications will receive a set of UPDATE operations that represent all existing elements of your datamart, for example UserPoint and UserActivity.
Please note that if you run an initial synchronization you might receive a large volume of messages. Processing them can be expensive, depending on your cloud provider.
We convert datamart operations into a standardized output format: operation = {ts, doc_type, doc_id, op, value}
| Field | Type | Description | Versions |
| --- | --- | --- | --- |
| ts | UNIX timestamp in ms (Long) | The mutation date | All |
| doc_type | Enumeration | The object type: UserActivity, UserProfile, UserSegment, UserAgent, UserDevicePoint, UserDeviceTechnicalId, UserAccount, UserEmail, UserPoint or UserPointParent | All |
| ctx_id | UUID | The user point id | Version 2 |
| doc_id | String | The object unique id, depending on doc_type | All |
| op | String | The operation type: UPDATE or DELETE | All |
| value | JSON Object | The object value, depending on doc_type | All |
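Consumed from your external solution, a version-1 (JSON) message can be decoded and checked against the fields above in a few lines. This is only a sketch; the sample payload is illustrative, not an actual datamart message:

```python
import json

# Fields of the standardized operation format (version 1, JSON).
REQUIRED_FIELDS = {"ts", "doc_type", "doc_id", "op", "value"}

def parse_operation(raw: bytes) -> dict:
    """Decode a replicated operation and check the expected fields."""
    op = json.loads(raw)
    missing = REQUIRED_FIELDS - op.keys()
    if missing:
        raise ValueError(f"malformed operation, missing: {sorted(missing)}")
    if op["op"] not in ("UPDATE", "DELETE"):
        raise ValueError(f"unknown operation type: {op['op']}")
    return op

# Illustrative message (doc_id layout for UserSegment: {{user_point_id}}:{{segment_id}}).
raw = json.dumps({
    "ts": 1651234567890,
    "doc_type": "UserSegment",
    "doc_id": "4700c85f-17e3-4304-aa7f-dc140173b08d:1234",
    "op": "UPDATE",
    "value": {"segment_id": "1234"},
}).encode("utf-8")

operation = parse_operation(raw)
print(operation["doc_type"], operation["op"])
```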
| doc_type | doc_id | value |
| --- | --- | --- |
| UserPoint | {{user_point_id}} | Empty (you already have the user_point_id in the doc_id) |
| UserAgent | {{user_point_id}}:{{vector_id}} | Browser info and device info |
| UserDevicePoint | {{user_point_id}}:{{user_device_point_id}} | Browser info and device info |
| UserDeviceTechnicalId | {{user_point_id}}:{{user_device_point_id}}:{{user_device_technical_id}} | Empty (you already have the user_device_technical_id in the doc_id) |
| UserActivity | {{user_point_id}}:{{user_activity_id}} | Detailed activity |
| UserSegment | {{user_point_id}}:{{segment_id}} | Segment info |
| UserProfile | {{user_point_id}}:{{compartment_id}}:{{user_account_id}} | Detailed profile |
| UserAccount | {{user_point_id}}:{{compartment_id}}:{{user_account_id}} | Empty (you already have the user_account_id in the doc_id) |
| UserEmail | {{user_point_id}}:{{email_hash}} | User's email hash |
| UserPointParent | {{user_point_id}} | The ID of the UserPoint which is merged into the oldest one (the kept one). Message: <current_user_point_id> merged with <the_kept_user_point_id> |
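The doc_id layouts above can be unpacked into named components with a small helper. This sketch maps each doc_type to its component names; splitting with a bounded maxsplit keeps any ':' that appears inside the last component (such as a vector_id like vec:32453299893):

```python
# Named components of each doc_type's doc_id (from the table above).
DOC_ID_PARTS = {
    "UserPoint": ["user_point_id"],
    "UserAgent": ["user_point_id", "vector_id"],
    "UserDevicePoint": ["user_point_id", "user_device_point_id"],
    "UserDeviceTechnicalId": ["user_point_id", "user_device_point_id", "user_device_technical_id"],
    "UserActivity": ["user_point_id", "user_activity_id"],
    "UserSegment": ["user_point_id", "segment_id"],
    "UserProfile": ["user_point_id", "compartment_id", "user_account_id"],
    "UserAccount": ["user_point_id", "compartment_id", "user_account_id"],
    "UserEmail": ["user_point_id", "email_hash"],
    "UserPointParent": ["user_point_id"],
}

def split_doc_id(doc_type: str, doc_id: str) -> dict:
    """Map the ':'-separated doc_id onto named components."""
    names = DOC_ID_PARTS[doc_type]
    # Bounded split: extra ':' characters stay in the last component.
    parts = doc_id.split(":", len(names) - 1)
    if len(parts) != len(names):
        raise ValueError(f"unexpected doc_id shape for {doc_type}: {doc_id}")
    return dict(zip(names, parts))

print(split_doc_id("UserProfile", "up:10:acc"))
```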
To help with filtering on the topic, the replication adds some metadata to each message (attributes in Pub/Sub and properties in Event Hubs):
doc_type: the message doc_type
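In Google Pub/Sub, for instance, this doc_type attribute can be used in a subscription filter so that a consumer only receives certain document types (the topic and subscription names below are placeholders):

```
gcloud pubsub subscriptions create replication-activities \
  --topic=my-replication-topic \
  --message-filter='attributes.doc_type = "UserActivity"'
```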
This format is the original replication format. It was designed to work with a streaming architecture (like Dataflow or Databricks) but has some limitations with tools needing a schema (like BigQuery).
A new activity will trigger a replicated UserActivity operation in your external solution.
A new user agent will trigger a replicated UserAgent operation.
A new user device point will trigger a replicated UserDevicePoint operation.
A new user device technical id will trigger a replicated UserDeviceTechnicalId operation.
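As an illustration, a version-1 UserActivity message could look like the following (the IDs are placeholders and the detailed activity body inside value is omitted here; the actual fields depend on your datamart):

```json
{
  "ts": 1651234567890,
  "doc_type": "UserActivity",
  "doc_id": "4700c85f-17e3-4304-aa7f-dc140173b08d:8422486",
  "op": "UPDATE",
  "value": {}
}
```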
This version introduces a schema to help integration.
The format is almost the same as the Legacy one, but uses the Avro binary format.
The target topic should reference the schema and the encoding as BINARY to take full advantage of the format.
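The actual Avro schema is supplied when you set up the topic; the record below is only an illustrative sketch of its shape, derived from the field table above, and not the official schema:

```json
{
  "type": "record",
  "name": "Operation",
  "fields": [
    {"name": "ts", "type": "long"},
    {"name": "ctx_id", "type": "string"},
    {"name": "doc_type", "type": "string"},
    {"name": "doc_id", "type": "string"},
    {"name": "op", "type": "string"},
    {"name": "value", "type": ["null", "string"]}
  ]
}
```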
In the case of a datamart that is upgraded to the user_point_system_version v202205:
New device identifiers are directly stored and replicated using the device point formats,
Existing device identifiers that were previously stored in the UserAgent format are progressively migrated.
This migration is seamless within the datamart; however, it is reflected on your datamart replication. For each migrated device identifier, you will receive:
A DELETE operation with the doc_type UserAgent
Two UPDATE operations with doc_type UserDevicePoint and doc_type UserDeviceTechnicalId
After the migration, no more UserAgent operations will be produced.
You need to have an instance of the external solution where you want to replicate your mediarithmics data.
Depending on the external solution, you will need to fulfill some requirements.
You will need:
A Google Cloud Platform account
Click on Create Service Account:
Give your service account a name, select the right account access (Pub/Sub Publisher, Pub/Sub Editor) and save.
Once your Service Account is created, you can generate your key:
credentials.json file example:
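A Google service account key file follows the standard layout below. All values here are placeholders; your actual file is generated by the Google Cloud console:

```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "<key-id>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<key-material>\n-----END PRIVATE KEY-----\n",
  "client_email": "replication@your-project-id.iam.gserviceaccount.com",
  "client_id": "<client-id>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "<cert-url>"
}
```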
You will need:
A Microsoft Azure account
Save your connection string in a credentials.txt file. This will be the credentials file to upload into the mediarithmics platform.
credentials.txt file example:
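An Event Hubs connection string has the following shape. The namespace, key, and hub name below are placeholders; note that EntityPath is present when the string is scoped to a single Event Hub rather than the namespace:

```
Endpoint=sb://my-namespace.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<base64-encoded-key>;EntityPath=my-event-hub
```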
You can access replications in the datamart settings in your navigator application.
Select the organisation that contains the datamart you want to replicate.
Click on Settings.
Click on the Datamarts tab and then click the Datamart menu entry.
Select the datamart you want to replicate.
In the Replications subtab, you will see a table dedicated to your Datamart Replications.
To create a new replication:
Click New Replication.
Select a Replication type matching the external solution of your choice.
Click Select a File to upload your credentials file and click on Update.
Click Save Replication to create your new replication.
You will see your new replication in the Replications subtab.
Example for Google Pub/Sub:
When a Replication is created, its status is automatically set to Paused. To start your replication, you will have to activate it. If the system can't replicate your datamart on activation, you will see an error.
You can change the replication status using the status button.
If the system is no longer able to replicate the messages, the replication status will be set to ERROR. In this case, check your external solution (expired instance, invalid credentials, etc). If you can't find anything wrong, please contact your Account manager.
A dashboard listing every Initial Synchronization that was done on your datamart is available in the same Replications subtab.
For now, you will have to ask your Account manager to run an initial synchronization. Later, you will be able to run an initial synchronization yourself by clicking New Execution. You must have at least one active replication and you can't run an initial synchronization more than once a week.
We replicate all operations. No filtering is possible.
Please note that you might receive a large volume of messages while running an initial synchronization. Processing them can be expensive, depending on your cloud provider.
While an initial synchronization is running, you can't change the status of your replications. The initial synchronization will only replicate the data for active replications.
The active replications are still running during initial synchronizations. Messages from the initial synchronization and live messages (tag, import, etc.) from the active replications are mixed.
POST
https://api.mediarithmics.com/v1/datamarts/:datamartId/replications
Path parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| datamartId | integer | The ID of the datamart |

Body parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| version | integer | Version of the datamart replication to be created |
| name | string | The name of the replication |
| status | string | Status of the replication. Must be "PAUSED" at creation |
| destination | string | GOOGLE_PUBSUB or AZURE_EVENT_HUBS |
| project_id | string | Google project ID. Only for GOOGLE_PUBSUB |
| topic_id | string | Google Pub/Sub topic ID. Only for GOOGLE_PUBSUB |
| event_hub_name | string | Azure event hub name. Only for AZURE_EVENT_HUBS |
| datamart_id | string | As per the path parameter |
| replication_filters | array | Optional. List of documents to replicate. If not provided, all documents will be replicated |
| replication_filters > document | string | Name of the document to replicate, among USER_SEGMENT, USER_EMAIL, USER_ACCOUNT, USER_PROFILE, USER_DEVICE_POINT, USER_AGENT, USER_DEVICE_TECHNICAL_ID, USER_ACTIVITY, USER_POINT |
| replication_filters > filter | string | Not used at the moment |
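For example, a request body for this endpoint could be assembled as follows. The IDs, names, and filter selection are placeholders; sending the request with your HTTP client and API token is left out:

```python
import json

# Illustrative body for POST /v1/datamarts/:datamartId/replications.
payload = {
    "version": 2,
    "name": "My Pub/Sub replication",
    "status": "PAUSED",             # must be PAUSED at creation
    "destination": "GOOGLE_PUBSUB",
    "project_id": "my-gcp-project",
    "topic_id": "mediarithmics-replication",
    "datamart_id": "1234",          # same value as the :datamartId path parameter
    "replication_filters": [
        {"document": "USER_ACTIVITY", "filter": ""},
        {"document": "USER_SEGMENT", "filter": ""},
    ],
}

body = json.dumps(payload)
print(body)
```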
POST
https://api.mediarithmics.com/v1/datamarts/:datamartId/replications/:replicationId/credentials
Path parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| datamartId | integer | The ID of the datamart |
| replicationId | integer | The ID of the datamart replication |

Body parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| file | binary | The path of the file that contains the credentials |
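Assuming token-based authentication as for other mediarithmics API calls, the upload could look like this with curl (the header value and the IDs in the URL are placeholders):

```
curl -X POST \
  -H "Authorization: <your-api-token>" \
  -F "file=@credentials.json" \
  https://api.mediarithmics.com/v1/datamarts/1234/replications/5678/credentials
```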
GET
https://api.mediarithmics.com/v1/datamarts/:datamartId/replications/:replicationId
Path parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| datamartId | integer | The ID of the datamart |
| replicationId | integer | The ID of the datamart replication |
For datamarts with a user_point_system_version prior to v202205, device identifiers are stored as user agents and replicated as UserAgent operations (doc_id example: 4700c85f-17e3-4304-aa7f-dc140173b08d:vec:32453299893).
For datamarts leveraging the user_point_system_version v202205, device identifiers are stored as user device points and replicated through UserDevicePoint and UserDeviceTechnicalId operations.
A Google Cloud Platform project;
An Access Control on this project;
To create and activate your service accounts (generate the credentials file):
TO SUM IT UP: create a service account, and edit it to create a Key in a JSON format (this is the credentials file);
A Google Cloud Platform Pub/Sub instance. Following the Pub/Sub documentation up to Quickstart setup > Create service account credentials (included) should be enough to begin.
NOTE: see the Google Pub/Sub pricing documentation.
A Resource Group, an Event Hubs namespace, and an Event Hub;
A connection string over the namespace or the Event Hub;
NOTE: see the Microsoft Azure Event Hubs pricing documentation.
Go to the Replications subtab.
Complete the configuration information (see the API documentation above as help to configure advanced fields).
The version field is the version of the datamart replication to be created.