Datamart replication
The datamart replication feature allows you to replicate all the data ingested by mediarithmics to an external solution of your choice. We currently integrate with:
  • Google Cloud Platform - Pub/Sub
  • Microsoft Azure - Event Hubs (Alpha)
This module is not included in the default plan. Contact your CSM to activate it.

How it works

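We replicate the update and delete operations from Pionus for the following eight objects:
  • UserActivity
  • UserProfile
  • UserSegment
  • UserAgent
  • UserAccount
  • UserEmail
  • UserPoint
  • UserPointParent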
We replicate all of these operations. No filtering is available.

Replication status

Your replication can be in one of the following statuses:
  • ACTIVE: All data processed by Pionus will be replicated to your external solution.
  • PAUSED: No data processed by Pionus will be replicated to your external solution.
  • ERROR: The system is no longer able to replicate messages. In this case, check your external solution (expired instance, invalid credentials, etc.). If you can't find anything wrong, please contact mediarithmics support.

Initial synchronization

You can run an initial synchronization so that already existing data within the mediarithmics platform can be replicated.
All your active replications will receive a set of UPDATE operations that represent all existing elements of your datamart, for example UserPoint and UserActivity.
Please note that if you run an initial synchronization you might receive a large volume of messages. Processing them can be expensive, depending on your cloud provider.

Output messages

We convert Pionus operations into a standardized output format: operation = {ts, doc_type, doc_id, op, value}
| Field | Type | Comment |
| --- | --- | --- |
| ts | Timestamp (Long) | The mutation date |
| doc_type | String | The object type: UserActivity, UserProfile, UserSegment, UserAgent, UserAccount, UserEmail, UserPoint or UserPointParent |
| doc_id | String | The object unique id. The format varies depending on doc_type. |
| op | String | The operation type: UPDATE or DELETE |
| value | JSON Object | The object value. The format varies depending on doc_type. |
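
As an illustration of the format above, here is a minimal Python sketch of how a consumer might decode and validate one message. The names `Operation` and `parse_operation` are hypothetical and not part of any mediarithmics library. The sketch also assumes that an "empty" value may arrive as a missing field, `null` or `{}`, and normalizes all three to an empty dictionary.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

# Object types and operation types listed in the table above.
DOC_TYPES = {
    "UserActivity", "UserProfile", "UserSegment", "UserAgent",
    "UserAccount", "UserEmail", "UserPoint", "UserPointParent",
}
OPS = {"UPDATE", "DELETE"}


@dataclass
class Operation:
    """One replicated operation (hypothetical container for this sketch)."""
    ts: int
    doc_type: str
    doc_id: str
    op: str
    value: Dict[str, Any] = field(default_factory=dict)


def parse_operation(raw: str) -> Operation:
    """Decode a JSON message and reject unknown doc_type or op values."""
    data = json.loads(raw)
    if data["doc_type"] not in DOC_TYPES:
        raise ValueError(f"unknown doc_type: {data['doc_type']!r}")
    if data["op"] not in OPS:
        raise ValueError(f"unknown op: {data['op']!r}")
    return Operation(
        ts=int(data["ts"]),
        doc_type=data["doc_type"],
        doc_id=data["doc_id"],
        op=data["op"],
        # Assumption: missing, null and {} all mean "no value".
        value=data.get("value") or {},
    )
```

Rejecting unknown types early keeps malformed messages out of downstream processing; you may prefer to log and skip them instead.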

Object formats based on doc_type

| doc_type | doc_id | value |
| --- | --- | --- |
| UserPoint | {{user_point_id}} | Empty (you already have the user_point_id in the doc_id) |
| UserAgent | {{user_point_id}}:{{vector_id}} | Browser info and device info |
| UserActivity | {{user_point_id}}:{{user_activity_id}} | Detailed activity |
| UserSegment | {{user_point_id}}:{{segment_id}} | Creation timestamp and last modification timestamp (date) |
| UserProfile | {{user_point_id}}:{{compartment_id}}:{{user_account_id}} | Detailed profile |
| UserAccount | {{user_point_id}}:{{compartment_id}}:{{user_account_id}} | Empty (you already have the user_account_id in the doc_id) |
| UserEmail | {{user_point_id}}:{{email_hash}} | User's email hash |
| UserPointParent | {{user_point_id}} | It is the ID of the UserPoint which is merged into the oldest one (the kept one). Message: <current_user_point_id> merged with <the_kept_user_point_id> |
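
The doc_id layouts above can be split into named components on the consumer side. The following Python sketch is one possible way to do it. `DOC_ID_FIELDS` and `split_doc_id` are hypothetical names, and the sketch assumes that only the last component of a doc_id may itself contain a colon.

```python
from typing import Dict, List

# Component names per doc_type, in the order given in the table above.
DOC_ID_FIELDS: Dict[str, List[str]] = {
    "UserPoint": ["user_point_id"],
    "UserAgent": ["user_point_id", "vector_id"],
    "UserActivity": ["user_point_id", "user_activity_id"],
    "UserSegment": ["user_point_id", "segment_id"],
    "UserProfile": ["user_point_id", "compartment_id", "user_account_id"],
    "UserAccount": ["user_point_id", "compartment_id", "user_account_id"],
    "UserEmail": ["user_point_id", "email_hash"],
    "UserPointParent": ["user_point_id"],
}


def split_doc_id(doc_type: str, doc_id: str) -> Dict[str, str]:
    """Split a doc_id into named parts according to its doc_type."""
    fields = DOC_ID_FIELDS[doc_type]  # raises KeyError for an unknown doc_type
    # Limit the number of splits so that extra colons stay in the last part
    # (assumption: earlier components never contain a colon).
    parts = doc_id.split(":", len(fields) - 1)
    if len(parts) != len(fields):
        raise ValueError(f"doc_id {doc_id!r} does not match {doc_type}")
    return dict(zip(fields, parts))
```

For example, `split_doc_id("UserSegment", "abc:42")` yields a dictionary with `user_point_id` set to `"abc"` and `segment_id` set to `"42"`.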

Example

A new activity will trigger a replicated UserActivity operation. You will receive a message in your external solution similar to this example:
{
  "ts": 1572947762,
  "doc_type": "UserActivity",
  "doc_id": "XXXXXXX-XXXX-XXX-XXXXXXXX:XXXXXX-XXXXX-XXXX-XXXX-XXXXXXXXXX",
  "op": "UPDATE",
  "value": {
    "$type": "SITE_VISIT",
    "$source": "XXXX",
    "etc": "etc"
  }
}
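
To show how such a message could be consumed, here is a minimal Python sketch that mirrors replicated operations into an in-memory dictionary keyed by (doc_type, doc_id). The `apply_operation` function and the dictionary store are hypothetical; a real consumer would write to its own database or warehouse.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical local mirror: (doc_type, doc_id) -> value
Store = Dict[Tuple[str, str], Any]


def apply_operation(store: Store, raw_message: str) -> None:
    """Apply one replicated message to the local mirror."""
    message = json.loads(raw_message)
    key = (message["doc_type"], message["doc_id"])
    op = message["op"].strip()  # tolerate stray whitespace around the op
    if op == "UPDATE":
        # UPDATE carries the full object value, so overwrite what we have.
        store[key] = message.get("value")
    elif op == "DELETE":
        # Deleting an object we never saw is not an error.
        store.pop(key, None)
    else:
        raise ValueError(f"unexpected op: {op!r}")
```

Treating UPDATE as an overwrite and DELETE as a no-op on missing keys keeps the handler safe to call again if your external solution redelivers a message.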

Setting up replications

Prerequisites

You need to have an instance of the external solution where you want to replicate your mediarithmics data.
Depending on the external solution, you will need to fulfill some requirements.

Google Pub/Sub

You will need:
credentials.json file example:
{
  "type": "service_account",
  "project_id": "xxx-xxx-xx",
  "private_key_id": "xxxxxxx",
  "private_key": "-----BEGIN PRIVATE KEY-----\n xxxxxxx \n-----END PRIVATE KEY-----\n",
  "client_email": "xxxxxxx@project_id.iam.gserviceaccount.com",
  "client_id": "xxxxxx",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/project_id.iam.gserviceaccount.com"
}
NOTE: Here is the Google Pub/Sub Pricing documentation: https://cloud.google.com/pubsub/pricing
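
Before uploading a credentials.json file, it can help to run a quick local sanity check. The Python sketch below only verifies that the file is valid JSON and that a few fields from the example above are present. The list of required fields is an assumption made for this sketch, and `check_service_account` is a hypothetical helper; it does not contact Google or prove that the key is still valid.

```python
import json
from typing import List

# Assumption: fields from the example file that a service account key needs.
REQUIRED_KEYS = [
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "client_id", "token_uri",
]


def check_service_account(raw: str) -> List[str]:
    """Return a list of problems found in a credentials.json payload."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = [
        f"missing or empty field: {key}"
        for key in REQUIRED_KEYS
        if not data.get(key)
    ]
    if data.get("type") and data["type"] != "service_account":
        problems.append("type is not 'service_account'")
    private_key = data.get("private_key")
    if isinstance(private_key, str) and private_key \
            and "BEGIN PRIVATE KEY" not in private_key:
        problems.append("private_key does not look like a PEM private key")
    return problems
```

An empty list means the file passed these structural checks only. You could call it with the file content, for instance `check_service_account(open("credentials.json").read())`.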

Microsoft Azure Event Hubs (Alpha)

You will need:
credentials.txt file example:
Endpoint=sb://<FQDN>/;SharedAccessKeyName=<KeyName>;SharedAccessKey=<KeyValue>
NOTE: Here is the Microsoft Azure Event Hubs Pricing documentation: https://azure.microsoft.com/en-us/pricing/details/event-hubs/
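
The connection string format shown above can be checked locally before you upload credentials.txt. The following Python sketch splits the string into its parts and verifies that the three keys from the example are present. `parse_connection_string` is a hypothetical helper, and the check is purely structural: it does not connect to Azure.

```python
from typing import Dict

REQUIRED_PARTS = ("Endpoint", "SharedAccessKeyName", "SharedAccessKey")


def parse_connection_string(conn: str) -> Dict[str, str]:
    """Split a 'Key=Value;Key=Value' connection string into a dictionary."""
    parts: Dict[str, str] = {}
    for segment in conn.strip().split(";"):
        if not segment:
            continue  # ignore a trailing semicolon
        # Split on the first '=' only: key values may end with '=' padding.
        name, sep, value = segment.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[name] = value
    for required in REQUIRED_PARTS:
        if required not in parts:
            raise ValueError(f"missing {required}")
    if not parts["Endpoint"].startswith("sb://"):
        raise ValueError("Endpoint should start with sb://")
    return parts
```

Any additional keys present in the string are kept in the returned dictionary rather than rejected.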

Listing your replications

You can access replications in the datamart settings in your navigator application.
  1. Select the organisation that contains the datamart you want to replicate.
  2. Click on Settings.
  3. Click on the Datamarts tab and then click the Datamart menu entry.
  4. Select the datamart you want to replicate.
  5. In the Replications subtab, you will see a table dedicated to your Datamart Replications.

Creating & starting a replication

To create a new replication:
  1. Go to the Replications subtab.
  2. Click New Replication.
  3. Select a Replication type matching the external solution of your choice.
  4. Complete the configuration information (see Prerequisites for help with the advanced fields).
  5. Click Select a File to upload your credentials file and click on Update.
  6. Click Save Replication to create your new replication.
  7. You will see your new replication in the Replications subtab.
When a replication is created, its status is automatically set to PAUSED. To start your replication, you will have to activate it. If the system can't replicate your datamart on activation, you will see an error.
When a replication can't be activated, it is usually due to a credentials error, so verify your replication configuration and your credentials file first.

Activating / pausing a replication

You can change the replication status using the status button.
If the system is no longer able to replicate the messages, the replication status will be set to ERROR. In this case, check your external solution (expired instance, invalid credentials, etc.). If you can't find anything wrong, please contact mediarithmics support.
If the error comes from your external solution, you will need to recreate your replication to bind it to a working external solution, with valid credentials and the correct configuration.

Executing an initial synchronization

A dashboard listing every Initial Synchronization that was done on your datamart is available in the same Replications subtab.
For now, you will have to ask mediarithmics support to run an initial synchronization. Later, you will be able to run one yourself by clicking New Execution. You must have at least one active replication, and you can't run an initial synchronization more than once a week.
We replicate all operations. No filtering is possible.
Please note that you might receive a large volume of messages while running an initial synchronization. Processing them can be expensive, depending on your cloud provider.
While an initial synchronization is running, you can't change the status of your replications. The initial synchronization will only replicate the data for active replications.
The active replications are still running during initial synchronizations. Messages from the initial synchronization and live messages (tag, import, etc.) from the active replications are mixed.
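
Because initial synchronization messages and live messages arrive mixed, a consumer may want to guard against an older state overwriting a newer one. The Python sketch below keeps, per object, the highest ts applied so far and ignores messages with a lower ts. This is only a sketch: `LatestStateStore` is a hypothetical class, and it assumes that ts reflects when the object was last mutated. If initial synchronization messages instead carry the time of the synchronization itself, this guard would not help.

```python
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, str]  # (doc_type, doc_id)


class LatestStateStore:
    """In-memory mirror that ignores messages older than what it has applied."""

    def __init__(self) -> None:
        self.values: Dict[Key, Any] = {}
        # Kept even after a DELETE, so that an older UPDATE arriving later
        # does not bring the object back.
        self.last_ts: Dict[Key, int] = {}

    def apply(self, message: Dict[str, Any]) -> bool:
        """Apply a decoded message; return False if it was stale and skipped."""
        key = (message["doc_type"], message["doc_id"])
        ts = message["ts"]
        previous: Optional[int] = self.last_ts.get(key)
        if previous is not None and ts < previous:
            return False  # older than the state we already hold
        self.last_ts[key] = ts
        if message["op"] == "UPDATE":
            self.values[key] = message.get("value")
        else:  # DELETE
            self.values.pop(key, None)
        return True
```

Messages with an equal ts are applied in arrival order. Keeping the timestamp of deleted objects costs memory, so a long-running consumer may want to expire those entries after some time.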