Datamart replication

Last updated 3 months ago
The datamart replication feature allows you to replicate the data ingested by mediarithmics into an external solution of your choice. For now, we are integrated with:

  • Google Cloud Platform - Pub/Sub

  • Microsoft Azure - Event Hubs (Alpha)

This module is not included in the default plan. Contact your Account manager to activate it.

How it works

Replication

We replicate the update and delete operations from your datamart for the following objects:

  • User points, user activities, user segments, user profiles, user accounts and user emails

  • User agents (for datamart user_point_system_version before v202205)

  • User device points (for datamart user_point_system_version v202205)

  • User device technical ids (for datamart user_point_system_version v202205)

  • Parent user points (when user points are merging)

Document type filtering

You can select which documents to replicate when creating the datamart replication through the API (this capability will soon be added to the UI).

Note that you cannot update the document type filter selection once it is set; to change it, you must create a new datamart replication.

Versioning

There are currently two versions of datamart replications:

| Version   | Format               |
| --------- | -------------------- |
| Version 1 | (Legacy) JSON format |
| Version 2 | Avro binary format   |

Please note that:

  • At the moment, you can create a Version 2 datamart replication only through the API (this capability will soon be added to the UI)

  • By default, a datamart replication is created in Version 1 (i.e. if the "version" body parameter is not provided)

  • You cannot upgrade a datamart replication from Version 1 to Version 2

Replication status

Your replication can be in one of the following statuses:

  • ACTIVE: All data processed by your datamart will be replicated to your external solution.

  • PAUSED: No data processed by your datamart will be replicated to your external solution.

  • ERROR: The system is no longer able to replicate messages. In this case, check your external solution (expired instance, invalid credentials, etc). If you can't find anything wrong, please contact your Account manager.

Initial synchronization

You can run an initial synchronization so that already existing data within the mediarithmics platform can be replicated.

All your active replications will receive a set of UPDATE operations representing all existing elements of your datamart, for example UserPoint and UserActivity.

Please note that if you run an initial synchronization you might receive a large volume of messages. Processing them can be expensive, depending on your cloud provider.

Output messages

We convert datamart operations into a standardized output format: operation = {ts, doc_type, doc_id, op, value}

| Field    | Type                        | Comment                                                                                                                  | Version availability |
| -------- | --------------------------- | ------------------------------------------------------------------------------------------------------------------------ | -------------------- |
| ts       | UNIX timestamp in ms (Long) | The mutation date                                                                                                        | All                  |
| doc_type | Enumeration                 | The object type: UserActivity, UserProfile, UserSegment, UserAgent, UserAccount, UserEmail, UserPoint or UserPointParent | All                  |
| ctx_id   | UUID                        | The user point id                                                                                                        | Version 2            |
| doc_id   | String                      | The object unique id, depending on doc_type                                                                              | All                  |
| op       | String                      | The operation type: UPDATE or DELETE                                                                                     | All                  |
| value    | JSON Object                 | The object value, depending on doc_type                                                                                  | All                  |

Object formats based on doc_type

| doc_type              | doc_id                                                                       | value                                                                |
| --------------------- | ---------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| UserPoint             | {{user_point_id}}                                                            | Empty (you already have the user_point_id in the doc_id)             |
| UserAgent             | {{user_point_id}}:{{vector_id}}                                              | Browser info and device info                                         |
| UserDevicePoint       | {{user_point_id}}:{{user_device_point_id}}                                   | Browser info and device info                                         |
| UserDeviceTechnicalId | {{user_point_id}}:{{user_device_point_id}}:{{user_device_technical_id}}      | Empty (you already have the user_device_technical_id in the doc_id)  |
| UserActivity          | {{user_point_id}}:{{user_activity_id}}                                       | Detailed activity                                                    |
| UserSegment           | {{user_point_id}}:{{segment_id}}                                             | Segment info                                                         |
| UserProfile           | {{user_point_id}}:{{compartment_id}}:{{user_account_id}}                     | Detailed profile                                                     |
| UserAccount           | {{user_point_id}}:{{compartment_id}}:{{user_account_id}}                     | Empty (you already have the user_account_id in the doc_id)           |
| UserEmail             | {{user_point_id}}:{{email_hash}}                                             | User's email hash                                                    |
| UserPointParent       | {{user_point_id}} (the ID of the UserPoint merged into the oldest, kept one) | Message <current_user_point_id> merged with <the_kept_user_point_id> |
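Whatever the doc_type, the doc_id always starts with the user point id, followed by ':' and other internal ids. Note that later segments may themselves contain ':' (e.g. udp: or mum: prefixed ids), so only the first segment can safely be split off blindly. A minimal sketch with hypothetical helper names:

```python
# Hypothetical helpers: recover the user point id from a doc_id and
# bucket replicated operations per user point. Only the first ':'-split
# is safe, since later segments can contain ':' themselves
# (e.g. "udp:-32453299893" or "mum:7231822539").
def user_point_of(doc_id: str) -> str:
    return doc_id.split(":", 1)[0]

def group_by_user_point(operations):
    """Bucket replicated operations by the user point they belong to."""
    buckets = {}
    for op in operations:
        buckets.setdefault(user_point_of(op["doc_id"]), []).append(op)
    return buckets
```

This makes it easy to process all documents of a given user point together, for instance when mirroring your datamart.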

Message metadata

To help you filter the topic, the replication adds metadata to each message (attributes in Pub/Sub and properties in Event Hubs):

| Metadata key | Comment              |
| ------------ | -------------------- |
| doc_type     | The message doc_type |
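A consumer can use this metadata to discard unwanted doc_types before parsing the payload (with Google Pub/Sub you can also push this server-side with a subscription filter such as `attributes.doc_type = "UserActivity"`). A small client-side sketch; names are ours:

```python
# Client-side filtering sketch on the doc_type metadata key the
# replication attaches to every message (attribute in Pub/Sub,
# property in Event Hubs). The set of wanted types is an example.
WANTED_DOC_TYPES = {"UserActivity", "UserSegment"}

def keep_message(attributes: dict) -> bool:
    """Return True when the message's doc_type is one we consume."""
    return attributes.get("doc_type") in WANTED_DOC_TYPES
```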

Legacy JSON Format

This format is the original replication format. It was designed to work with a streaming architecture (like Dataflow or Databricks) but has some limitations with tools that need a schema (like BigQuery).

Examples

A new activity will trigger a replicated UserActivity operation. You will receive a message in your external solution similar to this example:

{
   "ts": 1676627112685,
   "doc_type": "UserActivity",
   "doc_id": "XXXXXXX-XXXX-XXX-XXXXXXXX:XXXXXX-XXXXX-XXXX-XXXX-XXXXXXXXXX",
   "op":" UPDATE",
   "value":{
       "$type":"SITE_VISIT",
       "$source":"XXXX",
       "etc": "etc"
   }
}

A new user agent will trigger a replicated UserAgent operation like the one below:

{
   "ts":1676627112685,
   "doc_type":"UserAgent",
   "doc_id":"4700c85f-17e3-4304-aa7f-dc140173b08d:vec:32453299893",
   "op":"UPDATE",
   "value":{
      "$os_family":"LINUX",
      "$brand":null,
      "$os_version":null,
      "$form_factor":"PERSONAL_COMPUTER",
      "$carrier":null,
      "$model":null,
      "$creation_ts":0,
      "$browser_family":"FIREFOX"
   }
}

A new user device point will trigger a replicated UserDevicePoint operation like the one below:

{
   "ts":1676627112685,
   "doc_type":"UserDevicePoint",
   "doc_id":"4700c85f-17e3-4304-aa7f-dc140173b08d:udp:-32453299893",
   "op":"UPDATE",
   "value":{
      "$os_family":"LINUX",
      "$brand":null,
      "$os_version":null,
      "$form_factor":"PERSONAL_COMPUTER",
      "$carrier":null,
      "$model":null,
      "$creation_ts":0,
      "$browser_family":"FIREFOX"
   }
}

A new user device technical id will trigger replicated UserDeviceTechnicalId operations like the ones below:

// example with a MumId
{
   "ts":1676627112685,
   "doc_type":"UserDeviceTechnicalId",
   "doc_id":"4700c85f-17e3-4304-aa7f-dc140173b08d:udp:-32453299893:mum:7231822539",
   "op":"UPDATE",
   "value":{}
}

// example with an installationId
{
   "ts":1676627112685,
   "doc_type":"UserDeviceTechnicalId",
   "doc_id":"4700c85f-17e3-4304-aa7f-dc140173b08d:udp:-32453299893:ins:1001:aZmFhOTVlM2ItMGRhOC00NDZlLWFhODMtNjZlZGI0YjNiNTk2",
   "op":"UPDATE",
   "value":{}
}
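Since every message carries an op of UPDATE or DELETE and a unique doc_id, a consumer can mirror the replicated documents into a local store with a simple upsert/delete. A sketch under our own store layout:

```python
import json

def apply_operation(store: dict, raw_message: str) -> None:
    """Upsert or delete a replicated document in a local mirror
    keyed by (doc_type, doc_id). DELETE is treated as idempotent,
    so replaying a deletion is harmless."""
    operation = json.loads(raw_message)
    key = (operation["doc_type"], operation["doc_id"])
    if operation["op"] == "UPDATE":
        store[key] = operation["value"]
    elif operation["op"] == "DELETE":
        store.pop(key, None)
```

In the legacy JSON format the whole message is one JSON object, as above; in the Avro format the value field is itself a JSON string and needs a second json.loads after the Avro decode.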

Avro Binary Format

This version introduces a schema to ease integration.

{
  "type": "record",
  "name": "OperationRecord",
  "namespace": "com.mediarithmics.replication.format",
  "fields": [
    {
      "name": "ts",
      "type": {
        "type": "long",
        "logicalType": "timestamp-micros"
      }
    },
    {
      "name": "doc_type",
      "type": {
        "name": "DocumentType",
        "type": "enum",
        "symbols": [
          "UserPoint",
          "UserActivity",
          "UserProfile",
          "UserSegment",
          "UserDevicePoint",
          "UserDeviceTechnicalId",
          "UserAgent",
          "UserAccount",
          "UserEmail"
        ]
      }
    },
    {
      "name": "doc_id",
      "type": "string",
      "doc": "It will always start with the ctx_id (ie: user_point) followed by ':' and other internal ids. It identifies uniquely a document."
    },
    {
      "name": "ctx_id",
      "type": {
        "type": "string",
        "logicalType": "uuid"
      },
      "doc": "The user point id"
    },
    {
      "name": "op",
      "type": "string",
      "doc": "UPDATE or DELETE"
    },
    {
      "name": "value",
      "type": "string",
      "doc": "The object in JSON format"
    }
  ]
}

This format is almost the same as the legacy one, but encoded in Avro binary.

The target topic should reference the schema and set the encoding to BINARY to take full advantage of the format.
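Decoding the binary messages requires an Avro library (for instance fastavro or avro, both assumptions about your toolchain); the schema itself is plain JSON and can be inspected with the standard library. Note that in this format the value field is a string holding the object as JSON, so a consumer applies a second json.loads on it after the Avro decode:

```python
import json

# The OperationRecord schema from above, compacted. Inspecting it with
# the standard library; actual binary decoding needs an Avro library.
SCHEMA = json.loads("""
{"type": "record", "name": "OperationRecord",
 "namespace": "com.mediarithmics.replication.format",
 "fields": [
   {"name": "ts", "type": {"type": "long", "logicalType": "timestamp-micros"}},
   {"name": "doc_type", "type": {"name": "DocumentType", "type": "enum",
     "symbols": ["UserPoint", "UserActivity", "UserProfile", "UserSegment",
                 "UserDevicePoint", "UserDeviceTechnicalId", "UserAgent",
                 "UserAccount", "UserEmail"]}},
   {"name": "doc_id", "type": "string"},
   {"name": "ctx_id", "type": {"type": "string", "logicalType": "uuid"}},
   {"name": "op", "type": "string"},
   {"name": "value", "type": "string"}]}
""")

field_names = [field["name"] for field in SCHEMA["fields"]]
doc_type_symbols = SCHEMA["fields"][1]["type"]["symbols"]
```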

Upgrade of datamarts to user_point_system_version v202205

In the case of a datamart that is upgraded to the user_point_system_version v202205:

  • New device identifiers are directly stored and replicated using the device point formats,

  • Existing device identifiers that were previously stored in the UserAgent format are progressively migrated.

This migration is seamless within the datamart; however, it is reflected in your datamart replication. For each migrated device identifier, you will receive:

  • A DELETE operation with the doc_type UserAgent

  • Two UPDATE operations with doc_type UserDevicePoint and doc_type UserDeviceTechnicalId

For instance, the migration of a user agent with doc_id 4700c85f-17e3-4304-aa7f-dc140173b08d:vec:7231822539 will produce:

  • 1 DELETE operation with doc_type UserAgent and the same doc_id

  • 2 UPDATE operations:

    • 1 with doc_type UserDevicePoint and the following doc_id: 4700c85f-17e3-4304-aa7f-dc140173b08d:udp:-32453299893

    • 1 with doc_type UserDeviceTechnicalId and the following doc_id 4700c85f-17e3-4304-aa7f-dc140173b08d:udp:-32453299893:mum:7231822539

After the migration, no more UserAgent operations will be produced.

Setting up replications

Prerequisites

You need to have an instance of the external solution where you want to replicate your mediarithmics data.

Depending on the external solution, you will need to fulfill some requirements.

Google Pub/Sub

You will need:

  • A Google Cloud Platform account

  • A Google Cloud Platform project: https://cloud.google.com/resource-manager/docs/creating-managing-projects

  • An Access Control on this project: https://cloud.google.com/resource-manager/docs/access-control-proj

  • To create and activate your service accounts (generate the credentials file): https://cloud.google.com/iam/docs/understanding-service-accounts and https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances

    TO SUM IT UP: You can go to https://console.cloud.google.com/iam-admin/serviceaccounts, create a service account, and edit it to create a key in JSON format (this is the credentials file)

  • To create a Google Cloud Platform Pub/Sub instance. The Pub/Sub documentation (https://cloud.google.com/pubsub/docs/quickstart-py-mac), up to and including Quickstart setup > Create service account credentials, should be enough to begin

NOTE: Here is the Google Pub/Sub pricing documentation: https://cloud.google.com/pubsub/pricing

Click on Create Service Account:

Give your service account a name, select the right account access (Pub/Sub Publisher, Pub/Sub Editor) and save.

Once your service account is created, you can generate your key:

credentials.json file example:

{
  "type": "service_account",
  "project_id": "xxx-xxx-xx",
  "private_key_id": "xxxxxxx",
  "private_key": "-----BEGIN PRIVATE KEY-----\n xxxxxxx \n-----END PRIVATE KEY-----\n",
  "client_email": "xxx@project_id.iam.gserviceaccount.com",
  "client_id": "xxxxxx",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/projetc_id.iam.gserviceaccount.com"
}
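Before uploading, you may want to sanity-check that the credentials file contains the keys shown in the example. This is a hypothetical pre-flight check of our own, not a mediarithmics requirement:

```python
# Keys present in the service-account credentials.json example above.
REQUIRED_KEYS = {
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "client_id", "auth_uri", "token_uri",
}

def missing_credential_keys(credentials: dict) -> set:
    """Return the expected keys absent from a parsed credentials.json."""
    return REQUIRED_KEYS - credentials.keys()
```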

Microsoft Azure Event Hubs (Alpha)

You will need:

  • A Microsoft Azure account

  • A Resource Group, an Event Hubs namespace, and an Event Hub: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-create

  • A connection string over the namespace or the Event Hub: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-get-connection-string

  • To save your connection string in a credentials.txt file. It will be your credentials file to upload to the mediarithmics platform.

NOTE: Here is the Microsoft Azure Event Hubs pricing documentation: https://azure.microsoft.com/en-us/pricing/details/event-hubs/

credentials.txt file example:

Endpoint=sb://<FQDN>/;SharedAccessKeyName=<KeyName>;SharedAccessKey=<KeyValue>
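If you need to inspect such a connection string programmatically, it splits into `key=value` parts on `;`. A sketch (the helper name is ours); `partition` keeps any `=` inside the value, which matters because SharedAccessKey is base64 and may end with `=`:

```python
# Sketch: split an Event Hubs connection string (as in credentials.txt)
# into its key/value parts. partition("=") splits only on the first '=',
# preserving '=' characters inside the base64 SharedAccessKey value.
def parse_connection_string(connection_string: str) -> dict:
    parts = {}
    for segment in connection_string.strip().rstrip(";").split(";"):
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts
```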

Listing your replications

You can access replications in the datamart settings in your navigator application.

  1. Select the organisation that holds the datamart you want to replicate.

  2. Click on Settings.

  3. Click on the Datamarts tab and then click the Datamart menu entry.

  4. Select the datamart you want to replicate.

  5. In the Replications subtab, you will see a table dedicated to your Datamart Replications.

Creating & starting a replication

To create a new replication:

  1. Go to the Replications subtab.

  2. Click New Replication.

  3. Select a Replication type matching the external solution of your choice.

  4. Complete the configuration information (see Prerequisites for help configuring the advanced fields).

  5. Click Select a File to upload your credentials file and click on Update.

  6. Click Save Replication to create your new replication.

  7. You will see your new replication in the Replications subtab.

Example for Google Pub/Sub:

When a Replication is created, its status is automatically set to Paused. To start your replication, you will have to activate it. If the system can't replicate your datamart on activation, you will see an error.

When a replication can't be activated, it is usually due to an error on credentials, so you might want to verify your replication configuration and your credentials file first.

Activating / pausing a replication

You can change the replication status using the status button.

If the system is no longer able to replicate the messages, the replication status will be set to ERROR. In this case, check your external solution (expired instance, invalid credentials, etc). If you can't find anything wrong, please contact your Account manager.

In case of an error with your external solution, you will need to recreate your replication to rebind it to a working external solution with valid credentials and the right configuration.

Executing an initial synchronization

A dashboard listing every Initial Synchronization that was done on your datamart is available in the same Replications subtab.

For now, you will have to ask your Account manager to run an initial synchronization. Later, you will be able to run an initial synchronization yourself by clicking New Execution. You must have at least one active replication and you can't run an initial synchronization more than once a week.

We replicate all operations. No filtering is possible.

Please note that you might receive a large volume of messages while running an initial synchronization. Processing them can be expensive, depending on your cloud provider.

While an initial synchronization is running, you can't change the status of your replications. The initial synchronization will only replicate the data for active replications.

The active replications are still running during initial synchronizations. Messages from the initial synchronization and live messages (tag, import, etc.) from the active replications are mixed.

API documentation

Creating a replication

POST https://api.mediarithmics.com/v1/datamarts/:datamartId/replications

Path Parameters

| Name       | Type    | Description            |
| ---------- | ------- | ---------------------- |
| datamartId | integer | The ID of the datamart |

Request Body

| Name                           | Type    | Description                                                                                                                                                                      |
| ------------------------------ | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| version                        | integer | Version of the datamart replication to be created. Check Versioning for more info                                                                                                  |
| name                           | string  | The name of the replication                                                                                                                                                        |
| status                         | string  | Status of the replication. Must be "PAUSED" at creation                                                                                                                            |
| destination                    | string  | GOOGLE_PUBSUB or AZURE_EVENT_HUBS                                                                                                                                                  |
| project_id                     | string  | Google project ID. Only for GOOGLE_PUBSUB                                                                                                                                          |
| topic_id                       | string  | Google Pub/Sub topic ID. Only for GOOGLE_PUBSUB                                                                                                                                    |
| event_hub_name                 | string  | Azure event hub name. Only for AZURE_EVENT_HUBS                                                                                                                                    |
| datamart_id                    | string  | As per the path parameter                                                                                                                                                          |
| replication_filters            | array   | Optional. List of documents to replicate. If not provided, all documents will be replicated                                                                                        |
| replication_filters > document | string  | Name of the document to replicate among USER_SEGMENT, USER_EMAIL, USER_ACCOUNT, USER_PROFILE, USER_DEVICE_POINT, USER_AGENT, USER_DEVICE_TECHNICAL_ID, USER_ACTIVITY, USER_POINT   |
| replication_filters > filter   | string  | Not used at the moment                                                                                                                                                             |

{
	"status": "PAUSED",
	"name": "test",
	"project_id": "test",
	"topic_id": "test",
	"datamart_id": "1649",
	"destination": "GOOGLE_PUBSUB",
	"replication_filters": [{
		"document": "USER_SEGMENT",
		"filter": null
	}, {
		"document": "USER_ACCOUNT",
		"filter": null
	}, { … }]
}
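The body above can be assembled programmatically before POSTing it with any HTTP client to /v1/datamarts/:datamartId/replications. A sketch for the GOOGLE_PUBSUB case; the helper and argument names are ours:

```python
# Hypothetical helper building the creation body for
# POST /v1/datamarts/:datamartId/replications (GOOGLE_PUBSUB case).
def build_replication_payload(name, datamart_id, project_id, topic_id,
                              documents=None, version=1):
    payload = {
        "version": version,          # a replication defaults to Version 1
        "name": name,
        "status": "PAUSED",          # must be PAUSED at creation
        "destination": "GOOGLE_PUBSUB",
        "project_id": project_id,
        "topic_id": topic_id,
        "datamart_id": datamart_id,
    }
    if documents:                    # optional filter; cannot be changed later
        payload["replication_filters"] = [
            {"document": d, "filter": None} for d in documents
        ]
    return payload
```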

Add credentials to replication

POST https://api.mediarithmics.com/v1/datamarts/:datamartId/replications/:replicationId/credentials

Path Parameters

| Name          | Type    | Description                        |
| ------------- | ------- | ---------------------------------- |
| datamartId    | integer | The ID of the datamart             |
| replicationId | integer | The ID of the datamart replication |

Request Body

| Name | Type   | Description                                        |
| ---- | ------ | -------------------------------------------------- |
| file | binary | The path of the file that contains the credentials |

Retrieve a replication

GET https://api.mediarithmics.com/v1/datamarts/:datamartId/replications/:replicationId

Path Parameters

| Name          | Type    | Description                        |
| ------------- | ------- | ---------------------------------- |
| datamartId    | integer | The ID of the datamart             |
| replicationId | integer | The ID of the datamart replication |

{
	"id": "1234",
	"version": 2,
	"status": "PAUSED",
	"name": "test",
	"project_id": "test",
	"topic_id": "test",
	"datamart_id": "1649",
	"destination": "GOOGLE_PUBSUB",
	"credentials_uri": "internal_path_to_credential_uri",
	"replication_filters": [{
		"id": "1",
		"replication_id": "1234",
		"document": "USER_SEGMENT",
		"filter": null
	}, {
		"id": "2",
		"replication_id": "1234",
		"document": "USER_ACCOUNT",
		"filter": null
	}, { … }]
}

For datamarts with a user_point_system_version anterior to v202205, device identifiers are stored as user agents and replicated as UserAgent operations (doc_id example: 4700c85f-17e3-4304-aa7f-dc140173b08d:vec:32453299893).

For datamarts leveraging the user_point_system_version v202205, device identifiers are stored as user device points and user device technical ids, and replicated through UserDevicePoint and UserDeviceTechnicalId operations.
