Bulk processing

Bulk import lets you load data in bulk into the mediarithmics platform.

You can import:

  • Offline activities such as offline purchases and store visits

  • User segments such as email lists, cookie lists, user account lists, etc.

  • User profiles such as CRM data and scoring

  • User associations such as CRM onboarding

  • User dissociations

  • User suppression requests such as GDPR suppression requests and opt-out management

How it works

You upload files associated with a document import definition:

  • Files represent the data.

  • Document imports represent what mediarithmics should do with the data.

If you need to track users in real time, read the real-time tracking guide instead.

The two steps for bulk import are:

  1. Create the document import definition to tell mediarithmics what you are importing

  2. Upload files associated with the document import definition. Each uploaded file creates a new document import execution.

For maximum performance:

  • Keep each file under 100 MB.

  • Use document imports for batches of records, ideally more than 1,000 per file.

How do you choose between creating a new document import and adding a new file to an existing one? We recommend creating a new document import each time you have a new set of files to upload. For example, if you upload CRM profiles every night, you should create a new "User profiles from CRM - " document import every night instead of uploading new files to a single "User profiles from CRM" document import.

Each line in the uploaded file is a command to execute. Depending on the document import type, you have different commands available.

User identifiers in imports

When importing data, you need to properly add user identifiers. This will ensure your data is associated with the proper user point.

Only one identifier is allowed per line. For example, you shouldn't specify the user agent ID if the email hash is already used in that line.

However, you don't have to always use the same type of identifier in your document. For example, one line could use the user account ID while another uses the email hash.
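As a minimal sketch of the one-identifier-per-line rule, the Python check below counts the identifier fields present on each NDJSON line. The field names in `IDENTIFIER_FIELDS` and the helper name are hypothetical placeholders, not taken from this page; replace them with the identifier properties your document type actually uses.

```python
import json

# Hypothetical identifier field names; adapt them to your document type.
IDENTIFIER_FIELDS = ("user_account_id", "email_hash", "user_agent_id")


def lines_with_bad_identifiers(ndjson_text):
    """Return the 1-based numbers of lines that do not carry exactly one identifier."""
    bad_lines = []
    for number, raw in enumerate(ndjson_text.splitlines(), start=1):
        if not raw.strip():
            continue  # ignore blank lines
        record = json.loads(raw)
        present = [f for f in IDENTIFIER_FIELDS if record.get(f) is not None]
        if len(present) != 1:
            bad_lines.append(number)
    return bad_lines


# Lines may use different identifier types, but never two on the same line.
sample = "\n".join([
    '{"user_account_id": "acc-1"}',
    '{"email_hash": "abc123"}',
    '{"user_account_id": "acc-2", "email_hash": "def456"}',
])
print(lines_with_bad_identifiers(sample))  # prints [3]
```

Running a check like this before uploading catches lines that would otherwise be ambiguous about which user point they target.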

Document import

Document imports define what you are about to upload in one or multiple files.

A document import object has the following properties:

| field | type | description |
|---|---|---|
| document_type | Enum | The type of data you want to import. Should be USER_ACTIVITY, USER_SEGMENT, USER_PROFILE, USER_CHOICE, USER_IDENTIFIERS_DELETION, or USER_IDENTIFIERS_ASSOCIATION_DECLARATIONS. |
| mime_type | Enum | The format of the imported data: APPLICATION_X_NDJSON or TEXT_CSV. It should match the format of the uploaded file, e.g. .ndjson or .csv. The CSV format can be chosen only for USER_SEGMENT imports. |
| encoding | String | Encoding of the data that will be imported. Usually utf-8. |
| name | String | The name of your import. |
| priority | Enum | LOW, MEDIUM or HIGH. |

// Sample document import object
{
    "document_type": "USER_ACTIVITY",
    "mime_type": "APPLICATION_X_NDJSON",
    "encoding": "utf-8",
    "name": "<YOUR_DOCUMENT_IMPORT_NAME>"
}
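The constraints in the properties table can be checked before calling the API. The sketch below is one possible way to do that in Python; `build_document_import` is a hypothetical helper, and the only rules it enforces are the ones listed above (allowed enum values, and CSV being limited to USER_SEGMENT imports).

```python
DOCUMENT_TYPES = {
    "USER_ACTIVITY", "USER_SEGMENT", "USER_PROFILE", "USER_CHOICE",
    "USER_IDENTIFIERS_DELETION", "USER_IDENTIFIERS_ASSOCIATION_DECLARATIONS",
}
MIME_TYPES = {"APPLICATION_X_NDJSON", "TEXT_CSV"}
PRIORITIES = {"LOW", "MEDIUM", "HIGH"}


def build_document_import(document_type, mime_type, name,
                          encoding="utf-8", priority=None):
    """Build a document import payload, rejecting values the table does not list."""
    if document_type not in DOCUMENT_TYPES:
        raise ValueError(f"unknown document_type: {document_type}")
    if mime_type not in MIME_TYPES:
        raise ValueError(f"unknown mime_type: {mime_type}")
    if mime_type == "TEXT_CSV" and document_type != "USER_SEGMENT":
        raise ValueError("TEXT_CSV can be chosen only for USER_SEGMENT imports")
    payload = {
        "document_type": document_type,
        "mime_type": mime_type,
        "encoding": encoding,
        "name": name,
    }
    if priority is not None:
        if priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {priority}")
        payload["priority"] = priority
    return payload


print(build_document_import("USER_ACTIVITY", "APPLICATION_X_NDJSON", "My import"))
```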

Create a document import

POST https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports

Path Parameters

| Name | Type | Description |
|---|---|---|
| datamartId | integer | The ID of the datamart in which your data will be imported |

Request Body

| Name | Type | Description |
|---|---|---|
| data | object | The document import object you wish to create |

{
  "status": "ok",
  "data": {
    "id": "36271",
    "datafarm_key": "DF_KEY",
    "datamart_id": "DATAMART_ID",
    "document_type": "USER_PROFILE",
    "mime_type": "APPLICATION_X_NDJSON",
    "encoding": "utf-8",
    "name": "YOUR_DOCUMENT_IMPORT_NAME",
    "priority": "MEDIUM"
  }
}

Here is a sample request using curl:

curl -X POST \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports" \
  -H 'Authorization: <YOUR_API_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
          "document_type": "USER_ACTIVITY",
          "mime_type": "APPLICATION_X_NDJSON",
          "encoding": "utf-8",
          "name": "<YOUR_DOCUMENT_IMPORT_NAME>"
      }'
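For scripted workflows, the same call can be sketched in Python with the standard library. This is an illustration rather than an official client: `create_document_import_request` is a hypothetical helper that only builds the request, and the datamart ID and token below are placeholders.

```python
import json
import urllib.request

API_ROOT = "https://api.mediarithmics.com/v1"


def create_document_import_request(datamart_id, api_token, document_import):
    """Build (without sending) the POST request that creates a document import."""
    url = f"{API_ROOT}/datamarts/{datamart_id}/document_imports"
    return urllib.request.Request(
        url,
        data=json.dumps(document_import).encode("utf-8"),
        headers={
            "Authorization": api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = create_document_import_request(
    1509,  # placeholder datamart ID
    "<YOUR_API_TOKEN>",
    {
        "document_type": "USER_ACTIVITY",
        "mime_type": "APPLICATION_X_NDJSON",
        "encoding": "utf-8",
        "name": "<YOUR_DOCUMENT_IMPORT_NAME>",
    },
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the URL, headers and body easy to inspect before anything reaches the platform.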

List document imports

GET https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports

You can list all document imports for a datamart or search them with filters.

Path Parameters

| Name | Type | Description |
|---|---|---|
| datamartId | integer | The ID of the datamart |

Query Parameters

| Name | Type | Description |
|---|---|---|
| keywords | string | The keywords to match against document import names. Matching is case sensitive. |
| mime_type | string | Filter on a specific MIME type. Supported values are APPLICATION_X_NDJSON or TEXT_CSV. |
| document_types | string | Filter on specific document types. Supported values are USER_PROFILE, USER_ACTIVITY or USER_SEGMENT. Multiple filters can be separated with commas. Examples: &document_types=USER_PROFILE or &document_types=USER_PROFILE,USER_ACTIVITY |
| order_by | string | Results are sorted by ID by default. Specify &order_by=name to sort them by name. |

{
  "status": "ok",
  "data": [
    {
      "id": "19538",
      "datafarm_key": "DF_KEY",
      "datamart_id": "DATAMART_ID",
      "document_type": "USER_PROFILE",
      "mime_type": "APPLICATION_X_NDJSON",
      "encoding": "utf-8",
      "name": "December 2020 user profiles",
      "priority": "MEDIUM"
    },
    {
      "id": "19552",
      "datafarm_key": "DF_KEY",
      "datamart_id": "DATAMART_ID",
      "document_type": "USER_PROFILE",
      "mime_type": "APPLICATION_X_NDJSON",
      "encoding": "utf-8",
      "name": "January 2021 user profiles",
      "priority": "MEDIUM"
    },
    {
      "id": "19553",
      "datafarm_key": "DF_EU_2020_02",
      "datamart_id": "1509",
      "document_type": "USER_PROFILE",
      "mime_type": "APPLICATION_X_NDJSON",
      "encoding": "utf-8",
      "name": "February 2021 user profiles",
      "priority": "MEDIUM"
    }
  ],
  "count": 3,
  "total": 3,
  "first_result": 0,
  "max_result": 50,
  "max_results": 50
}
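The query parameters of the list endpoint can be combined as in the following Python sketch. `list_document_imports_url` is a hypothetical helper; it only assembles the URL from the filters documented above and leaves commas unescaped so that several document types can be passed at once.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.mediarithmics.com/v1"


def list_document_imports_url(datamart_id, keywords=None, mime_type=None,
                              document_types=None, order_by=None):
    """Assemble the list URL, adding only the filters that were provided."""
    params = {}
    if keywords:
        params["keywords"] = keywords
    if mime_type:
        params["mime_type"] = mime_type
    if document_types:
        # Several document types are sent as one comma-separated value.
        params["document_types"] = ",".join(document_types)
    if order_by:
        params["order_by"] = order_by
    base = f"{API_ROOT}/datamarts/{datamart_id}/document_imports"
    query = urlencode(params, safe=",")
    return f"{base}?{query}" if query else base


print(list_document_imports_url(
    1509,  # placeholder datamart ID
    document_types=["USER_PROFILE", "USER_ACTIVITY"],
    order_by="name",
))
```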

The query is paginated as described in using our API guide.

Get a document import

GET https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId

Path Parameters

| Name | Type | Description |
|---|---|---|
| datamartId | integer | The ID of the datamart |
| importId | integer | The ID of the document import |

{
  "status": "ok",
  "data": {
    "id": "36271",
    "datafarm_key": "DF_KEY",
    "datamart_id": "DATAMART_ID",
    "document_type": "USER_PROFILE",
    "mime_type": "APPLICATION_X_NDJSON",
    "encoding": "utf-8",
    "name": "December 2020 user profiles",
    "priority": "MEDIUM"
  }
}

Update a document import

PUT https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId

Path Parameters

| Name | Type | Description |
|---|---|---|
| datamartId | integer | The ID of the datamart |
| importId | integer | The ID of the document import |

Request Body

| Name | Type | Description |
|---|---|---|
| data | object | The document import object to put |

{
  "status": "ok",
  "data": {
    "id": "36271",
    "datafarm_key": "DF_KEY",
    "datamart_id": "DATAMART_ID",
    "document_type": "USER_PROFILE",
    "mime_type": "APPLICATION_X_NDJSON",
    "encoding": "utf-8",
    "name": "YOUR_DOCUMENT_IMPORT_NAME",
    "priority": "MEDIUM"
  }
}

Remove a document import

DELETE https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId

Removes a document import you don't want to see anymore in the system.

Path Parameters

| Name | Type | Description |
|---|---|---|
| datamartId | integer | The ID of the datamart |
| importId | integer | The ID of the document import |

File upload

A file upload creates an execution.

After creation, the execution is in the PENDING status. It moves to the RUNNING status when the import starts and to the SUCCEEDED status once the platform has correctly imported the file.

Create an execution

POST https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions

You create an execution and upload a file with this endpoint.

Path Parameters

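| Name | Type | Description |
|---|---|---|
| datamartId* | string | The ID of the datamart |
| importId* | string | The ID of the document import |

Headers

| Name | Type | Description |
|---|---|---|
| Content-Type* | string | The format of the uploaded file, for example application/x-ndjson |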

{
    "status": "ok",
    "data": {
        "parameters": null,
        "result": null,
        "error": null,
        "id": "11597785",
        "status": "PENDING",
        "creation_date": 1609410143659,
        "start_date": null,
        "duration": null,
        "organisation_id": "1426",
        "user_id": null,
        "cancel_status": null,
        "debug": null,
        "is_retryable": false,
        "permalink_uri": "MTowOjA6NDI1MzAxMg==",
        "num_tasks": null,
        "completed_tasks": null,
        "erroneous_tasks": null,
        "retry_count": 0,
        "job_type": "DOCUMENT_IMPORT",
        "import_mode": "MANUAL_FILE",
        "import_type": null
    }
}

See an example:

curl --location --request POST 'https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions/' \
--header 'Content-Type: application/x-ndjson' \
--header 'Authorization: api:TOKEN' \
--data-binary '@/Users/username/path/to/the/file.ndjson'

The response contains metadata about the created execution, notably an id property you can use to track the execution.
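The upload and the extraction of the execution id can be sketched in Python as follows. Both helper names are hypothetical, the IDs and token are placeholders, and the request is only built here, not sent; the response sample mirrors the shape shown earlier in this section.

```python
import json
import urllib.request

API_ROOT = "https://api.mediarithmics.com/v1"


def create_execution_request(datamart_id, import_id, api_token, payload,
                             content_type="application/x-ndjson"):
    """Build (without sending) the POST request that uploads a file as raw bytes."""
    url = (f"{API_ROOT}/datamarts/{datamart_id}"
           f"/document_imports/{import_id}/executions")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Authorization": api_token, "Content-Type": content_type},
        method="POST",
    )


def execution_id_from_response(response_body):
    """Return the id of the created execution from the JSON response body."""
    return json.loads(response_body)["data"]["id"]


# In practice the payload would come from open(path, "rb").read().
payload = b'{"example": "line 1"}\n{"example": "line 2"}\n'
request = create_execution_request(1509, 19518, "<YOUR_API_TOKEN>", payload)
print(request.full_url)

sample_response = '{"status": "ok", "data": {"id": "11597785", "status": "PENDING"}}'
print(execution_id_from_response(sample_response))  # prints 11597785
```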

List executions

GET https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions

You can list all executions for a document import and retrieve useful data such as their status, execution time and error messages.

Path Parameters

| Name | Type | Description |
|---|---|---|
| datamartId* | integer | The ID of the datamart |
| importId* | integer | The ID of the document import |

{
    "status": "ok",
    "data": [
        {
            "parameters": {
                "datamart_id": 1609,
                "document_import_id": 19718,
                "mime_type": "APPLICATION_X_NDJSON",
                "document_type": "USER_PROFILE",
                "input_file_name": "requestBody9664967795462448677asRaw",
                "file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y.json",
                "number_of_lines": 4,
                "segment_id": null
            },
            "result": {
                "total_success": 4,
                "total_failure": 0,
                "input_file_name": "requestBody9664967795462448677asRaw",
                "input_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y.json",
                "error_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y_errors.csv",
                "possible_issue_on_identifiers": false,
                "top_identifiers": {}
            },
            "error": null,
            "id": "11597785",
            "status": "SUCCEEDED",
            "creation_date": 1609410143659,
            "start_date": 1609410150976,
            "duration": 3059,
            "organisation_id": "1426",
            "user_id": null,
            "cancel_status": null,
            "debug": null,
            "is_retryable": false,
            "permalink_uri": "MTowOjA6NDI1MzAxMg==",
            "num_tasks": 4,
            "completed_tasks": 4,
            "erroneous_tasks": 0,
            "retry_count": 0,
            "job_type": "DOCUMENT_IMPORT",
            "import_mode": "MANUAL_FILE",
            "import_type": null,
            "end_date": 1609410154035
        },
        {
            "parameters": {
                "datamart_id": 1609,
                "document_import_id": 19718,
                "mime_type": "APPLICATION_X_NDJSON",
                "document_type": "USER_PROFILE",
                "input_file_name": "requestBody17471990940413569967asRaw",
                "file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody17471990940413569967asRaw-2020-10-19_09.54.45-JvP1ssxKSu.json",
                "number_of_lines": 4,
                "segment_id": null
            },
            "result": {
                "total_success": 0,
                "total_failure": 4,
                "input_file_name": "requestBody17471990940413569967asRaw",
                "input_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody17471990940413569967asRaw-2020-10-19_09.54.45-JvP1ssxKSu.json",
                "error_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody17471990940413569967asRaw-2020-10-19_09.54.45-JvP1ssxKSu_errors.csv",
                "possible_issue_on_identifiers": false,
                "top_identifiers": {}
            },
            "error": {
                "message": "0 success, 4 failures\nSaved errors:\nNo profile id found while upserting a user profile Error id = 9d5016ea-6b7b-4c64-bc74-60ba207e3bed.\nNo profile id found while upserting a user profile Error id = 99f8d9bb-4c94-49ea-8bb2-934bc6056cac.\nNo profile id found while upserting a user profile Error id = d1216b0e-619c-4d92-9098-cc5ae4ac8e16.\nNo profile id found while upserting a user profile Error id = a92d3258-163c-4b9d-949e-94f9006cd77d.\n"
            },
            "id": "11170897",
            "status": "SUCCEEDED",
            "creation_date": 1603101286198,
            "start_date": 1603101317674,
            "duration": 1062,
            "organisation_id": "1426",
            "user_id": null,
            "cancel_status": null,
            "debug": null,
            "is_retryable": false,
            "permalink_uri": "MTowOjA6MzgyNjEyNA==",
            "num_tasks": 4,
            "completed_tasks": 0,
            "erroneous_tasks": 4,
            "retry_count": 0,
            "job_type": "DOCUMENT_IMPORT",
            "import_mode": "MANUAL_FILE",
            "import_type": null,
            "end_date": 1603101318736
        }
    ],
    "count": 2,
    "total": 2,
    "first_result": 0,
    "max_result": 50,
    "max_results": 50
}
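In the sample above, the second execution has the SUCCEEDED status even though all four of its lines failed, so it can be worth looking at the `result` and `error` fields rather than the status alone. The Python sketch below, with a hypothetical `summarize_executions` helper, condenses a "list executions" response into one summary per execution.

```python
def summarize_executions(response):
    """Return one summary dict per execution in a "list executions" response."""
    summaries = []
    for execution in response.get("data", []):
        result = execution.get("result") or {}  # result is null while pending
        summaries.append({
            "id": execution["id"],
            "status": execution["status"],
            "succeeded_lines": result.get("total_success"),
            "failed_lines": result.get("total_failure"),
            "has_error": execution.get("error") is not None,
        })
    return summaries


# Trimmed-down version of the response shown above.
response = {
    "status": "ok",
    "data": [
        {"id": "11597785", "status": "SUCCEEDED", "error": None,
         "result": {"total_success": 4, "total_failure": 0}},
        {"id": "11170897", "status": "SUCCEEDED",
         "error": {"message": "0 success, 4 failures"},
         "result": {"total_success": 0, "total_failure": 4}},
    ],
}
for summary in summarize_executions(response):
    print(summary)
```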

Get an execution

GET https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions/:executionId

Gets a specific execution and retrieves useful data such as its status, execution time and error messages.

Path Parameters

| Name | Type | Description |
|---|---|---|
| datamartId* | integer | The ID of the datamart |
| importId* | integer | The ID of the document import |
| executionId* | integer | The ID of the execution (usually retrieved from "create execution" or "list executions" requests) |

{
    "status": "ok",
    "data": {
        "parameters": {
            "datamart_id": 1609,
            "document_import_id": 19718,
            "mime_type": "APPLICATION_X_NDJSON",
            "document_type": "USER_PROFILE",
            "input_file_name": "requestBody9664967795462448677asRaw",
            "file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y.json",
            "number_of_lines": 4,
            "segment_id": null
        },
        "result": {
            "total_success": 4,
            "total_failure": 0,
            "input_file_name": "requestBody9664967795462448677asRaw",
            "input_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y.json",
            "error_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y_errors.csv",
            "possible_issue_on_identifiers": false,
            "top_identifiers": {}
        },
        "error": null,
        "id": "11597785",
        "status": "SUCCEEDED",
        "creation_date": 1609410143659,
        "start_date": 1609410150976,
        "duration": 3059,
        "organisation_id": "1426",
        "user_id": null,
        "cancel_status": null,
        "debug": null,
        "is_retryable": false,
        "permalink_uri": "MTowOjA6NDI1MzAxMg==",
        "num_tasks": 4,
        "completed_tasks": 4,
        "erroneous_tasks": 0,
        "retry_count": 0,
        "job_type": "DOCUMENT_IMPORT",
        "import_mode": "MANUAL_FILE",
        "import_type": null,
        "end_date": 1609410154035
    }
}
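Since an execution moves from PENDING to RUNNING and then to a final status, a client typically polls this endpoint until the status stops changing. The Python sketch below shows one way to do so; `wait_for_execution` is a hypothetical helper, `fetch_execution` stands for any function returning the `data` object of the response above, and FAILED is an assumed terminal status that does not appear on this page.

```python
import time

# SUCCEEDED and CANCELED appear on this page; FAILED is an assumption.
TERMINAL_STATUSES = {"SUCCEEDED", "CANCELED", "FAILED"}


def wait_for_execution(fetch_execution, poll_seconds=5.0, max_polls=120,
                       sleep=time.sleep):
    """Poll until the execution reaches a terminal status, then return it."""
    for _ in range(max_polls):
        execution = fetch_execution()
        if execution["status"] in TERMINAL_STATUSES:
            return execution
        sleep(poll_seconds)
    raise TimeoutError("execution did not finish within the polling budget")


# Demonstration with a fake fetcher that replays three successive states.
states = iter([
    {"id": "11597785", "status": "PENDING"},
    {"id": "11597785", "status": "RUNNING"},
    {"id": "11597785", "status": "SUCCEEDED"},
])
final = wait_for_execution(lambda: next(states), sleep=lambda seconds: None)
print(final["status"])  # prints SUCCEEDED
```

Passing the fetcher and the sleep function as arguments keeps the loop independent of any HTTP library and easy to test.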

Cancel an execution

POST https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions/:executionId/action

Cancels a specific execution.

Path Parameters

| Name | Type | Description |
|---|---|---|
| datamartId* | string | The ID of the datamart |
| importId* | string | The ID of the document import |
| executionId* | string | The ID of the execution (usually retrieved from "create execution" or "list executions" requests) |

Request Body

| Name | Type | Description |
|---|---|---|
| body* | json | Must be: {"action":"CANCEL"} |

{
  "status": "ok",
  "data": {
    "parameters": null,
    "result": null,
    "error": null,
    "id": "22747195",
    "status": "CANCELED",
    "creation_date": 1646060596034,
    "start_date": null,
    "duration": null,
    "organisation_id": "1581",
    "user_id": null,
    "cancel_status": "REQUESTED",
    "debug": null,
    "is_retryable": false,
    "community_id": "1581",
    "num_tasks": null,
    "completed_tasks": null,
    "erroneous_tasks": null,
    "retry_count": 0,
    "permalink_uri": null,
    "job_type": "DOCUMENT_IMPORT",
    "import_mode": "MANUAL_FILE",
    "import_type": null
  }
}

An execution can only be cancelled while its status is "PENDING".
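A client can enforce the PENDING-only rule before sending the cancellation. The Python sketch below is an illustration under stated assumptions: `cancel_execution_request` is a hypothetical helper, the IDs and token are placeholders, and the `application/json` content type is assumed from the JSON body described above.

```python
import json
import urllib.request

API_ROOT = "https://api.mediarithmics.com/v1"


def cancel_execution_request(datamart_id, import_id, execution_id,
                             api_token, current_status):
    """Build (without sending) the cancel request, refusing non-PENDING executions."""
    if current_status != "PENDING":
        raise ValueError(
            f"only PENDING executions can be cancelled, got {current_status}"
        )
    url = (f"{API_ROOT}/datamarts/{datamart_id}/document_imports/{import_id}"
           f"/executions/{execution_id}/action")
    return urllib.request.Request(
        url,
        data=json.dumps({"action": "CANCEL"}).encode("utf-8"),
        # Content type assumed from the JSON body.
        headers={"Authorization": api_token, "Content-Type": "application/json"},
        method="POST",
    )


request = cancel_execution_request(
    1509, 19518, 22747195, "<YOUR_API_TOKEN>", current_status="PENDING"
)
print(request.full_url)
```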

Splitting large files

If you need to import files larger than 100 MB, split them first, then call the upload API once per resulting file.

You can split massive files using the split shell command:

split -l <LINE_NUMBER> ./your/file/path
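Where a shell is not available, the same line-based split can be sketched in Python. `split_by_lines` is a hypothetical helper and the `.partNNNN` suffix is an arbitrary naming choice; because each line is a separate command, cutting on line boundaries keeps every chunk a valid file.

```python
import os
import tempfile


def split_by_lines(path, lines_per_chunk):
    """Write chunks of at most lines_per_chunk lines next to the input file."""
    chunk_paths = []
    chunk = None
    with open(path, "rb") as source:
        for index, line in enumerate(source):
            if index % lines_per_chunk == 0:
                # Start a new chunk file every lines_per_chunk lines.
                if chunk is not None:
                    chunk.close()
                chunk_path = f"{path}.part{len(chunk_paths):04d}"
                chunk = open(chunk_path, "wb")
                chunk_paths.append(chunk_path)
            chunk.write(line)
    if chunk is not None:
        chunk.close()
    return chunk_paths


# Demonstration on a small temporary file of five lines.
with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "profiles.ndjson")
    with open(source_path, "wb") as handle:
        handle.writelines(b'{"line": %d}\n' % n for n in range(5))
    parts = split_by_lines(source_path, 2)
    print(len(parts))  # prints 3
```

Each resulting chunk can then be uploaded as its own execution.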
