Bulk processing
Bulk import lets you load data into the mediarithmics platform in batches.
You can import:
Offline activities such as offline purchases and store visits
User segments such as email lists, cookie lists, user account lists, etc.
User profiles such as CRM data and scoring
User associations such as CRM onboarding
User dissociations
User suppression requests such as GDPR suppression requests and opt-out management
How it works
You upload files associated with a document import definition:
Files represent the data.
Document imports represent what mediarithmics should do with the data.
If you need to track users in real-time, you should read the real-time tracking guide.
The two steps for bulk import are:
Create the document import definition to tell mediarithmics what you are importing
Upload files associated with the document import definition. Each uploaded file creates a new document import execution.
Should you create a new document import or add a new file to an existing one? We recommend creating a new document import each time you have a new set of files to upload. For example, if you upload CRM profiles every night, create a new "User profiles from CRM - <DATE>" document import every night instead of uploading new files to a single "User profiles from CRM" document import, as in the sketch below.
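For a nightly pipeline, here is a minimal sketch of such a dated creation call (assuming your API token and datamart ID are available as the $MICS_API_TOKEN and $DATAMART_ID placeholders):
# Create a dated document import for tonight's CRM profile upload
curl -X POST \
  "https://api.mediarithmics.com/v1/datamarts/$DATAMART_ID/document_imports" \
  -H "Authorization: $MICS_API_TOKEN" \
  -H 'Content-Type: application/json' \
  -d "{
    \"document_type\": \"USER_PROFILE\",
    \"mime_type\": \"APPLICATION_X_NDJSON\",
    \"encoding\": \"utf-8\",
    \"name\": \"User profiles from CRM - $(date +%Y-%m-%d)\"
  }"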
Each line in the uploaded file is a command to execute. Depending on the document import type, you have different commands available.
User identifiers in imports
When importing data, you need to add user identifiers correctly. This ensures your data is associated with the proper UserPoint.
Only one identifier is allowed per line. For example, you shouldn't specify the user agent ID if the Email Hash is already used in a line.
However, you don't have to always use the same type of identifier in your document. For example, one line could use the user account ID while another uses the email hash.
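For illustration, here is a hypothetical NDJSON excerpt where each line carries exactly one identifier, but the two lines use different identifier types ($email_hash and $user_account_id are assumed field names; check the reference of your import type for the exact identifier fields):
{"$email_hash": "1b4f...", "first_name": "Jane"}
{"$user_account_id": "crm-0042", "first_name": "John"}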
Document import
Document imports define what you are about to upload in one or multiple files.
A document import object has the following properties:
document_type (Enum): The type of data you want to import. Should be USER_ACTIVITY, USER_SEGMENT, USER_PROFILE, USER_CHOICE, USER_IDENTIFIERS_DELETION, or USER_IDENTIFIERS_ASSOCIATION_DECLARATIONS.
mime_type (Enum): The format of the imported data, either APPLICATION_X_NDJSON or TEXT_CSV. It should match the format of the uploaded file, e.g. .ndjson or .csv. TEXT_CSV can only be chosen for USER_SEGMENT imports.
encoding (String): Encoding of the data that will be imported. Usually utf-8.
name (String): The name of your import.
priority (Enum): LOW, MEDIUM, or HIGH.
// Sample document import object
{
"document_type": "USER_ACTIVITY",
"mime_type": "APPLICATION_X_NDJSON",
"encoding": "utf-8",
"name": "<YOUR_DOCUMENT_IMPORT_NAME>"
}
Create a document import
POST
https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports
Path Parameters
datamartId (integer): The ID of the datamart in which your data will be imported
Request Body
data (object): The document import object you wish to create
Response:
{
"status": "ok",
"data": {
"id": "36271",
"datafarm_key": "DF_KEY",
"datamart_id": "DATAMART_ID",
"document_type": "USER_PROFILE",
"mime_type": "APPLICATION_X_NDJSON",
"encoding": "utf-8",
"name": "YOUR_DOCUMENT_IMPORT_NAME",
"priority": "MEDIUM"
}
}
Here is a sample request using curl:
curl -X POST \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports" \
  -H 'Authorization: <YOUR_API_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "document_type": "USER_ACTIVITY",
    "mime_type": "APPLICATION_X_NDJSON",
    "encoding": "utf-8",
    "name": "<YOUR_DOCUMENT_IMPORT_NAME>"
  }'
List document imports
GET
https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports
You can list all document imports for a datamart or search them with filters.
Path Parameters
datamartId (integer): The ID of the datamart
Query Parameters
keywords (string): The keywords to match with document import names. Matching is case sensitive.
mime_type (string): Filter on a specific MIME type. Supported values are APPLICATION_X_NDJSON or TEXT_CSV.
document_types (string): Filter on specific document types. Supported values are USER_PROFILE, USER_ACTIVITY, or USER_SEGMENT. Multiple filters can be separated with commas, e.g. &document_types=USER_PROFILE or &document_types=USER_PROFILE,USER_ACTIVITY.
order_by (string): Results are sorted by ID by default; specify &order_by=name to sort them by name.
The query is paginated as described in the Using our API guide.
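For example, here is a request listing user profile imports sorted by name:
curl -X GET \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports?document_types=USER_PROFILE&order_by=name" \
  -H 'Authorization: <YOUR_API_TOKEN>'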
{
"status": "ok",
"data": [
{
"id": "19538",
"datafarm_key": "DF_KEY",
"datamart_id": "DATAMART_ID",
"document_type": "USER_PROFILE",
"mime_type": "APPLICATION_X_NDJSON",
"encoding": "utf-8",
"name": "December 2020 user profiles",
"priority": "MEDIUM"
},
{
"id": "19552",
"datafarm_key": "DF_KEY",
"datamart_id": "DATAMART_ID",
"document_type": "USER_PROFILE",
"mime_type": "APPLICATION_X_NDJSON",
"encoding": "utf-8",
"name": "January 2021 user profiles",
"priority": "MEDIUM"
},
{
"id": "19553",
"datafarm_key": "DF_EU_2020_02",
"datamart_id": "1509",
"document_type": "USER_PROFILE",
"mime_type": "APPLICATION_X_NDJSON",
"encoding": "utf-8",
"name": "February 2021 user profiles",
"priority": "MEDIUM"
}
],
"count": 3,
"total": 3,
"first_result": 0,
"max_result": 50,
"max_results": 50
}
Get a document import
GET
https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId
Path Parameters
datamartId (integer): The ID of the datamart
importId (integer): The ID of the document import
{
"status": "ok",
"data": {
"id": "36271",
"datafarm_key": "DF_KEY",
"datamart_id": "DATAMART_ID",
"document_type": "USER_PROFILE",
"mime_type": "APPLICATION_X_NDJSON",
"encoding": "utf-8",
"name": "December 2020 user profiles",
"priority": "MEDIUM"
}
}
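Here is a sample request using curl:
curl -X GET \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>" \
  -H 'Authorization: <YOUR_API_TOKEN>'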
Update a document import
PUT
https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId
Path Parameters
datamartId (integer): The ID of the datamart
importId (integer): The ID of the document import
Request Body
data (object): The document import object to put
{
"status": "ok",
"data": {
"id": "36271",
"datafarm_key": "DF_KEY",
"datamart_id": "DATAMART_ID",
"document_type": "USER_PROFILE",
"mime_type": "APPLICATION_X_NDJSON",
"encoding": "utf-8",
"name": "YOUR_DOCUMENT_IMPORT_NAME",
"priority": "MEDIUM"
}
}
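Here is a sample request using curl. As the endpoint takes a document import object, the sketch below resends every field and only changes the priority (assuming that is the field you want to update):
curl -X PUT \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>" \
  -H 'Authorization: <YOUR_API_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "document_type": "USER_PROFILE",
    "mime_type": "APPLICATION_X_NDJSON",
    "encoding": "utf-8",
    "name": "YOUR_DOCUMENT_IMPORT_NAME",
    "priority": "HIGH"
  }'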
Remove a document import
DELETE
https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId
Removes a document import you no longer want to see in the system.
Path Parameters
datamartId (integer): The ID of the datamart
importId (integer): The ID of the document import
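Here is a sample request using curl:
curl -X DELETE \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>" \
  -H 'Authorization: <YOUR_API_TOKEN>'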
File upload
A file upload creates an execution.
After creation, the execution has the PENDING status. It moves to the RUNNING status when the import starts, and to the SUCCEEDED status once the platform has correctly imported the file.
Create an execution
POST
https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions
You create an execution and upload a file with this endpoint.
Path Parameters
datamartId* (string): The ID of the datamart
importId* (string): The ID of the document import
Headers
Content-Type* (string): The MIME type of the uploaded file, matching the document import's mime_type, e.g. application/x-ndjson or text/csv
{
"status": "ok",
"data": {
"parameters": null,
"result": null,
"error": null,
"id": "11597785",
"status": "PENDING",
"creation_date": 1609410143659,
"start_date": null,
"duration": null,
"organisation_id": "1426",
"user_id": null,
"cancel_status": null,
"debug": null,
"is_retryable": false,
"permalink_uri": "MTowOjA6NDI1MzAxMg==",
"num_tasks": null,
"completed_tasks": null,
"erroneous_tasks": null,
"retry_count": 0,
"job_type": "DOCUMENT_IMPORT",
"import_mode": "MANUAL_FILE",
"import_type": null
}
}
See an example:
curl --location --request POST 'https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions' \
--header 'Content-Type: application/x-ndjson' \
--header 'Authorization: api:TOKEN' \
--data-binary '@/Users/username/path/to/the/file.ndjson'
You retrieve metadata about the created execution, notably an id property you can use to track the execution.
List executions
GET
https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions
You can list all executions for a document import and retrieve useful data like their status, execution time, and error messages.
Path Parameters
datamartId* (integer): The ID of the datamart
importId* (integer): The ID of the document import
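Here is a sample request using curl:
curl -X GET \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>/executions" \
  -H 'Authorization: <YOUR_API_TOKEN>'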
{
"status": "ok",
"data": [
{
"parameters": {
"datamart_id": 1609,
"document_import_id": 19718,
"mime_type": "APPLICATION_X_NDJSON",
"document_type": "USER_PROFILE",
"input_file_name": "requestBody9664967795462448677asRaw",
"file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y.json",
"number_of_lines": 4,
"segment_id": null
},
"result": {
"total_success": 4,
"total_failure": 0,
"input_file_name": "requestBody9664967795462448677asRaw",
"input_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y.json",
"error_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y_errors.csv",
"possible_issue_on_identifiers": false,
"top_identifiers": {}
},
"error": null,
"id": "11597785",
"status": "SUCCEEDED",
"creation_date": 1609410143659,
"start_date": 1609410150976,
"duration": 3059,
"organisation_id": "1426",
"user_id": null,
"cancel_status": null,
"debug": null,
"is_retryable": false,
"permalink_uri": "MTowOjA6NDI1MzAxMg==",
"num_tasks": 4,
"completed_tasks": 4,
"erroneous_tasks": 0,
"retry_count": 0,
"job_type": "DOCUMENT_IMPORT",
"import_mode": "MANUAL_FILE",
"import_type": null,
"end_date": 1609410154035
},
{
"parameters": {
"datamart_id": 1609,
"document_import_id": 19718,
"mime_type": "APPLICATION_X_NDJSON",
"document_type": "USER_PROFILE",
"input_file_name": "requestBody17471990940413569967asRaw",
"file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody17471990940413569967asRaw-2020-10-19_09.54.45-JvP1ssxKSu.json",
"number_of_lines": 4,
"segment_id": null
},
"result": {
"total_success": 0,
"total_failure": 4,
"input_file_name": "requestBody17471990940413569967asRaw",
"input_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody17471990940413569967asRaw-2020-10-19_09.54.45-JvP1ssxKSu.json",
"error_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody17471990940413569967asRaw-2020-10-19_09.54.45-JvP1ssxKSu_errors.csv",
"possible_issue_on_identifiers": false,
"top_identifiers": {}
},
"error": {
"message": "0 success, 4 failures\nSaved errors:\nNo profile id found while upserting a user profile Error id = 9d5016ea-6b7b-4c64-bc74-60ba207e3bed.\nNo profile id found while upserting a user profile Error id = 99f8d9bb-4c94-49ea-8bb2-934bc6056cac.\nNo profile id found while upserting a user profile Error id = d1216b0e-619c-4d92-9098-cc5ae4ac8e16.\nNo profile id found while upserting a user profile Error id = a92d3258-163c-4b9d-949e-94f9006cd77d.\n"
},
"id": "11170897",
"status": "SUCCEEDED",
"creation_date": 1603101286198,
"start_date": 1603101317674,
"duration": 1062,
"organisation_id": "1426",
"user_id": null,
"cancel_status": null,
"debug": null,
"is_retryable": false,
"permalink_uri": "MTowOjA6MzgyNjEyNA==",
"num_tasks": 4,
"completed_tasks": 0,
"erroneous_tasks": 4,
"retry_count": 0,
"job_type": "DOCUMENT_IMPORT",
"import_mode": "MANUAL_FILE",
"import_type": null,
"end_date": 1603101318736
}
],
"count": 2,
"total": 2,
"first_result": 0,
"max_result": 50,
"max_results": 50
}
Get an execution
GET
https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions/:executionId
Gets a specific execution and retrieves useful data like its status, execution time, and error messages.
Path Parameters
datamartId* (integer): The ID of the datamart
importId* (integer): The ID of the document import
executionId* (integer): The ID of the execution (usually retrieved from "create execution" or "list executions" requests)
{
"status": "ok",
"data": {
"parameters": {
"datamart_id": 1609,
"document_import_id": 19718,
"mime_type": "APPLICATION_X_NDJSON",
"document_type": "USER_PROFILE",
"input_file_name": "requestBody9664967795462448677asRaw",
"file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y.json",
"number_of_lines": 4,
"segment_id": null
},
"result": {
"total_success": 4,
"total_failure": 0,
"input_file_name": "requestBody9664967795462448677asRaw",
"input_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y.json",
"error_file_uri": "mics://data_file/tenants/1426/datamarts/1509/document_imports/19518/requestBody9664967795462448677asRaw-2020-12-31_10.22.23-KzgivDim3y_errors.csv",
"possible_issue_on_identifiers": false,
"top_identifiers": {}
},
"error": null,
"id": "11597785",
"status": "SUCCEEDED",
"creation_date": 1609410143659,
"start_date": 1609410150976,
"duration": 3059,
"organisation_id": "1426",
"user_id": null,
"cancel_status": null,
"debug": null,
"is_retryable": false,
"permalink_uri": "MTowOjA6NDI1MzAxMg==",
"num_tasks": 4,
"completed_tasks": 4,
"erroneous_tasks": 0,
"retry_count": 0,
"job_type": "DOCUMENT_IMPORT",
"import_mode": "MANUAL_FILE",
"import_type": null,
"end_date": 1609410154035
}
}
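Because imports run asynchronously, a common pattern is to poll this endpoint until the execution leaves the PENDING and RUNNING statuses. Here is a minimal bash sketch, assuming jq is installed and the placeholders are filled in:
# Poll an execution every 30 seconds until it reaches a terminal status
EXECUTION_URL="https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>/executions/<EXECUTION_ID>"
while true; do
  STATUS=$(curl -s -H 'Authorization: <YOUR_API_TOKEN>' "$EXECUTION_URL" | jq -r '.data.status')
  echo "status: $STATUS"
  case "$STATUS" in
    PENDING|RUNNING) sleep 30 ;;  # still in progress, wait and retry
    *) break ;;                   # SUCCEEDED, CANCELED, or another terminal status
  esac
done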
Cancel an execution
POST
https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions/:executionId/action
Cancel a specific execution
Path Parameters
datamartId* (string): The ID of the datamart
importId* (string): The ID of the document import
executionId* (string): The ID of the execution (usually retrieved from "create execution" or "list executions" requests)
Request Body
body* (json): Must be {"action": "CANCEL"}
{
"status": "ok",
"data": {
"parameters": null,
"result": null,
"error": null,
"id": "22747195",
"status": "CANCELED",
"creation_date": 1646060596034,
"start_date": null,
"duration": null,
"organisation_id": "1581",
"user_id": null,
"cancel_status": "REQUESTED",
"debug": null,
"is_retryable": false,
"community_id": "1581",
"num_tasks": null,
"completed_tasks": null,
"erroneous_tasks": null,
"retry_count": 0,
"permalink_uri": null,
"job_type": "DOCUMENT_IMPORT",
"import_mode": "MANUAL_FILE",
"import_type": null
}
}
The cancellation of an execution only works if its status is PENDING.
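Here is a sample request using curl:
curl -X POST \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>/executions/<EXECUTION_ID>/action" \
  -H 'Authorization: <YOUR_API_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{"action": "CANCEL"}'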
Splitting large files
If you need to import files larger than 100 MB, split them before using the upload API and call it once per chunk.
You can split large files by line count using the split shell command.
split -l <LINE_NUMBER> ./your/file/path
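For example, the sketch below splits an NDJSON file into 500,000-line chunks and uploads each chunk as a separate execution (the 500,000-line count and the chunk_ prefix are arbitrary choices). Splitting on line boundaries rather than bytes keeps every NDJSON command intact:
# Split the file into chunks named chunk_aa, chunk_ab, ...
split -l 500000 ./your/file/path.ndjson chunk_

# Upload each chunk; every upload creates its own execution
for f in chunk_*; do
  curl -X POST \
    "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>/executions" \
    -H 'Content-Type: application/x-ndjson' \
    -H 'Authorization: <YOUR_API_TOKEN>' \
    --data-binary "@$f"
done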