Bulk import
Bulk import gives you the ability to load data into the mediarithmics platform in large batches.
You can import:
- User segments, such as email lists, cookie lists, user account lists, etc.
- User profiles, such as CRM data and scores
- User associations, such as CRM onboarding
- User dissociations
- User suppression requests, such as GDPR suppression requests and opt-out management
You upload files associated with a document import definition:
- Files represent the data.
- Document imports represent what mediarithmics should do with the data.
The two steps for bulk import are:
- 1. Create the document import definition to tell mediarithmics what you are importing.
- 2. Upload files associated with the document import definition. Each uploaded file creates a new document import execution.
For maximum performance:
- Keep each file under 100 MB.
- Use a document import for multiple records when there will be more than 1,000 records per file.
How should you choose between creating a new document import and adding a new file to an existing one? Our recommendation is to create a new document import each time you have a new set of files to upload. For example, if you upload CRM profiles every night, you should create a new "User profiles from CRM - " document import every night (for instance, suffixed with the current date, as sketched below) instead of uploading new files to a single "User profiles from CRM" document import.
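A minimal sketch of that nightly pattern, assuming a cron-driven shell script; the date suffix, file name, and placeholders are illustrative, and the endpoint is the creation call documented below:
# Illustrative nightly script: create a dated document import to upload into.
NAME="User profiles from CRM - $(date +%Y-%m-%d)"
curl -X POST \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports" \
  -H 'Authorization: <YOUR_API_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d "{
    \"document_type\": \"USER_PROFILE\",
    \"mime_type\": \"APPLICATION_X_NDJSON\",
    \"encoding\": \"utf-8\",
    \"name\": \"$NAME\"
  }"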
Each line in the uploaded file is a command to execute. Depending on the document import type, you have different commands available.
When importing data, you need to properly add user identifiers. This will ensure your data is associated with the proper user point.
Only one identifier is allowed per line. For example, you shouldn't specify the user agent ID if the email hash is already used in a line.
However, you don't always have to use the same type of identifier throughout your document. For example, one line could use the user account ID while another uses the email hash, as sketched below.
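A hypothetical NDJSON excerpt illustrating this rule; the field names below (operation, segment_id, user_account_id, email_hash) are illustrative assumptions, not the authoritative schema of any document type:
// Hypothetical lines: one identifier per line, but types may differ across lines
{"operation": "ADD", "segment_id": "<SEGMENT_ID>", "user_account_id": "<USER_ACCOUNT_ID>"}
{"operation": "ADD", "segment_id": "<SEGMENT_ID>", "email_hash": "<EMAIL_HASH>"}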
Document imports define what you are about to upload in one or multiple files.
A document import object has the following properties:
field | type | description
--- | --- | ---
document_type | Enum | The type of data you want to import. Should be USER_ACTIVITY, USER_SEGMENT, USER_PROFILE, USER_CHOICE, USER_IDENTIFIERS_DELETION, or USER_IDENTIFIERS_ASSOCIATION_DECLARATIONS.
mime_type | Enum | The format of the imported data: APPLICATION_X_NDJSON or TEXT_CSV. It should match the file format of the uploaded file, e.g. .csv or .ndjson. The CSV format can only be chosen for USER_SEGMENT imports.
encoding | String | The encoding of the data that will be imported. Usually utf-8.
name | String | The name of your import.
priority | Enum | LOW, MEDIUM, or HIGH.
// Sample document import object
{
"document_type": "USER_ACTIVITY",
"mime_type": "APPLICATION_X_NDJSON",
"encoding": "utf-8",
"name": "<YOUR_DOCUMENT_IMPORT_NAME>"
}
POST https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports
Create a document import
Here is a sample request using curl:
curl -X POST \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports" \
  -H 'Authorization: <YOUR_API_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "document_type": "USER_ACTIVITY",
    "mime_type": "APPLICATION_X_NDJSON",
    "encoding": "utf-8",
    "name": "<YOUR_DOCUMENT_IMPORT_NAME>"
  }'
GET https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports
List document imports
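For example, using the same placeholders as above:
curl -X GET \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports" \
  -H 'Authorization: <YOUR_API_TOKEN>'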
GET https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId
Get a document import
PUT https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId
Update a document import
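A hedged sketch of an update request; this page doesn't show the expected PUT body, so the assumption below is that it mirrors the creation payload:
curl -X PUT \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>" \
  -H 'Authorization: <YOUR_API_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "document_type": "USER_ACTIVITY",
    "mime_type": "APPLICATION_X_NDJSON",
    "encoding": "utf-8",
    "name": "<YOUR_UPDATED_DOCUMENT_IMPORT_NAME>"
  }'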
DELETE https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId
Remove a document import
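For example:
curl -X DELETE \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>" \
  -H 'Authorization: <YOUR_API_TOKEN>'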
A file upload creates an execution.
After creation, the execution has the PENDING status. It moves to the RUNNING status when the import starts, and to the SUCCEEDED status once the platform has correctly imported the file.
POST https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions
Create an execution
See an example:
curl --location --request POST 'https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions' \
--header 'Content-Type: application/x-ndjson' \
--header 'Authorization: api:TOKEN' \
--data-binary '@/Users/username/path/to/the/file.ndjson'
The response contains metadata about the created execution, notably an id property you can use to track the execution.
GET https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions
List executions
GET https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions/:executionId
Get an execution
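For example, a minimal polling sketch; it assumes jq is installed and that the execution status is exposed under a data.status field in the response, which you should verify against the actual payload:
# Poll the execution until it reaches SUCCEEDED (the status path is an assumption)
while true; do
  STATUS=$(curl -s \
    "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>/executions/<EXECUTION_ID>" \
    -H 'Authorization: <YOUR_API_TOKEN>' | jq -r '.data.status')
  echo "Execution status: $STATUS"
  [ "$STATUS" = "SUCCEEDED" ] && break
  sleep 30
done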
POST https://api.mediarithmics.com/v1/datamarts/:datamartId/document_imports/:importId/executions/:executionId/action
Cancel an execution
Cancelling an execution only works while the execution's status is PENDING.
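A hedged sketch of a cancellation call; this page doesn't document the action payload, so the {"action": "CANCEL"} body below is an assumption to check against the API reference:
curl -X POST \
  "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>/executions/<EXECUTION_ID>/action" \
  -H 'Authorization: <YOUR_API_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{"action": "CANCEL"}'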
If you need to import files larger than 100 MB, split them before using the upload API and call it once per chunk.
You can split massive files with the split shell command:
split -l <LINE_NUMBER> ./your/file/path
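For example, a minimal sketch that splits a file into 100,000-line chunks (split names them xaa, xab, ...) and uploads each chunk as its own execution; the file name and chunk size are illustrative:
# Split the source file, then upload each chunk as a separate execution
split -l 100000 ./profiles.ndjson
for CHUNK in x??; do
  curl -X POST \
    "https://api.mediarithmics.com/v1/datamarts/<DATAMART_ID>/document_imports/<IMPORT_ID>/executions" \
    -H 'Content-Type: application/x-ndjson' \
    -H 'Authorization: <YOUR_API_TOKEN>' \
    --data-binary "@$CHUNK"
done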