Cohort-based Lookalike
To configure this feature, please follow the next steps IN ORDER:
1. Attributes definition
2. ML function creation
3. ML function activation
4. Schema update
5. ML function initial loading
A cohort is assigned to each user based on the attributes (called features in data science) that you define to characterize your users. These attributes must be declared in JSON format (see below).
For instance, let's imagine that you want to create cohorts based on:
os_family - defined on UserAgentInfo nested in UserAgent
age - defined on UserProfile
city - defined on UserEvent
You will therefore define the following JSON:
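As a rough sketch only, such a definition could be assembled in Python as shown here. Only `field_path`, `vector_size` and the three type names appear on this page; the `type`, `values` and `buckets` key names, the field paths and the listed values are assumptions made for illustration and may not match what the ML function expects.

```python
import json

# Hypothetical attribute definitions. Only field_path, vector_size and the
# three type names come from this page; every other key name, path and value
# below is an assumption.
features = [
    {
        "field_path": "agents.user_agent_info.os_family",  # assumed path
        "type": "FREQUENCY_ENUM",
        "values": ["WINDOWS", "MAC_OS", "LINUX", "ANDROID", "IOS"],  # assumed
    },
    {
        "field_path": "profiles.age",  # assumed path
        "type": "FREQUENCY_NUMBER",
        "buckets": [0, 10, 100],  # assumed key: bounds of [0, 10) and [10, 100)
    },
    {
        "field_path": "events.city",  # assumed path
        "type": "FREQUENCY_TEXT",
        "vector_size": 50,  # arbitrary illustrative value
    },
]

# Pretty-print the definition for review.
print(json.dumps(features, indent=2))
```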
There are 3 types of attributes available:
FREQUENCY_ENUM: use this type for a finite list of values like operating systems.
FREQUENCY_NUMBER: use this type for classifying number buckets like age. Using the above example:
First bucket: >= 0 & < 10
Second bucket: >= 10 & < 100
Third bucket: anything that didn't fall into the 2 defined buckets
FREQUENCY_TEXT: use this type for an unbounded (or very long) list of values such as keywords or cities. Choose the vector_size parameter carefully: it is applied as a modulo on the values, reducing them to a fixed number of distinct values.
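To make the three types more concrete, here is a minimal Python sketch of how a raw value could be mapped to an index under each of them. It is an illustration, not the platform's implementation: the function names are hypothetical, and CRC32 is used only as a stand-in for whatever hash is actually applied before the modulo.

```python
import zlib


def enum_index(value, known_values):
    """Position of value in the finite list, or len(known_values) if unknown."""
    try:
        return known_values.index(value)
    except ValueError:
        return len(known_values)


def number_bucket(value, boundaries):
    """Index of the half-open bucket [boundaries[i], boundaries[i + 1]).

    Values that fit no defined bucket go to a final catch-all bucket.
    """
    for i in range(len(boundaries) - 1):
        if boundaries[i] <= value < boundaries[i + 1]:
            return i
    return len(boundaries) - 1


def text_index(value, vector_size):
    """Hash the text deterministically (CRC32 stand-in), then apply the modulo."""
    return zlib.crc32(value.encode("utf-8")) % vector_size


# Age buckets from the example: [0, 10), [10, 100), everything else.
for age in (5, 50, 150):
    print(age, number_bucket(age, [0, 10, 100]))
```

A small vector_size makes unrelated text values share an index more often, while a large one keeps them apart at the cost of a larger vector.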
An ML function requires a query to fetch the data used in its configuration. For cohort-based lookalike, the query must fetch every field that is used as an attribute and specified in the JSON.
Following our previous example, the GraphQL query will be:
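Because the query has to cover exactly the fields referenced in the JSON, one option is to derive its selection set from the attribute paths. The Python sketch below does this under two assumptions that are not confirmed by this page: that paths are dot-separated, and that the UserPoint fields are named `agents`, `profiles` and `events`. The helper name `build_selection` is hypothetical.

```python
def build_selection(field_paths):
    """Turn dotted field paths into a nested GraphQL selection set."""
    # Build a tree of nested dicts: one level per path segment.
    tree = {}
    for path in field_paths:
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})

    def render(node):
        # Leaves are rendered as bare field names, branches recurse.
        parts = []
        for name, children in node.items():
            parts.append(f"{name} {render(children)}" if children else name)
        return "{ " + " ".join(parts) + " }"

    return "query " + render(tree)


# Assumed paths for the os_family, age and city attributes.
paths = ["agents.user_agent_info.os_family", "profiles.age", "events.city"]
print(build_selection(paths))
```

Check the generated query against your actual runtime schema before pasting it into the ML function configuration.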
Please follow the next steps to instantiate the ML function developed by mediarithmics to assign a cohort to your userpoints:
Head to Settings > Datamart > ML Functions
Click on New Ml Function, pick the datamart to which the ML function should apply, then choose simhash-cohorts-calulation
Enter the following information on the ML function configuration panel:
General Informations
Name: Cohort ML Function
Hosting Object Type: UserPoint
Field Type Name: ClusteringCohort
Field Name: clustering_cohort
Query: <Insert here the GraphQL query that needs to be run to extract the attributes used to calculate your cohort>
Properties
Features: <Insert here the one-line JSON>
Cohort Id Bit Size: <Will be used to define the number of cohorts in your datamart as 2^(Cohort Id Bit Size)>
Click on the Save button
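Two of the properties above lend themselves to a quick check before saving. The Python sketch below collapses a multi-line JSON definition into the one-line form expected by the Features field, and computes how many cohorts a given Cohort Id Bit Size yields. The helper names and the sample JSON are illustrative, not part of the product.

```python
import json


def to_one_line(json_text):
    """Re-serialize JSON without newlines or extra whitespace."""
    return json.dumps(json.loads(json_text), separators=(",", ":"))


def cohort_count(bit_size):
    """Number of distinct cohorts for a given bit size: 2^bit_size."""
    return 1 << bit_size


# Illustrative multi-line input; the field path is an assumption.
sample = """
[
  {"field_path": "profiles.age"}
]
"""
print(to_one_line(sample))
print(cohort_count(8))
```

Each additional bit doubles the number of cohorts, so a larger bit size spreads the same userpoints over more, smaller cohorts.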
Once the ML function has been instantiated and the runtime schema updated, set the batch_mode parameter to true and activate the ML function by calling the following API:
Two changes have to be made in your runtime schema:
Add a clustering_cohort field in UserPoint as follows:
Create a new ClusteringCohort type as follows:
You can ask your Account manager to run an initial loading on your datamart to calculate cohorts on existing userpoints.
In the attributes JSON, the field_path must contain the path of the attribute, starting from the UserPoint definition.
Refer to the schema documentation to learn more about how to update your schema.