Cohort-based Lookalike setup
Please refer to our user guide documentation to learn more about this feature.

Features definition

Features

A cohort is assigned to each user based on the features you have defined to characterize your users. Once you have selected your features, you need to format them as a JSON array.
For instance, suppose you want to create cohorts based on os_family, form_factor and code_geo. The first two features are defined on UserAgentInfo (nested in UserAgent) and the third on the UserEvent object. You would then define the following JSON:
[
  {
    "type": "FREQUENCY_ENUM",
    "field_path": "agents.user_agent_info.os_family",
    "values": [
      "OTHER",
      "WINDOWS",
      "MAC_OS",
      "LINUX",
      "ANDROID",
      "IOS"
    ]
  },
  {
    "type": "FREQUENCY_ENUM",
    "field_path": "agents.user_agent_info.form_factor",
    "values": [
      "OTHER",
      "PERSONAL_COMPUTER",
      "SMART_TV",
      "GAME_CONSOLE",
      "SMARTPHONE",
      "TABLET",
      "WEARABLE_COMPUTER"
    ]
  },
  {
    "type": "FREQUENCY_TEXT",
    "field_path": "events.code_geo_dep",
    "vector_size": 100
  }
]
Contact your CSM to learn more about available types and how to build an efficient cohort features system.
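If you prefer to generate the features JSON rather than write it by hand, a minimal Python sketch could look like the following. The helper names frequency_enum and frequency_text are hypothetical and only exist for this illustration; the only keys used are the ones shown in the example above ("type", "field_path", "values", "vector_size").

```python
import json

# Hypothetical helpers (not part of any mediarithmics SDK): they only
# reproduce the keys shown in the example features JSON.
def frequency_enum(field_path, values):
    return {"type": "FREQUENCY_ENUM", "field_path": field_path, "values": list(values)}

def frequency_text(field_path, vector_size):
    return {"type": "FREQUENCY_TEXT", "field_path": field_path, "vector_size": vector_size}

features = [
    frequency_enum(
        "agents.user_agent_info.os_family",
        ["OTHER", "WINDOWS", "MAC_OS", "LINUX", "ANDROID", "IOS"],
    ),
    frequency_enum(
        "agents.user_agent_info.form_factor",
        ["OTHER", "PERSONAL_COMPUTER", "SMART_TV", "GAME_CONSOLE",
         "SMARTPHONE", "TABLET", "WEARABLE_COMPUTER"],
    ),
    frequency_text("events.code_geo_dep", 100),
]

# Serialize to the JSON text expected in the Features property.
features_json = json.dumps(features, indent=2)
print(features_json)
```

Generating the JSON this way mainly guards against syntax slips such as a missing comma or bracket; it does not check that the types or field paths are valid for your datamart.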

GraphQL Query

An ML function requires a query to fetch the data used in its configuration. The simhash ML function needs a query that fetches every field used as a feature in the features JSON.
Following our previous example, the GraphQL query will be:
{agents {user_agent_info {os_family form_factor}} events{code_geo}}
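Since the query must select the fields used as features, it can be derived from a list of dotted field paths. The sketch below is one possible way to do that in Python; build_selection and render are hypothetical helper names, and the paths use the field names from the example query.

```python
def build_selection(field_paths):
    """Merge dotted field paths into a nested dict of selected fields."""
    tree = {}
    for path in field_paths:
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree

def render(tree):
    """Render the nested dict as a GraphQL selection set."""
    parts = []
    for name, children in tree.items():
        if children:
            parts.append(f"{name} {render(children)}")
        else:
            parts.append(name)
    return "{" + " ".join(parts) + "}"

paths = [
    "agents.user_agent_info.os_family",
    "agents.user_agent_info.form_factor",
    "events.code_geo",
]
query = render(build_selection(paths))
print(query)
# {agents {user_agent_info {os_family form_factor}} events {code_geo}}
```

Paths that share a prefix are merged into a single selection, so each parent object appears only once in the query.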

ML function instantiation

Follow these steps to instantiate the ML function developed by mediarithmics to assign a cohort to your userpoints:
  1. Head to Settings > Datamart > ML Functions
  2. Click on New Ml Function, pick the datamart to which the ML function should apply, then choose simhash-cohorts-calulation
  3. Enter the following information in the ML function configuration panel:
    • General Informations
      • Name: Cohort ML Function
      • Hosting Object Type: UserPoint
      • Field Type Name: ClusteringCohort
      • Field Name: clustering_cohort
      • Query: <Insert here the GraphQL query that needs to be run to extract the features used to calculate your cohort>
    • Properties
      • Features: <Insert here the features JSON>
      • Cohort Id Bit Size: <Used to define the number of cohorts in your datamart as 2^(Cohort Id Bit Size)>
  4. Click on the Save button
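To give an intuition for the Cohort Id Bit Size property, here is a generic SimHash sketch in Python. It is a textbook illustration under stated assumptions, not the mediarithmics implementation: the token format, the use of SHA-256 and the unweighted voting are all choices made for this example.

```python
import hashlib

def simhash(tokens, bit_size):
    """Generic SimHash: return an integer in [0, 2**bit_size).

    Assumption: each token votes +1 or -1 on every bit position,
    based on the corresponding bit of its SHA-256 hash (bit_size <= 256).
    """
    votes = [0] * bit_size
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bit_size):
            votes[i] += 1 if (h >> i) & 1 else -1
    cohort_id = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            cohort_id |= 1 << i
    return cohort_id

bit_size = 8
number_of_cohorts = 2 ** bit_size
print(number_of_cohorts)  # 256

# Hypothetical token format, chosen for illustration only.
tokens = ["os_family=WINDOWS", "form_factor=PERSONAL_COMPUTER"]
cohort_id = simhash(tokens, bit_size)
print(cohort_id)
```

Each extra bit doubles the number of possible cohorts, so a larger bit size yields more cohorts with fewer users in each.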

Schema update

Two changes have to be made to your runtime schema:
  • Add a clustering_cohort field to UserPoint as follows:
type UserPoint @TreeIndexRoot(index:"USER_INDEX") {
  ...
  clustering_cohort: ClusteringCohort
  ...
}
  • Create a new ClusteringCohort type as follows:
type ClusteringCohort {
  id: ID! @TreeIndex(index:"USER_INDEX")
  expiration_ts: Timestamp @TreeIndex(index:"USER_INDEX")
  instance_id: String! @TreeIndex(index:"USER_INDEX")
  last_modified_ts: Timestamp! @TreeIndex(index:"USER_INDEX")
}
See the schema update documentation to learn more about how to update your schema.

ML function activation

Once the ML function has been instantiated and the runtime schema updated, set the batch_mode parameter to true and activate the ML function by calling the following API endpoint:
PUT https://api.mediarithmics.com/v1/ml_functions/<id_ml_function>
{
  "batch_mode": true,
  "status": "ACTIVE"
}
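The call above could be scripted as in the following Python sketch, which uses only the standard library. The function name build_activation_request, the placeholder ID and token, and the use of an Authorization header carrying the API token are assumptions made for this example; check the API documentation for the authentication scheme that applies to your account.

```python
import json
import urllib.request

def build_activation_request(ml_function_id, api_token):
    """Build (without sending) the PUT request that activates an ML function."""
    url = f"https://api.mediarithmics.com/v1/ml_functions/{ml_function_id}"
    body = json.dumps({"batch_mode": True, "status": "ACTIVE"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API token is passed in the Authorization header.
            "Authorization": api_token,
        },
    )

# Placeholder values, to be replaced with your own.
request = build_activation_request("1234", "YOUR_API_TOKEN")
print(request.get_method(), request.full_url)

# To actually send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before touching a live datamart.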