Cohort-based Lookalike


Please refer to our user guide documentation to learn more about this feature.


Attributes definition

Attributes

Attributes selection

Each user is assigned to a cohort based on the attributes (also called features in data science) that you define to characterize your users. These attributes must be declared in JSON format (see below).

We recommend that you:

  • Pick attributes that can be used to segment users and that are relevant to the business

  • Pick attributes that are available for all your users (logged in or not)

  • Pick attributes of various types (UserEvent, UserProfile, …)

  • Select between 3 and 10 attributes

  • Have between 50 and 300 attribute values in total (across all selected attributes)

  • Keep the default of 1024 cohorts (Cohort Id Bit Size = 10, see below for more information about this)
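The recommendations above can be turned into a quick sanity check before writing the JSON. The following is only a minimal sketch: the function name `check_attribute_plan` and the per-attribute value counts are hypothetical and are not part of the platform.

```python
def check_attribute_plan(values_per_attribute, cohort_id_bit_size=10):
    """Return warnings for a planned attribute selection.

    values_per_attribute maps an attribute name to the number of distinct
    values (or buckets) it is expected to produce. Both the names and the
    counts are illustrative assumptions.
    """
    warnings = []

    # Recommendation: select between 3 and 10 attributes.
    n_attributes = len(values_per_attribute)
    if not 3 <= n_attributes <= 10:
        warnings.append(f"{n_attributes} attributes selected; 3 to 10 recommended")

    # Recommendation: between 50 and 300 values across all attributes.
    total_values = sum(values_per_attribute.values())
    if not 50 <= total_values <= 300:
        warnings.append(f"{total_values} attribute values in total; 50 to 300 recommended")

    # Recommendation: keep the default Cohort Id Bit Size of 10.
    if cohort_id_bit_size != 10:
        warnings.append(f"Cohort Id Bit Size is {cohort_id_bit_size}; the default is 10")

    return warnings


# Hypothetical plan: the counts below are made up for illustration.
plan = {"os_family": 8, "age": 3, "city": 64}
print(check_attribute_plan(plan))
```

An empty list means the plan stays within the recommended ranges.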

JSON format

For instance, let's imagine that you want to create cohorts based on:

  • os_family - defined on UserAgentInfo nested in UserAgent

  • age - defined on UserProfile

  • city - defined on UserEvent

You will therefore define the following JSON:

Configuration help

There are 3 types of attributes available:

  • FREQUENCY_ENUM: use this type for a finite list of values like operating systems.

  • FREQUENCY_NUMBER: use this type to classify numeric values, like age, into buckets. Using the above example:

    • First bucket: >= 0 and < 10

    • Second bucket: >= 10 and < 100

    • Third bucket: anything that did not fall into the 2 defined buckets

  • FREQUENCY_TEXT: use this type for an infinite (or long) list of values like keywords, cities, ... Choose the vector_size parameter carefully, as it is used as a modulo on values to reduce the range of distinct values to a fixed number
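To make the bucket and modulo behavior more concrete, here is a minimal sketch of both ideas. It is not the platform's implementation: the function names are hypothetical, and the use of `zlib.crc32` as the hash applied before the modulo is an assumption, since the actual hash function is not described here.

```python
import zlib


def number_bucket(value, boundaries):
    """Return the index of the bucket containing value.

    Bucket i covers boundaries[i] <= value < boundaries[i + 1].
    The last index is the catch-all bucket for everything else.
    """
    for i in range(len(boundaries) - 1):
        if boundaries[i] <= value < boundaries[i + 1]:
            return i
    return len(boundaries) - 1


def text_slot(value, vector_size):
    """Map a text value to one of vector_size slots using a modulo.

    crc32 is an assumed stand-in for whatever stable hash is really used.
    """
    return zlib.crc32(value.encode("utf-8")) % vector_size


# Boundaries matching the buckets listed above: [0, 10) and [10, 100).
boundaries = [0, 10, 100]
for age in (5, 10, 42, 100, -3):
    print(age, "-> bucket", number_bucket(age, boundaries))

# Distinct cities may share a slot; a larger vector_size reduces collisions.
for city in ("Paris", "London", "New York"):
    print(city, "-> slot", text_slot(city, 16))
```

With a small vector_size, many different values collapse onto the same slot, which is why this parameter should be sized against the number of distinct values you expect.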

The field_path must contain the path of the attribute from the UserPoint definition (see the schema documentation for more information).

GraphQL Query

An ML function requires a query to fetch the data used in its configuration. For cohort-based lookalike, this query must fetch the fields that are used as attributes and specified in the JSON.

Following our previous example, the GraphQL query will be:

ML function instantiation

Please follow the next steps to instantiate the ML function developed by mediarithmics to assign a cohort to your userpoints:

  1. Head to Settings > Datamart > ML Functions

  2. Click on New Ml Function, pick the datamart to which the ML function should apply, then choose simhash-cohorts-calulation

  3. Enter the following information on the ML function configuration panel:

    • General Informations

      • Name: Cohort ML Function

      • Hosting Object Type: UserPoint

      • Field Type Name: ClusteringCohort

      • Field Name: clustering_cohort

      • Query: <Insert here the GraphQL query that needs to be run to extract the attributes used to calculate your cohort>

    • Properties

      • Features: <Insert here the one-line JSON>

      • Cohort Id Bit Size: <Will be used to define the number of cohorts in your datamart as 2^(Cohort Id Bit Size)>

  4. Click on the Save button
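Two of the properties above lend themselves to a short illustration: the number of cohorts implied by Cohort Id Bit Size, and the one-line form expected for the Features JSON. The sketch below uses hypothetical helper names, and the sample payload is a placeholder, not the actual features format.

```python
import json


def cohort_count(cohort_id_bit_size):
    """Number of cohorts, computed as 2^(Cohort Id Bit Size)."""
    return 2 ** cohort_id_bit_size


def to_one_line(json_text):
    """Validate a JSON document and re-serialize it on a single line."""
    return json.dumps(json.loads(json_text), separators=(",", ":"))


print(cohort_count(10))  # the default bit size of 10 gives 1024 cohorts

# Placeholder payload only; replace it with your own features JSON.
pretty = """
{
  "placeholder_key": ["placeholder_value"]
}
"""
print(to_one_line(pretty))
```

Parsing before re-serializing has the side benefit of failing early on malformed JSON, before the value is pasted into the configuration panel.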


ML function

Activation

Once the ML function has been instantiated, set its batch_mode parameter to true and activate the ML function by calling the following API:

Schema update

Two changes must be made to your runtime schema:

  • Add a clustering_cohort field to UserPoint as follows:

  • Create a new ClusteringCohort type as follows:


Initial loading

You can ask your Account Manager to run an initial loading on your datamart to calculate cohorts for existing UserPoints.
