# Cohort-based Lookalike

{% hint style="info" %}
Please refer to our [user guide documentation](https://userguides.mediarithmics.io/audience/segments/segment-typology/user-lookalike-segment/cohort-based-lookalike) to learn more about this feature.
{% endhint %}

{% hint style="warning" %}
To configure this feature, please follow the next steps **IN ORDER**:

1. Attributes definition
2. ML function creation
3. ML function activation
4. Schema update
5. ML function initial loading
   {% endhint %}

## Attributes definition

### Attributes

#### Attributes selection

Each user is assigned to a cohort based on attributes (also called features in data science) that you define to characterize your users. These attributes must be described in JSON (see below).

{% hint style="info" %}
We recommend that you:

* Pick attributes that can be used to segment users and that are relevant to the business
* Pick attributes that are available for all your users (logged in or not)
* Pick attributes from various object types (UserEvent, UserProfile, …)
* Select between 3 and 10 attributes
* Have between 50 and 300 attribute values in total (across all attributes)
* Keep the default of 1024 cohorts (Cohort Id Bit Size = 10, see below for more information about this)
  {% endhint %}

#### JSON format

For instance, let's imagine that you want to create cohorts based on:

* **os\_family** - defined on **UserAgentInfo** nested in **UserAgent**
* **age** - defined on **UserProfile**
* **city** - defined on **UserEvent**

You will therefore define the following JSON:

```json
[
  {
    "type": "FREQUENCY_ENUM",
    "field_path": "agents.user_agent_info.os_family",
    "values": [
      "OTHER",
      "WINDOWS",
      "MAC_OS",
      "LINUX",
      "ANDROID",
      "IOS"
    ]
  },
  {
    "type": "FREQUENCY_NUMBER",
    "field_path": "profiles.age",
    "intervals": [
      {
        "from": 0,
        "to": 10
      },
      {
        "from": 10,
        "to": 100
      }
    ]
  },
  {
    "type": "FREQUENCY_TEXT",
    "field_path": "events.city",
    "vector_size": 100
  }
]
```

#### Configuration help

There are 3 types of attributes available:

* **FREQUENCY\_ENUM**: use this type for a finite list of values like operating systems.
* **FREQUENCY\_NUMBER**: use this type to classify numbers into buckets, like age. Using the above example:
  * First bucket: >= 0 and < 10
  * Second bucket: >= 10 and < 100
  * Third bucket: anything that did not fall into the 2 defined buckets
* **FREQUENCY\_TEXT**: use this type for an infinite (or long) list of values like keywords, cities, ... Choose the **vector\_size** parameter carefully, as it is used as a modulo on values to reduce the range of distinct values to a fixed number.

The **field\_path** must contain the path of the attribute from the UserPoint definition (see [schema documentation](https://developer.mediarithmics.io/schema) for more info).
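
To make the bucketing rules above concrete, here is a minimal Python sketch. It is an illustration only, not the platform's implementation: the function names are hypothetical, and CRC32 merely stands in for whatever hash is actually applied before the **vector\_size** modulo.

```python
import zlib


def number_bucket(value, intervals):
    """Return the index of the first interval with from <= value < to.

    Values matching no interval land in the implicit extra bucket,
    whose index is len(intervals).
    """
    for index, interval in enumerate(intervals):
        if interval["from"] <= value < interval["to"]:
            return index
    return len(intervals)


def text_slot(value, vector_size):
    """Map an arbitrary string to one of vector_size slots.

    Assumption: CRC32 is a stand-in hash chosen for this sketch;
    the hash used by the platform is not documented here.
    """
    return zlib.crc32(value.encode("utf-8")) % vector_size


age_intervals = [{"from": 0, "to": 10}, {"from": 10, "to": 100}]

print(number_bucket(7, age_intervals))    # 0 (first bucket)
print(number_bucket(10, age_intervals))   # 1 (second bucket, lower bound included)
print(number_bucket(150, age_intervals))  # 2 (implicit third bucket)
print(text_slot("Paris", 100))            # a slot between 0 and 99
```

A smaller **vector\_size** means more distinct values share the same slot, so the attribute becomes less discriminating; a larger one increases the total number of attribute values.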

### **GraphQL Query**

An ML function requires a query to fetch the data used in its configuration. In the case of cohort-based lookalike, it requires an appropriate query to fetch the fields used as attributes and specified in the JSON.

Following our previous example, the GraphQL query will be:

```graphql
{agents {user_agent_info {os_family}} profiles {age} events{city}}
```
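
Since the query has to cover every **field\_path** declared in the JSON, it can be derived from those paths. The Python sketch below does this under one assumption: each dot-separated path segment maps to one nested selection, as in the example above. The `build_query` helper is hypothetical, and queries that need directives or filters must still be written by hand.

```python
def build_query(field_paths):
    """Build a selection query covering every dot-separated field path."""
    tree = {}
    for path in field_paths:
        node = tree
        for part in path.split("."):
            # Create the nested level if it does not exist yet
            node = node.setdefault(part, {})

    def render(node):
        parts = []
        for name, children in node.items():
            # Leaf fields are emitted as-is, objects get a nested selection
            parts.append(f"{name} {{ {render(children)} }}" if children else name)
        return " ".join(parts)

    return "{ " + render(tree) + " }"


field_paths = [
    "agents.user_agent_info.os_family",
    "profiles.age",
    "events.city",
]

print(build_query(field_paths))
# { agents { user_agent_info { os_family } } profiles { age } events { city } }
```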

## ML function instantiation

Please follow the next steps to instantiate the ML function developed by mediarithmics to assign a cohort to your userpoints:

1. Head to Settings > Datamart > ML Functions
2. Click on **New Ml Function**, pick the datamart to which the ML function applies, then choose **simhash-cohorts-calulation**
3. Enter the following information on the ML function configuration panel:
   * General Informations
     * Name: **Cohort ML Function**
     * Hosting Object Type: **UserPoint**
     * Field Type Name: **ClusteringCohort**
     * Field Name: **clustering\_cohort**
     * Query: *\<Insert here the GraphQL query that needs to be run to extract the attributes used to calculate your cohort>*
   * Properties
     * Features: *\<Insert here the one-line JSON>*
     * Cohort Id Bit Size: *\<Will be used to define the number of cohorts in your datamart as 2^(Cohort Id Bit Size)>*
4. Click the **Save** button
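
The **Features** property expects the attributes JSON on a single line, and **Cohort Id Bit Size** determines the number of cohorts. The Python sketch below prepares both values for the example attributes. The `count_values` helper is hypothetical, and the way it counts values (one extra bucket for **FREQUENCY\_NUMBER**, **vector\_size** slots for **FREQUENCY\_TEXT**) is an interpretation of the recommendations above, not an official rule.

```python
import json

features = [
    {
        "type": "FREQUENCY_ENUM",
        "field_path": "agents.user_agent_info.os_family",
        "values": ["OTHER", "WINDOWS", "MAC_OS", "LINUX", "ANDROID", "IOS"],
    },
    {
        "type": "FREQUENCY_NUMBER",
        "field_path": "profiles.age",
        "intervals": [{"from": 0, "to": 10}, {"from": 10, "to": 100}],
    },
    {"type": "FREQUENCY_TEXT", "field_path": "events.city", "vector_size": 100},
]


def count_values(feature):
    """Count the distinct values an attribute can take (assumed rule)."""
    if feature["type"] == "FREQUENCY_ENUM":
        return len(feature["values"])
    if feature["type"] == "FREQUENCY_NUMBER":
        # Defined intervals plus the implicit catch-all bucket
        return len(feature["intervals"]) + 1
    if feature["type"] == "FREQUENCY_TEXT":
        return feature["vector_size"]
    raise ValueError(f"Unknown attribute type: {feature['type']}")


# Compact separators produce a single line without extra whitespace
one_line = json.dumps(features, separators=(",", ":"))
total_values = sum(count_values(feature) for feature in features)
cohort_id_bit_size = 10

print(one_line)
print(len(features), total_values, 2 ** cohort_id_bit_size)  # 3 109 1024
```

With 3 attributes and 109 values in total, the example stays within the recommended ranges, and a bit size of 10 gives the default 1024 cohorts.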

{% hint style="warning" %}
Note that only one Cohort-based Lookalike model can be set up at a time in an organisation.
{% endhint %}

## ML function

### Activation

Once the ML function has been instantiated, set the **batch\_mode** parameter to **true** and **activate the ML function** by calling the following API endpoint:

```json
PUT https://api.mediarithmics.com/v1/ml_functions/<id_ml_function>
{
  "batch_mode": true,
  "status": "ACTIVE"
}
```
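
As an illustration, the call above could be prepared in Python with the standard library as sketched below. The `Authorization` header carrying an API token is an assumption (authentication is not covered on this page), and the ML function id `1234` and the token value are placeholders.

```python
import json
import urllib.request

API_BASE = "https://api.mediarithmics.com/v1"


def build_activation_request(ml_function_id, api_token):
    """Prepare the PUT request that enables batch mode and activates the function."""
    body = json.dumps({"batch_mode": True, "status": "ACTIVE"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/ml_functions/{ml_function_id}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API token is passed in the Authorization header
            "Authorization": api_token,
        },
    )


# Placeholder values: replace with your own ML function id and API token
request = build_activation_request("1234", "<your_api_token>")
print(request.get_method(), request.full_url)

# The request is only built here; to send it, uncomment the next line:
# urllib.request.urlopen(request)
```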

## Schema update

Two changes have to be made in your runtime schema:

* Add a **clustering\_cohort** field in **UserPoint** as follows:

```graphql
type UserPoint  @TreeIndexRoot(index:"USER_INDEX") {
   ...
   clustering_cohort:ClusteringCohort
   ...
}
```

* Create a new **ClusteringCohort** type as follows:

```graphql
type ClusteringCohort  {
   id:ID! @TreeIndex(index:"USER_INDEX")
   expiration_ts:Timestamp @TreeIndex(index:"USER_INDEX")
   cohort_id:String! @TreeIndex(index:"USER_INDEX")
   last_modified_ts:Timestamp! @TreeIndex(index:"USER_INDEX")
}
```

{% hint style="success" %}
Don't hesitate to have a look at [schema update documentation](https://developer.mediarithmics.io/schema/defining-your-schema) to learn more about how to update your schema.
{% endhint %}

### Initial loading

You can ask your Account manager to run an initial loading on your datamart to calculate cohorts for existing userpoints.
