# Cohort-based Lookalike

{% hint style="info" %}
Please refer to our [user guide documentation](https://userguides.mediarithmics.io/audience/segments/segment-typology/user-lookalike-segment/cohort-based-lookalike) to learn more about this feature.
{% endhint %}

{% hint style="warning" %}
To configure this feature, complete the following steps **in order**:

1. Attributes definition
2. ML function instantiation
3. ML function activation
4. Schema update
5. ML function initial loading
   {% endhint %}

## Attributes definition

### Attributes

#### Attributes selection

Each user is assigned to a cohort based on the attributes (also called features in data science) that you define to characterize your users. These attributes must be declared in JSON format (see below).

{% hint style="info" %}
We recommend that you:

* Pick attributes that can be used to segment users and that are relevant to the business
* Pick attributes that are available for all your users (logged in or not)
* Pick attributes from various object types (UserEvent, UserProfile, …)
* Select between 3 and 10 attributes
* Have between 50 and 300 attribute values in total (across all attributes)
* Keep the default of 1024 cohorts (Cohort Id Bit Size = 10, see below for more information)
  {% endhint %}

#### JSON format

For instance, let's imagine that you want to create cohorts based on:

* **os\_family** - defined on **UserAgentInfo** nested in **UserAgent**
* **age** - defined on **UserProfile**
* **city** - defined on **UserEvent**

You will therefore define the following JSON:

```json
[
  {
    "type": "FREQUENCY_ENUM",
    "field_path": "agents.user_agent_info.os_family",
    "values": [
      "OTHER",
      "WINDOWS",
      "MAC_OS",
      "LINUX",
      "ANDROID",
      "IOS"
    ]
  },
  {
    "type": "FREQUENCY_NUMBER",
    "field_path": "profiles.age",
    "intervals": [
      {
        "from": 0,
        "to": 10
      },
      {
        "from": 10,
        "to": 100
      }
    ]
  },
  {
    "type": "FREQUENCY_TEXT",
    "field_path": "events.city",
    "vector_size": 100
  }
]
```
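
Before pasting the JSON into the platform, a quick structural check can catch missing keys. The sketch below is only illustrative: the `check_features` helper is hypothetical, and the key expected for each type is an assumption inferred from the example above rather than taken from a formal specification.

```python
import json

# Key that accompanies each type in the example above (ASSUMPTION: inferred
# from that example, not from a formal specification).
REQUIRED_KEY = {
    "FREQUENCY_ENUM": "values",
    "FREQUENCY_NUMBER": "intervals",
    "FREQUENCY_TEXT": "vector_size",
}


def check_features(raw_json):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for position, feature in enumerate(json.loads(raw_json)):
        kind = feature.get("type")
        if kind not in REQUIRED_KEY:
            problems.append(f"attribute {position}: unknown type {kind!r}")
            continue
        if "field_path" not in feature:
            problems.append(f"attribute {position}: missing field_path")
        if REQUIRED_KEY[kind] not in feature:
            problems.append(f"attribute {position}: missing {REQUIRED_KEY[kind]}")
    return problems


sample = '[{"type": "FREQUENCY_TEXT", "field_path": "events.city"}]'
print(check_features(sample))  # ['attribute 0: missing vector_size']
```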

#### Configuration help

There are 3 types of attributes available:

* **FREQUENCY\_ENUM**: use this type for a finite list of values like operating systems.
* **FREQUENCY\_NUMBER**: use this type to classify numbers into buckets, like age. Using the above example:
  * First bucket: >= 0 and < 10
  * Second bucket: >= 10 and < 100
  * Third bucket: any value that did not fall into one of the 2 defined buckets
* **FREQUENCY\_TEXT**: use this type for an unbounded (or very long) list of values such as keywords or cities. Choose the **vector\_size** parameter carefully: it is applied as a modulo on the values, reducing them to a fixed number of distinct values.

The **field\_path** must contain the path of the attribute from the UserPoint definition (see [schema documentation](https://developer.mediarithmics.io/schema) for more info).
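
To make the bucketing rules above concrete, here is a minimal sketch of how a number could be assigned to a bucket and how a free-text value could be reduced with a modulo. The helper names are hypothetical, and CRC32 is only a stand-in hash chosen for illustration: the hash the platform actually applies before the modulo is not described on this page.

```python
import zlib

# Intervals copied from the example JSON above.
AGE_INTERVALS = [{"from": 0, "to": 10}, {"from": 10, "to": 100}]


def number_bucket(value, intervals):
    """Return the index of the first interval with from <= value < to.

    Values outside every interval fall into one extra bucket, whose index
    is len(intervals).
    """
    for index, interval in enumerate(intervals):
        if interval["from"] <= value < interval["to"]:
            return index
    return len(intervals)


def text_slot(value, vector_size):
    """Map an arbitrary string to one of vector_size slots.

    ASSUMPTION: CRC32 is an illustrative stand-in, not the platform's hash.
    """
    return zlib.crc32(value.encode("utf-8")) % vector_size


print(number_bucket(7, AGE_INTERVALS))    # 0 (first bucket)
print(number_bucket(10, AGE_INTERVALS))   # 1 (second bucket)
print(number_bucket(150, AGE_INTERVALS))  # 2 (outside the defined buckets)
print(text_slot("Paris", 100))            # some slot between 0 and 99
```

A smaller `vector_size` means more distinct values share the same slot, so the parameter trades precision for a more compact representation.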

### GraphQL Query

An ML function requires a query to fetch the data used in its configuration. For cohort-based lookalike, the query must fetch every field that is used as an attribute in the JSON.

Following our previous example, the GraphQL query will be:

```graphql
{agents {user_agent_info {os_family}} profiles {age} events {city}}
```
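
Since the query must cover exactly the fields listed in the JSON, it can be derived from the `field_path` values. The following is a minimal sketch under that assumption; `build_query` is a hypothetical helper, not part of any mediarithmics tooling.

```python
def build_query(field_paths):
    """Merge dotted field paths into one nested GraphQL selection set."""
    tree = {}
    for path in field_paths:
        node = tree
        for part in path.split("."):
            # Create the nested level on first use, then descend into it.
            node = node.setdefault(part, {})

    def render(node):
        parts = []
        for name, children in node.items():
            parts.append(f"{name} {render(children)}" if children else name)
        return "{" + " ".join(parts) + "}"

    return render(tree)


paths = ["agents.user_agent_info.os_family", "profiles.age", "events.city"]
print(build_query(paths))
# {agents {user_agent_info {os_family}} profiles {age} events {city}}
```

Paths that share a prefix (for example two fields under `profiles`) end up in the same selection set, so each object is requested only once.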

## ML function instantiation

Follow these steps to instantiate the ML function developed by mediarithmics, which assigns a cohort to each of your UserPoints:

1. Head to Settings > Datamart > ML Functions
2. Click on **New Ml Function**, select the datamart to which the ML function should apply, then choose **simhash-cohorts-calulation**
3. Enter the following information on the ML function configuration panel:
   * General Informations
     * Name: **Cohort ML Function**
     * Hosting Object Type: **UserPoint**
     * Field Type Name: **ClusteringCohort**
     * Field Name: **clustering\_cohort**
     * Query: *\<Insert here the GraphQL query that needs to be run to extract the attributes used to calculate your cohorts>*
   * Properties
     * Features: *\<Insert here the one-line JSON>*
     * Cohort Id Bit Size: *\<Used to define the number of cohorts in your datamart as 2^(Cohort Id Bit Size)>*
4. Click the **Save** button

{% hint style="warning" %}
Note that only one Cohort-based Lookalike model can be set up at a time in an organisation.
{% endhint %}
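
The **Features** property expects the JSON on a single line, and **Cohort Id Bit Size** determines the number of cohorts. As a small illustration, the snippet below minifies a features definition and computes the resulting cohort count. The shortened `features` list is only a placeholder for your own definition.

```python
import json

# Placeholder definition; replace with your own attributes.
features = [
    {
        "type": "FREQUENCY_NUMBER",
        "field_path": "profiles.age",
        "intervals": [{"from": 0, "to": 10}, {"from": 10, "to": 100}],
    },
    {"type": "FREQUENCY_TEXT", "field_path": "events.city", "vector_size": 100},
]

# Compact separators remove all optional whitespace, giving a one-line string.
one_line = json.dumps(features, separators=(",", ":"))
print(one_line)

# Number of cohorts = 2^(Cohort Id Bit Size); 10 is the recommended default.
cohort_id_bit_size = 10
print(2 ** cohort_id_bit_size)  # 1024
```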

## ML function

### Activation

Once the ML function has been instantiated, set its **batch\_mode** parameter to **true** and **activate the ML function** by calling the following API endpoint:

```json
PUT https://api.mediarithmics.com/v1/ml_functions/<id_ml_function>
{
  "batch_mode": true,
  "status": "ACTIVE"
}
```
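
As an illustration, the same call could be prepared from Python with the standard library. This is a sketch under assumptions: the ML function id `1234` is hypothetical, and passing the API token directly in the `Authorization` header is an assumption to verify against the API authentication documentation.

```python
import json
import urllib.request

API_BASE = "https://api.mediarithmics.com/v1"


def build_activation_request(ml_function_id, api_token):
    """Build (without sending) the PUT request that activates an ML function."""
    body = json.dumps({"batch_mode": True, "status": "ACTIVE"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/ml_functions/{ml_function_id}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: the token is sent as-is in the Authorization header.
            "Authorization": api_token,
        },
    )


# "1234" is a hypothetical ML function id used only for this example.
request = build_activation_request("1234", "<your_api_token>")
print(request.get_method(), request.full_url)

# To actually send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```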

## Schema update

Two changes have to be made in your runtime schema:

* Add a **clustering\_cohort** field to **UserPoint** as follows:

```graphql
type UserPoint @TreeIndexRoot(index:"USER_INDEX") {
   ...
   clustering_cohort: ClusteringCohort
   ...
}
```

* Create a new **ClusteringCohort** type as follows:

```graphql
type ClusteringCohort {
   id: ID! @TreeIndex(index:"USER_INDEX")
   expiration_ts: Timestamp @TreeIndex(index:"USER_INDEX")
   cohort_id: String! @TreeIndex(index:"USER_INDEX")
   last_modified_ts: Timestamp! @TreeIndex(index:"USER_INDEX")
}
```

{% hint style="success" %}
See the [schema update documentation](https://developer.mediarithmics.io/schema/defining-your-schema) to learn more about how to update your schema.
{% endhint %}

### Initial loading

You can ask your account manager to run an initial loading on your datamart to calculate cohorts for existing UserPoints.


