A designated cohort is assigned to a user depending on the attributes and features (the values of those attributes) that you have defined to characterize your users. Once you have selected the attributes, you need to describe them in a JSON document.
We recommend that you:
Pick attributes that can be used to segment users and that are relevant to the business
Pick attributes from several object types (UserEvent, UserProfile, …)
Select between 3 and 10 attributes
Keep the total number of features, across all attributes, at 100 or fewer
Use 10 as the Cohort Id Bit Size (this corresponds to creating 2^10 = 1024 cohorts)
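The recommendations above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the platform: the helper names, the attribute names and the feature counts are all illustrative assumptions.

```python
def cohort_count(bit_size: int) -> int:
    """Number of distinct cohort ids that fit in `bit_size` bits."""
    return 2 ** bit_size


def check_recommendations(features_per_attribute: dict) -> list:
    """Return one warning per recommendation that is not met."""
    warnings = []
    n_attributes = len(features_per_attribute)
    if not 3 <= n_attributes <= 10:
        warnings.append(f"{n_attributes} attributes selected; 3 to 10 are recommended")
    total_features = sum(features_per_attribute.values())
    if total_features > 100:
        warnings.append(f"{total_features} features in total; at most 100 are recommended")
    return warnings


# Illustrative (made-up) attributes with the number of features each one contributes.
example = {"attribute_a": 8, "attribute_b": 6, "attribute_c": 30}

print(cohort_count(10))                # 1024
print(check_recommendations(example))  # []
```

With a bit size of 10 the cohort id can take 1024 distinct values, which is where the figure in the recommendation comes from.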
For instance, let's imagine that you want to create cohorts based on:
os_family - defined on UserAgentInfo, nested in UserAgent
age - defined on UserProfile
city - defined on UserEvent
You will therefore define the following JSON:
There are 3 types of attributes available:
FREQUENCY_ENUM: use this type for a finite list of values like operating systems.
FREQUENCY_NUMBER: use this type for numeric values that are grouped into buckets, like age.
FREQUENCY_TEXT: use this type for an unbounded (or very long) list of values like keywords or cities. Choose the vector_size parameter carefully: it is applied as a modulo on the values, which reduces the variety of values to a fixed number of buckets.
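To picture what vector_size does for a FREQUENCY_TEXT attribute, here is a minimal sketch that assumes each text value is hashed and then reduced with a modulo. The hash function (crc32) and the helper names are assumptions made for illustration; the hash actually used by the platform is not specified here.

```python
import zlib


def text_bucket(value: str, vector_size: int) -> int:
    """Map an arbitrary text value to an index in [0, vector_size)."""
    # crc32 is a stand-in for a stable hash (assumption, not the platform's hash).
    return zlib.crc32(value.encode("utf-8")) % vector_size


def frequency_vector(values, vector_size: int) -> list:
    """Count how many values fall into each of the vector_size buckets."""
    vector = [0] * vector_size
    for value in values:
        vector[text_bucket(value, vector_size)] += 1
    return vector


# Made-up sample values; "Paris" appears twice and always lands in the same bucket.
cities = ["Paris", "Lyon", "Paris", "Berlin"]
vector = frequency_vector(cities, vector_size=16)
print(vector)
```

A smaller vector_size makes unrelated values share a bucket more often, while a larger one keeps them apart at the cost of a longer vector.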
The field_path must contain the path to the attribute, starting from the UserPoint definition (see the schema documentation for more information).
An ML function requires a query to fetch the data used in its configuration. The simhash ML function requires a query that fetches the fields used as features and specified in the JSON.
Following our previous example, the GraphQL query will be: