# OTQL queries

## Introduction

The Object Tree Query Language (OTQL) has been designed to help to search and calculate aggregates on a large collection of object trees. The object tree is defined[ in the schema](https://developer.mediarithmics.io/schema) using the [@TreeIndexRoot](https://developer.mediarithmics.io/schema#treeindexroot) and the [@TreeIndex ](https://developer.mediarithmics.io/schema#treeindex)directives.

{% hint style="success" %}
OTQL queries help you :

* Build segments
* Explore data
* Monitor data integration
* Check the volumetry
* Build data exports
* ...
  {% endhint %}

![](https://4196284719-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMuoqM-5hJ5JY0WnAKL%2F-MROi--T710L2X3tHQew%2F-MROjMB-0j3Dtp5MH1rD%2Fimage.png?alt=media\&token=9f8e45e0-c811-4b93-97b7-8f5ad323a05a)

An OTQL query looks like an SQL query.

```sql
SELECT { id name } FROM Product WHERE price > 50.0
```

It is composed of three parts:

* A `SELECT` **Operation**: It gives indications on what needs to be done: extracting field values or calculating aggregates
* A `FROM` **starting Object Type**: It defines the starting object type in the evaluation process
* A `WHERE` **Object Tree Expression:** It defines a logical expression mixing boolean operators and field operators to connect different objects in the object tree.

```sql
SELECT [Operation] FROM [Object Type] WHERE [Object Tree Expression]
```

There are two kinds of operations:

* **Selection Operations** are similar to a GraphQL operation and return a list of objects containing the required fields.
* **Aggregation Operations** return aggregated values (count, stats, histograms, ...) calculated on the selected objects.

![](https://4196284719-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MMuoqM-5hJ5JY0WnAKL%2F-MROi--T710L2X3tHQew%2F-MROk2B2t0scoP3lO8mz%2Fimage.png?alt=media\&token=8ea85c77-6505-4bf6-9dbf-71984fd344ff)

{% hint style="info" %}
The fields in the queries are completely related to [how you built your schema](https://developer.mediarithmics.io/schema).
{% endhint %}

Here are some examples of requests you can do with OTQL :

```sql
# Counts number of new users in the past 7 days
SELECT @count{} FROM UserPoint WHERE creation_date >= "now-7d/d"

# Counts the number of transactions on a specific site (channel) 7 days ago
SELECT @count{} FROM UserEvent 
WHERE name = "$transaction_confirmed" 
        AND date = "now-7d/d"
        AND channel_id = 2419

# Counts the number of profiles with female gender
SELECT @count{} FROM UserProfile WHERE gender = "W"

# Lists all categories from universes in events done on a specific channel
SELECT { universe { category @map }} FROM UserEvent WHERE channel_id = 2417

# Lists all event names collected in the platform
SELECT {name @map} FROM UserEvent

# Number of users having at least 3 events related to laptops in the past 15 days
SELECT @count{} FROM UserPoint 
WHERE activities { events @ScoreSum(min:3) {
        category = "Laptop" AND date >= "now-15d/d"
        }}

# Number of transactions per site and per day
SELECT { channel_id @map { date @date_histogram } } FROM UserEvent WHERE name = "$transaction_confirmed"

# Number of users having an account but no emails
SELECT @count{} FROM UserPoint where accounts{} and not emails{}
```

## FROM - Starting object type

Imagining the following Object Tree:

```undefined
UserPoint
├── UserActivity
│   └── UserEvent
├── UserEmail
└── UserAccount
```

You could build queries starting from all `UserPoint`, all `UserActivity`, `UserEvent`, `UserEmail` or `UserAccount`

```sql
# Selects all names from all UserPoint
SELECT {name} FROM UserPoint

# Selects all names from all UserActivity
SELECT {name} FROM UserActivity
# Equivalent of
SELECT {activities { name }} FROM UserPoint

# Selects all names from all UserEvent
SELECT {name} FROM UserEvent
# Equivalent of
SELECT { activities { events { name }}} FROM UserPoint
```

{% hint style="info" %}
You target object types with the FROM. In the example, the link field is activities, but we don't do SELECT (...) FROM activities.
{% endhint %}

## WHERE - Object tree expressions

The expression contained in the `WHERE` clause is composed of a list of predicates, separated by logical operators `AND`, `OR`, and `NOT` . Parenthesis can be used to group together two predicates with a logical operator.

Examples :

```sql
 SELECT (...) FROM (...) WHERE (PredicateA AND PredicateB) OR PredicateC
 SELECT (...) FROM (...) WHERE PredicateA AND (PredicateB OR PredicateC)
```

```sql
 SELECT (...) FROM (...) WHERE price > 50.0
```

```sql
  SELECT (...) FROM (...) WHERE price > 50.0 AND last_modified_date > "now-10d"
```

### Logical operators

Each predicate doesn't return directly a boolean but a score, 1 if the condition is respected else 0. At the end the score is compared to 0 and return true if it's higher than 0 else return false. These operator keep the same priority as boolean ones. The logical operators work as below :

| Logical operator              | Real operation                 | Priority |
| ----------------------------- | ------------------------------ | -------- |
| **NOT** PredicateA            | if ( ScoreA > 0) then 0 else 1 | high     |
| PredicateA **AND** PredicateB | ScoreA x ScoreB                | middle   |
| PredicateA **OR** PredicateB  | max(ScoreA, ScoreB)            | low      |

```sql
SELECT (...) FROM (...) 
WHERE price > 50.0 AND last_modified_date > "now-10d"
# ( price > 50.0 ) x ( last_modified_date > "now-10d" )

SELECT (...) FROM (...) 
WHERE price > 50.0 AND last_modified_date > "now-10d" OR  price > 100.0 AND last_modified_date > "now-20d"
# (( price > 50.0 ) x ( last_modified_date > "now-10d" )) + (( price > 100.0 ) x ( last_modified_date > "now-20d" ))
```

### Tree exploration

As we are querying an Object Tree, and as predicates are only possible on a **leaf** (e.g. fields that only contain a scalar value), it is natural to have a way of **going from the root to each of the leaves by traversing the tree.**

Braces symbols `{}` are used to traverse the tree through **link fields**. The sub-query written in the braces will be evaluated on each item in the linked list.

**Let's see it in action.**

Let's say we built a schema corresponding to the following Object Tree.

```graphql
# Functional tree
UserPoint
└─ UserActivity
   └─ UserEvent

# Associated schema
type UserPoint @TreeIndexRoot(index:"USER_INDEX") {
 # activities is a link field to UserActivity objects
 activities: [UserActivity]
}

type UserActivity {
 # events is a link field to UserEvent objects
 events: [UserEvent]
}

type UserEvent {
 name: String @TreeIndex(index_name: "USER_INDEX")
 amount: Int @TreeIndex(index_name: "USER_INDEX")
 date: Timestamp! 
}

# A UserPoint can have 0..n User Activity
# Each UserActivity can have 0..m UserEvent
# UserEvent has a "name" String field, an "amount" Int field and a "date" Timestamp field 
```

The following query will return all the `UserPoint` that have at least one `UserEvent` whose name is `$transaction_confirmed` .

As the root of the Object Tree is the `UserPoint` in this example, we'll need to start from there. And then follow the `activities` link to the associated `UserActivity` and then the `events` link to the associated `UserEvent`.

```sql
SELECT (...) FROM UserPoint WHERE activities { events { name = "$transaction_confirmed" } }
```

### Scoring operator

The latest query returns items with at least one of the events have a `$transaction_confirmed` name. We return every user that did at least one purchase and at least one visit.

If we instead want the users that bought things through at least 3 different visits (frequent buyers), we will use a **scoring operator**.

Each time there is a pair of braces `{ }` and a sub-query written in the braces, there is implicitly a score calculated for the sub-query. **By default, the score will be the number of items matching the sub-query**.

{% tabs %}
{% tab title="@ScoreSum" %}
Only returns score if the nested sub-query has a score superior or equal to min.

| args          | description                                                    |
| ------------- | -------------------------------------------------------------- |
| min : `Float` | minimum required score for the nested sub-query to be returned |

```sql
# Select UserPoint that bought things through at least 3 different visits
SELECT (...) FROM UserPoint
WHERE activities @ScoreSum(min: 3.0){ events { name = "$transaction_confirmed" } }

# Select UserPoint that have at least 1 activity that contains at least 3 $transaction_confirmed events
SELECT (...) FROM UserPoint
WHERE activities { events @ScoreSum(min: 3.0) { name = "$transaction_confirmed" } }
```

{% endtab %}

{% tab title="@ScoreAvg" %}
Calculate the average from the sub-query matching scores and returns true if it's superior or equal to min.

| args          | description                                                    |
| ------------- | -------------------------------------------------------------- |
| min : `Float` | minimum required score for the nested sub-query to be returned |

{% hint style="warning" %}
**By default, the score will be the number of items matching the sub-query**.  So using `@ScoreMax` or `@ScoreAvg` like that is useless because each score from the sub-query will be 1. It's why you should apply a modification on the score calculation.
{% endhint %}
{% endtab %}

{% tab title="@ScoreMax" %}
Takes the maximum score from the sub-query matching scores and returns true if it's superior or equal to min.

| args          | description                                                    |
| ------------- | -------------------------------------------------------------- |
| min : `Float` | minimum required score for the nested sub-query to be returned |

{% hint style="warning" %}
**By default, the score will be the number of items matching the sub-query**.  So using `@ScoreMax` or `@ScoreAvg` like that is useless because each score from the sub-query will be 1. It's why you should apply a modification on the score calculation.
{% endhint %}
{% endtab %}
{% endtabs %}

#### Changing the way scores are calculated

As said above, by default, score values are equal to the number of items matching a sub-query when following a link.

However, your Object Tree leaves have some number typed fields (`Int` or `Float`). It is possible to use those values as the score of a sub-query.

{% tabs %}
{% tab title="@ScoreField" %}
Select a specific field in which the numeric value used as the score is stored.

| args           | description                    |
| -------------- | ------------------------------ |
| name: `String` | The name of the field selected |

```sql
# Using @ScoreField alone is useless because it could be replace by logical operator
SELECT (...) FROM UserPoint
WHERE activity {
    events @ScoreField(name:"amount") {
        name = "$transaction_confirmed"
    }
}

# Can be written
SELECT (...) FROM UserPoint
WHERE activity {
    events {
        name = "$transaction_confirmed"
    }
}
```

&#x20;The information of which field is selected bubble up still it didn't catch by a `@ScoreSum`, `@ScoreAvg` or `@ScoreMax` .

```sql
# Select UserPoint having spent at 1000 in one event
SELECT (...) FROM UserPoint
WHERE activity { 
    events @ScoreField(name:"amount") @ScoreSum(min : 1000) { 
        name = "$transaction_confirmed" 
    } 
}

# Select UserPoint having spent at 1000 in one activity
SELECT (...) FROM UserPoint
WHERE activity @ScoreSum(min : 1000) { 
    events @ScoreField(name:"amount") { 
        name = "$transaction_confirmed" 
    } 
}
```

{% hint style="danger" %}
Be sure sub-field selected in @ScoreField exist in any field. Add a condition if the query return an error

```sql
SELECT (...) FROM UserPoint
WHERE activity @ScoreSum(min : 1000) {
    events @ScoreField(name:"amount") { 
        name = "$transaction_confirmed" 
    }
}

# To be sure field "amount" exist 
SELECT (...) FROM UserPoint
WHERE activity @ScoreSum(min : 1000) {
    events @ScoreField(name:"amount") { 
        is_defined(amount) AND 
        name = "$transaction_confirmed" 
    }
}
```

{% endhint %}
{% endtab %}

{% tab title="@ScoreBoost" %}
Multiply the score by the factor. Can be used to boost a sub-query over another one.&#x20;

| args            | description                             |
| --------------- | --------------------------------------- |
| factor: `Float` | Constant float which multiply the score |

```sql
# Select UserPoints having spent at least 1000 orders, where IT product count twice 
SELECT (...) FROM UserPoint
WHERE activity_events @ScoreSum(min : 1000) {
    order { 
      order_products @ScoreBoost(factor: 2.0) @ScoreSum(result: "score_value") {
            category="IT"
      },
      order_products @ScoreSum() {
            category!="IT"
        }
    }
}
```

{% endtab %}
{% endtabs %}

#### Using this calculated score

With the possibility to use score in a field, you may want to return the calculated score of a `@ScoreSum` and not only it a sub-query validate the condition or not. This is why we add a new parameters to `@ScoreSum` :`result`.&#x20;

{% tabs %}
{% tab title="@ScoreSum" %}
Only returns score if the nested sub-query has a score superior or equal to min.

| args              | description                                                                                                                                                                                                                                                                                              |
| ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| min : `Float`     | minimum required score for the nested sub-query to be returned                                                                                                                                                                                                                                           |
| result : `String` | <p>Two values possibles : </p><ul><li><strong>"boolean\_value"</strong> (by default) : reduce the returned score value to 0 or 1. It has the same comportment as the previous one explain in conditional predicate.<br></li><li><strong>"score\_value"</strong> : return the real score value.</li></ul> |

```sql
# Select UserPoint having spent more than 1000€ through $transaction_confirmed events during the past year
SELECT (...) FROM UserPoint 
WHERE activities { 
    events @ScoreField(name: "amount") @ScoreSum(min: 1000) { 
        name = "$transaction_confirmed" AND date >= "now-1y/y" 
    } 
}

# Select UserPoint having spent more than 1000€ in cross orders with at least products which cost 10€: 
SELECT (...) FROM UserPoint
WHERE activity_events @ScoreSum(min : 1000) {
	order {
	    order_products @ScoreField(name: "amount") @ScoreSum(min: 10, result:"score_value") { 
          category="IT" 
      }
   }
}
```

{% endtab %}

{% tab title="@ScoreAvg" %}
Calculate the average from the sub-query matching scores and returns it if it's superior or equal to min args

| args              | description                                                                                                                                                                                                            |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| min : `Float`     | minimum required score for the nested sub-query to be returned                                                                                                                                                         |
| result : `String` | <p>Two values possibles : </p><ul><li><strong>"boolean\_value"</strong> (by default) : reduce the returned score value to 0 or 1.<br></li><li><strong>"score\_value"</strong> : return the real score value.</li></ul> |

```sql
# Select user points having spent on average at least 1000 through $transaction_confirmed events during the past year  
SELECT (...) FROM UserPoint 
WHERE activities { 
    events @ScoreField(name: "amount") @ScoreAvg(min: 1000) { 
        name = "$transaction_confirmed" AND date >= "now-1y/y" 
    } 
}

# Select user points having spent in average more than 1000€ by orders with at least products which cost 10€: 
SELECT (...) FROM UserPoint
WHERE activity_events @ScoreAvg(min : 1000) {
	order {
	    order_products @ScoreField(name: "amount") @ScoreSum(min: 10, result:"score_value") { 
          category="IT" 
      }
   }
}
```

{% endtab %}

{% tab title="@ScoreMax" %}
Takes the maximum score from the sub-query matching scores and returns it if it's superior or equal to min

| args              | description                                                                                                                                                                                                            |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| min : `Float`     | minimum required score for the nested sub-query to be returned                                                                                                                                                         |
| result : `String` | <p>Two values possibles : </p><ul><li><strong>"boolean\_value"</strong> (by default) : reduce the returned score value to 0 or 1.<br></li><li><strong>"score\_value"</strong> : return the real score value.</li></ul> |

```sql
# Select user points with at least 1 activity that contains at least 1 $transaction_confirmed event with amount >= 1000 during the past year  
SELECT (...) FROM UserPoint 
WHERE activities { 
    events @ScoreField(name: "amount") @ScoreMax(min: 1000) { 
        name = "$transaction_confirmed" AND date >= "now-1y/y" 
    } 
} 
```

{% endtab %}
{% endtabs %}

#### Go forward

It is possible to use these two ways at the same time but be careful, it is currently not possible to grow up a @ScoreField after a @ScoreSum(result: "boolean\_score").

Example of possible use case :&#x20;

```sql
# Select UserPoint having spent in average more than 1000 in one activity of events with a amount superior than 10
SELECT (...) FROM UserPoint
WHERE activity @ScoreAvg(min : 1000) { events @ScoreField(name:"amount") @ScoreSum(min 10, result:"score_value") { 
    name = "$transaction_confirmed" } }

# Select UserPoint having spent in average in one activity, more than 1000 events with a amount superior than 10 
SELECT (...) FROM UserPoint
WHERE activity @ScoreAvg(min : 1000) { events @ScoreField(name:"amount") @ScoreSum(min: 10, result:"boolean_value") { 
    name = "$transaction_confirmed" } }
```

However the following use case can't be written :

```sql
# We wanted : 
# Count UserPoint having spent more than 1000€ in cross orders with at least 10 orders more than 100€ : 
SELECT (...) FROM UserPoint
WHERE activity_events @ScoreSum(min : 1000) {
	order @ScoreSum(min: 10, result: "score_value") {
      @ScoreSum(min: 100, result: "boolean_value") {
           order_products @ScoreField(name: "price") @ScoreSum(result:"score_value") { 
              category="IT" 
           }
       }
   }
}
# But the query return :
# Count UserPoint having spent more than 1000 orders with at least 10 orders more than 100€
```

Using conditional and scoring ways in the same query is useful for many use case whose won't be detailed  here. A specific page has been created for regroup examples of them if you want to [go forward](https://developer.mediarithmics.io/otql-examples#scoring-operation).

### Date operators

The following operators are available to work with dates :

* `>=` Greater or equal
* `>` Greater
* `<=` Lower or equal
* `<` Lower
* `= or ==` Equal
* `!=` Not equal

Dates can be formatted either

* in **ISO8601 format** (time part is optional) `2012-09-27`, `2012-09-27T12:42:00`&#x20;
* in a **timestamp** in milliseconds `1549365498507`&#x20;
* in a **Date Math** format, defining a relative date

```sql
SELECT (...) FROM UserPoint WHERE activities {creation_ts <= "2012-09-27"}
SELECT (...) FROM UserPoint WHERE activities {creation_ts > "1549365498507"}
SELECT (...) FROM UserPoint WHERE activities {creation_ts > "now-7d"}
```

#### Date Math format

The idea of the date match syntax is to define **a relative date compared to an anchor date**. The anchor date is either `now` or a date (`ISO8601` or `timestamp` format) followed by `||`.

The expression begins with the anchor date and is followed by one or more math expressions :

* `+1h` adds one hour
* `-1d` substracts one day
* `/d` rounds down to the nearest day

Example, assuming now is `2001-01-01 12:00:00` :

```javascript
now+1h // Resolves to: 2001-01-01 13:00:00
now-1h // Resolves to: 2001-01-01 11:00:00
now-1h/d // Resolves to: 2001-01-01 00:00:00
2001.02.01||+1M/d // Resolves to: 2001-03-01 00:00:00
```

The supported units are the following :

| Date operator | description |
| ------------- | ----------- |
| y             | Years       |
| M             | Months      |
| w             | Weeks       |
| d             | Days        |
| h or H        | Hours       |
| m             | Minutes     |
| s             | Seconds     |

### String operators

Only the indexed fields of type String are eligible. Depending on the specified data\_type in the schema, the String operator will behave differently.

#### With `data_type: text` :

All operators are **case-insensitive.** [Values are stored as a set of words, transformed](https://developer.mediarithmics.io/schema#string-indexed-as-text)**.**

| String operator                                    | description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| -------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| match(fieldName, comparisonValue, operator = "or") | <p>Returns true if words in the text contained in <code>fieldName</code> match words in <code>comparisonValue</code><br><code>operator</code> defines how words are matched. Defaults to <code>"or"</code>.</p><ul><li><code>"or"</code>: returns true if <strong>at least one word</strong> in <code>fieldName</code> matches a word in <code>comparisonValue</code></li><li><code>"and"</code>: returns true only if <strong>all words</strong> in <code>comparisonValue</code> are found in <code>fieldName</code></li></ul> |
| starts\_with(fieldName, comparisonValue)           | Returns true if a word of the text contained in fieldName starts with one of the words contained in comparisonValue.                                                                                                                                                                                                                                                                                                                                                                                                            |

The same transformation is done on the text data before storage is also done on the comparison value.

```sql
# Doing
(...) WHERE match(url_as_text, "Hello World!")

# Will search in the text values for words matching 'hello' or 'world'
https://www.hello.com/
https://www.world.com/
https://www.hello.com/world/
(...)
```

{% hint style="success" %}
Diacritical marks (e.g. é, è, à, ç), number/digits, and word/expression containing apostrophe are usually stored as-is which means that you will need to provide the same value in the match function.&#x20;
{% endhint %}

Below are some examples comparing the ingested and stored values, along with a demonstration of the **match** function: &#x20;

| Value ingested                                 | Values stored                            | Value matching                                                  | Value not matching                                  |
| ---------------------------------------------- | ---------------------------------------- | --------------------------------------------------------------- | --------------------------------------------------- |
| Métamorphosé                                   | métamorphosé                             | <p>Métamorphosé<br>métamorphosé</p>                             | <p>metamorphose<br>métamorphos<br>métamorphosés</p> |
| hameçon, ligne et bouchon                      | <p>hameçon<br>ligne<br>et<br>bouchon</p> | <p>hameçon<br>hameÇon<br>ligne<br>et<br>bouchon</p>             | hamecon                                             |
| aujourd'hui                                    | aujourd'hui                              | aujourd'hui                                                     | <p>aujourd hui<br>aujourd<br>hui</p>                |
| s'inscrire                                     | s'inscrire                               | s'inscrire                                                      | <p>s inscrire<br>inscrire<br>s</p>                  |
| 100                                            | 100                                      | 100                                                             | <p>1000<br>10</p>                                   |
| 100 900 km                                     | <p>100<br>900<br>km</p>                  | <p>100 900 km<br>100<br>900<br>km</p>                           | <p>100900<br>100,900</p>                            |
| km/h                                           | <p>km<br>h</p>                           | <p>km/h<br>km<br>h</p>                                          | <p>kmh<br></p>                                      |
| 100km/h                                        | <p>100km<br>h</p>                        | <p>100km/h<br>100km<br>h<br>km/h</p>                            | <p>100<br>km</p>                                    |
| H\&M                                           | <p>h<br>m</p>                            | <p>H\&M<br>h<br>m</p>                                           | hm                                                  |
| <p>£1,000<br>1,000<br>1,000+</p>               | 1,000                                    | <p>£1,000<br>1,000<br>1,000+</p>                                | <p>1000<br>000<br>1.000</p>                         |
| 1.000                                          | 1.000                                    | 1.000                                                           | <p>1000<br>000<br>1,000</p>                         |
| <p>chou-fleur<br>chou- fleur<br>chou fleur</p> | <p>chou<br>fleur</p>                     | <p>chou-fleur<br>chou- fleur<br>chou fleur<br>chou<br>fleur</p> | <p>chou\_fleur<br>choufleur</p>                     |
| chou\_fleur                                    | chou\_fleur                              | chou\_fleur                                                     | <p>chou<br>fleur<br>choufleur<br>chou-fleur</p>     |
| recherche.aspx                                 | recherche.aspx                           | recherche.aspx                                                  | <p>recette<br>aspx</p>                              |
| <test@mics.com>                                | <p>test<br>mics.com</p>                  | <p><test@mics.com><br>test<br>mics.com</p>                      | <p>@<br>mics</p>                                    |

#### With `data_type: keyword` :

All operators are **case-sensitive**. [Values are considered as a single word. No transformation is made](https://developer.mediarithmics.io/schema#string-indexed-as-keyword).

| String operator                          | description                                                                                                             |
| ---------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| starts\_with(fieldName, comparisonValue) | Returns true if the **exact** value contained in fieldName starts with the **exact** value passed in comparisonValue.   |
| = or ==                                  | Returns true if the **exact** value contained in fieldName is equal with the **exact** value passed in comparisonValue. |

```sql
(...) WHERE starts_with(mykeyword, "Hello World!")
(...) WHERE mykeyword == "Hello World!"
```

### In operator

You can use the `IN` operator as a shortcut to filter on multiple values of the same field.

```sql
# Total sold in events for channel IDs 2456, 5489, 1426
SELECT {events {basket { amount @sum}}} FROM UserActivity 
WHERE channel_id IN ["2456","5489","1426"]

# Equivalent of WHERE channel_id = "2456" OR channel_id = "5489" OR channel_id = "1426"
```

{% hint style="info" %}
This operator offers better performances than multiple `OR` operators.
{% endhint %}

### Is\_defined operator

This is used to evaluate the value of a field and check if it is defined or not. The predicate can be applied in any indexed field in the schema and return a boolean.

```sql
# Return all UserPoint with a profile 
SELECT { id } FROM UserPoint WHERE is_defined(profiles) 

# Return all UserPoint with an email in their profile 
SELECT { id } FROM UserPoint WHERE profiles{is_defined(email)}
```

| Field Value            | Return |
| ---------------------- | ------ |
| myField = “example”    | True   |
| myField = \[“example”] | True   |
| myField = \[null]      | True   |
| myField = ""           | True   |
| myField = \[ ]         | True   |
| myField = null         | False  |
| myField = undefined    | False  |
| (NoField)              | False  |

\*\*\*\*

## JOIN operations

You may want to add another list of predicates FROM various objects. To do so, use JOIN clause to mention another object right after the FROM/WHERE clauses. It's possible to apply multiple JOIN in the same query. &#x20;

```sql
# Get the activity “$transaction_confirmed” of UserPoint of the segment id “1234“
SELECT { id }
FROM ActivityEvent WHERE name=="$transaction_confirmed"
JOIN Userpoint WHERE segments { id="1234" }
```

## LIMIT operations

```sql
# Get only 5 or fewer activities named “$transaction_confirmed”
SELECT { id }
FROM ActivityEvent WHERE name=="$transaction_confirmed"
LIMIT 5
```

In the Query Tools, we return 10 elements by default but you can easily override this by using the LIMIT clause followed by the number of elements required:&#x20;

```sql
SELECT { id }
FROM ActivityEvent WHERE name=="$transaction_confirmed"
LIMIT 100 
# This query returns 100 (or fewer) activities named “$transaction_confirmed”
```

{% hint style="danger" %}
It's not possible to return more than 10 000 elements with `LIMIT`. The query will fail if you try it.&#x20;
{% endhint %}

Note that the LIMIT clause will be ignored when using `@count` or `@cardinality` directives.

```sql
SELECT @count{ }
FROM ActivityEvent WHERE name=="$transaction_confirmed"
LIMIT 5
# Return the count of all activities named "$transaction_confirmed"
# Example: Return - 21,866,076
```

{% hint style="danger" %}
`LIMIT` operator isn't applied during the segment calculation.

If you create a User Query Segment with a `LIMIT`, it will be ignored and return all  UserPoint who respect the `WHERE` clause.
{% endhint %}

{% hint style="info" %}
If you're using an aggregation directive such as @map and you want to extend the number of buckets returned, you need to use the limit parameter of the aggregation directive (for instance: `@map(limit:200)`) and not the `LIMIT` clause. More details available hereunder.
{% endhint %}

## SELECT operations

They are simply selecting fields. Every field present [in the schema](https://developer.mediarithmics.io/schema) can be selected.

```sql
# Select id and name in the root level
SELECT { id name } FROM UserPoint

# Select name in UserPoint and creation_ts and id in emails linked to the UserPoint
SELECT { name emails { creation_ts id } } FROM UserPoint
```

### Filters

[As we saw earler in this documentation](#where-object-tree-expressions), the `WHERE` expression gives you the ability to filter a sublist of objects at `FROM` level. You also have the capability to filter **in** or **out** the data returned by the query using the `@filter` directive in `SELECT` :

```
{<OBJECT> @filter(clause: "<FIELD_NAME> == \"<FIELD_VALUE>\"", 
filter_empty: <BOOLEAN>) {<FIELD_NAME_1> <FIELD_NAME_2>}
```

<table><thead><tr><th width="180">Option</th><th width="133">Mandatory</th><th>Usage</th></tr></thead><tbody><tr><td><code>clause</code></td><td>Yes</td><td>Used to list fields names &#x26; values that you want to filter in/out</td></tr><tr><td><code>filter_empty</code></td><td>No</td><td>Used to filter out empty object. Set to <code>true</code> by default.</td></tr></tbody></table>

#### Tips

Here are some tips to properly use the `@filter` directive in your queries:&#x20;

* Filter multiple fields (note that you can used OR or AND between fields, based on the required filter logic):

```
@filter(clause: "category == \"CAT_1\" OR referrer == \"REF\"")
```

* Filter **in** multiple values for a given fields:

```
@filter(clause: "category == \"CAT_1\" OR category == \"CAT_2\"")
```

* Filter **out** multiple values for a given fields:

```
@filter(clause: "category != \"CAT_1\" AND category != \"CAT_2\"")
```

* Combine AND & OR filters:

```
@filter(clause: "(category == \"CAT_1\" OR category == \"CAT_2\") AND 
referrer == \"REF\"")
```

* Filter by a subfield:&#x20;

```
@filter(clause: "events { category == \"CAT_1\" } ")
```

When `filter_empty:true` option is provided, the following elements will be **filtered out**:

* An optional array which is empty
* An object which is empty and either optional or inside an array
* A mandatory array where the following conditions are met :&#x20;
  * Tha array is empty
  * The other mandatory selections of the parent object are only empty arrays
  * This parent object can be filtered

{% hint style="warning" %}
Please note that the `@filter` directive cannot be used at the same time as aggregation operations such as @map, @date\_histogram, ...
{% endhint %}

#### Example 1  - Usage of clause

The following query retrieves userpoints, activities, events and some of their field for each UserPoint that has an event named "display".

```
select { id { activities { id events { name score } }  } }
from UserPoint
where { activities { events { name  == "display" } }  }
```

Let's assume it gives the following result :

```
[
  {
    "id": "up1",
    "activities": [
      {
        "id": "a1",
        "events": [
          { "name": "display",  "score": 123 },
          { "name": "click"}
        ]
      },
      {
        "id": "a2",
        "events": [ { "name": "display" } ]
      }
    ]
  },
  {
    "id": "up2",
    "activities": [
      {
        "id": "a3",
        "events": [ { "name": "click" } ]
      },
      {
        "id": "a4",
        "events": [ { "name": "display" } ]
      }
    ]
  }
]
```

The user might be surprised to find "click" events in this result. However remember that the where clause only filter the roots (i.e the UserPoints). To retrieve only "display" events, the user will need to add an @filter clause as follow:

```
select { id { activities { id events @filter(clause: "name == \"display\"") { name score } }  } }
from UserPoint
where { activities { events { name  == "display" } }  }
```

&#x20;Assuming the same data, this query would produce the following result :&#x20;

```
  [
   {
    "id": "up1",
    "activities": [
      {
        "id": "a1",
        "events": [
          { "name": "display",  "score": 123 }
        ]
      },
      {
        "id": "a2",
        "events": [ { "name": "display" } ]
      }
    ]
  },
  {
    "id": "up2",
    "activities": [
      {
        "id": "a3",
        "events": []
      },
      {
        "id": "a4",
        "events": [ { "name": "display" } ]
      }
    ]
  }
]
```

#### Example 2 - Usage of filter\_empty&#x20;

The `@filter` predicate also filters by default empty result in its scope. To illustrate this, we reduce our query to retrieve only the score fields :&#x20;

```
select { { activities { events @filter(clause: "name == \"display\"") { score } }  } }
from UserPoint
where { activities { events { name  == "display" } }  }
```

Still the same data, the result is the following:

```
[
  {
    "activities": [
      { "events": [ { "score": 123 } ] },
      { "events": [ ] }
    ]
  },
  {
    "activities": [
      { "events": [] },
      { "events": [] }
    ]
  }
]
```

You'll notice that two `display` events have disappeared. Since they don't have a score, they would be empty object. Actually, this "filter empty" behavior can be set using a second optional parameter to the `@filter` directive. If we set it to false, we will obtain the following result :&#x20;

```
select { { activities { 
             events @filter(clause: "name == \"display\"", filter_empty: false) { score } }  } }
from UserPoint
where { activities { events { name  == "display" } }}

//result 

[
  {
    "activities": [
      { "events": [ { "score": 123 } ] },
      { "events": [ {} ] }
    ]
  },
  {
    "activities": [
      { "events": [] },
      { "events": [ {} ] }
    ]
  }
]
```

#### Example 3 - Usage of extra @filter

One might think that the previous result still contains a lot of noise (4 events retrieved for only one score). You can add an extra `@filter` before the object name to lighten the result:

```
select @filter { { activities { 
             events @filter(clause: "name == \"display\"") { score } }  } }
from UserPoint
where { activities { events { name  == "display" } }}
```

You will get the following result :&#x20;

```
[
  {
    "activities": [ { "events": [ { "score": 123 } ] }   ]
  }
]
```

#### Example 3 - Usage of @filter sub-field

@filter can be used to filter a field by a condition on a sub-field.&#x20;

```
SELECT { activities @filter(clause: "events {is_defined(event_name) AND event_name == \"display\"}") 
    { events { event_name } } }
FROM UserPoint
where { activities { events { name  == "display" }}}
```

### Aggregation Operations

The aggregation operations are initiated by a directive in the `SELECT` clause. They take into account the filter defined in the `WHERE` clause, however they are not compatible with the `@filter` directive that you can use in the `SELECT` clause.

#### @count

This directive is used to count the number of objects verifying the query

```sql
# Counts the number of UserPoint
SELECT @count {} FROM UserPoint

# Counts number of new users in the past 7 days
SELECT @count{} FROM UserPoint WHERE creation_date >= "now-7d/d"
```

#### Metrics directives

{% hint style="warning" %}
Fields used with metrics directives should have the [@TreeIndex](https://developer.mediarithmics.io/schema#treeindex) directive in your schema.
{% endhint %}

Those directives calculate a value per bucket created in the bucket directive, or with only one bucket containing all elements if you don't use bucket directives.

* `@avg`: average value for a specific field (only applies to numeric values)

```sql
# Average basket amount between two specific dates
SELECT {basket {amount @avg}} FROM UserEvent 
WHERE {date >= "2020-12-01" AND date <= "2020-12-31" }
```

* `@min`: minimum value for a specific field (only applies to numeric values)

```sql
# Minimum basket amount between two specific dates
SELECT {basket {amount @min}} FROM UserEvent 
WHERE { date >= "2020-06-20" AND date <= "2020-06-25” }
```

* `@max`: maximum value for a specific field (only applies to numeric values)&#x20;

```sql
# Maximum basket amount between two specific dates
SELECT {basket {amount @max}} FROM UserEvent 
WHERE { date >= "2020-06-20" AND date <= "2020-06-25”}
```

* `@sum`: sum of value for a specific field (only applies to numeric values)

```sql
# Sum of basket amounts between two specific dates
SELECT {order{amount @sum }} FROM ActivityEvent 
WHERE {date >= "2020-06-20" AND date <= "2020-06-25”}
```

* `@cardinality`: count of distinct values

```sql
# Number of channels in a datamart
SELECT {channel_id @cardinality} FROM ActivityEvent

# Number of cookies associated with UserPoint in a specific segment
SELECT  {agents{id @cardinality}} FROM UserPoint 
WHERE segments {id=”XXXX”}
```

#### Bucket directives

{% hint style="warning" %}
Fields used with bucket directives should have the [@TreeIndex](https://developer.mediarithmics.io/schema#treeindex) directive in your schema.
{% endhint %}

Those directives separate values into buckets

* `@map` one bucket per field value

```sql
SELECT { channel_id @map  {     # map the values of channel id in several buckets 
            session_duration @avg       # The average duration        
        }
} FROM UserActivity

# Data
# channel ID : 1234, count : 654987987987, session_duration: 100
# channel ID : 1235, count : 987987965465, session_duration: 1500
```

{% hint style="info" %}
@map does not count null values
{% endhint %}

{% hint style="info" %}
You can use the `limit` function to extend the number or elements that can be returned by the directive (default: 50, maximum: 50 000) :

`SELECT { name @map(limit:200) } FROM UserEvent`
{% endhint %}

* `@histogram` aggregated count on a specific field. The interval can be modified regarding the business needs

```sql
SELECT { order { amount @histogram(interval:50)}}
FROM UserEvent WHERE date >= "now-7d"

# Data
# Key: 0, count: 97681
# Key: 50, count: 50324
# Key: 100, count: 33164
# Key: 150, count: 36528
```

* `@date_histogram` aggregated count by a period of an object associated with a date. Allowed intervals are `1M` for a month and `XXD` for a XX number of days.

```sql
# Mere use of @date_histogram directive: selecting all page_view events 
# in the last 30 days 
SELECT { date @date_histogram(interval:"1d") } 
FROM UserEvent
WHERE name = "page_view" and date >= "now-30d/d"

# @date_histogram used together with @map directive with default interval (days)
SELECT { channel_id @map {date @date_histogram }} 
FROM UserEvent
WHERE name = "$transaction_confirmed"

# Data
# Key: 2416, count: 27563351
#             2018-01-16T00:00:00.000Z 330
#             2018-01-17T00:00:00.000Z 331
#             2018-01-18T00:00:00.000Z 3332
#             ...
# Key: 2417, count: 65498798
# ...

# Force an interval of one month
SELECT { channel_id @map {date @date_histogram(interval: "1M") } }
FROM UserEvent
WHERE name = "$transaction_confirmed"
```

{% hint style="info" %}
Please note that you cannot use bucket directives with metrics.
{% endhint %}

#### Aliases

It is possible to add an alias to the field expression. This alias is then used in the output to identify the field result.

```sql
SELECT {
        numberOfChannels: channel_id @cardinality # The approximate number of distinc values 
        averageDuration: duration @avg            # The average duration
        mininumDuration: duration @min            # map the values of channel id in several buckets 
} FROM UserEvent
```

## Managing queries

You usually enter OTQL queries directly in tools like the navigator. However, **they can be saved and managed by code as objects.** Some features will require you to link an object to an OTQL query, instead of just saving the query as text.

{% hint style="info" %}
Features asking you to save queries to reference them usually want to leverage this for performances optimisation.
{% endhint %}

## Creating a query

<mark style="color:green;">`POST`</mark> `https://api.mediarithmics.com/v1/datamarts/:datamartId/queries`

Create an OTQL query in the platform before creating an export based on this query

#### Path Parameters

| Name       | Type    | Description            |
| ---------- | ------- | ---------------------- |
| datamartId | integer | The ID of the datamart |

#### Request Body

| Name | Type   | Description |
| ---- | ------ | ----------- |
| Body | object | Payload     |

{% tabs %}
{% tab title="200 " %}

```javascript
{
  "status": "ok",
  "data": 
    {
      "id": "50409", // ID of the query to retrieve for the next steps
      "datamart_id": "1509",
      "query_language": "OTQL",
      "minor_version": null,
      "major_version": null,
      "query_text": "SELECT {id} FROM UserPoint",
      "favorite": false
    }
}
```

{% endtab %}

{% tab title="400 If the query is invalid" %}

```javascript
{
    "status": "error",
    "error": "cannot save invalid query, cause: Syntax error while parsing document \"nawak\". Invalid input 'n', expected Comments or select (line 1, column 1):\nnawak\n^",
    "error_code": "BAD_REQUEST_DATA",
    "error_id": "ef292c4e-1eab-4a7f-8fb2-77d797139be9"
}
```

{% endtab %}
{% endtabs %}

```javascript
// Creating a query payload
{
    "query_text": "SELECT {id} FROM UserPoint", // Your query
    "datamart_id": "<ASSOCIATED_DATAMART_ID>",
    "query_language": "OTQL"
}
```

## Check a query

<mark style="color:green;">`POST`</mark> `https://api.mediarithmics.com/v1/datamarts/:datamartId/query_check/otql`

Create an OTQL query in the platform before creating an export based on this query

#### Path Parameters

| Name       | Type    | Description            |
| ---------- | ------- | ---------------------- |
| datamartId | integer | The ID of the datamart |

#### Request Body

| Name | Type   | Description |
| ---- | ------ | ----------- |
| Body | object | Payload     |

{% tabs %}
{% tab title="200 " %}

```javascript
// If valid
{
    "status": "ok",
    "data": {
        "type": "VALID",
        "validation": {
            "parameters": []
        },
        "status": "ok"
    }
}

// If invalid
{
    "status": "ok",
    "data": {
        "type": "PARSING_ERROR",
        "error": {
            "message": "Syntax error while parsing document \"nawak\". Invalid input 'n', expected Comments or select (line 1, column 1):\nnawak\n^",
            "position": {
                "row": 1,
                "col": 1
            },
            "error_type": "PARSING"
        },
        "status": "error"
    }
```

{% endtab %}
{% endtabs %}

```javascript
// Checking a query payload
{
    "query": "SELECT {id} FROM UserPoint" // Your query
}
```

## Executing queries

You can execute queries in the different tools that mediarithmics offer, or using our API.&#x20;

## Execute a query

<mark style="color:green;">`POST`</mark> `https://api.mediarithmics.com/v1/datamarts/:datamart_id/query_executions/otql?use_cache=true`

Executes an OTQL query on the specified datamart

#### Path Parameters

| Name         | Type    | Description            |
| ------------ | ------- | ---------------------- |
| datamart\_id | integer | The ID of the datamart |

#### Query Parameters

| Name       | Type    | Description                                 |
| ---------- | ------- | ------------------------------------------- |
| use\_cache | boolean | Optimize the response time using the cache. |

#### Request Body

| Name | Type   | Description           |
| ---- | ------ | --------------------- |
| Body | string | OTQL query to execute |

{% tabs %}
{% tab title="200 " %}

```javascript
{
    "status": "ok",
    "data": {
        "took": 112015,
        "timed_out": false,
        "offset": null,
        "limit": null,
        "result_type": "COUNT",
        "precision": "FULL_PRECISION",
        "sampling_ratio": null,
        "rows": [
            {
                "count": 80975924
            }
        ],
        "cache_hit": true
    }
}
```

{% endtab %}

{% tab title="503 Retry later." %}

```javascript
{
    "status": "error",
    "error": "Service Unavailable",
    "error_code": "SERVICE_UNAVAILABLE",
    "error_id": "482416e1-2d93-484a-948b-615b639b5e4f"
}
```

{% endtab %}
{% endtabs %}

### Query cache

When setting the `use_cache` query parameter to `TRUE`, the system returns the query from the cache if available.&#x20;

To know if the returned value is from the cache or a new query execution, look at the `cache_hit` property from the response. Its value is `TRUE` if the response comes from the cache and `FALSE` otherwise. &#x20;

{% hint style="info" %}
The cache expiration delay is **12 hour**.
{% endhint %}

When not setting the `use_cache` query parameter or setting its value to `FALSE`, the cache system is skipped. The query will be executed and its value won't be stored. You can't use this to force a cache update .

Running the query `SELECT ... FROM ... WHERE ts >= "now-1h"` (with a *Date Math* format from [Date operators](#date-operators)) will return the same result now, in five minutes and during the next hour if using the cache.

## Queries optimization

Our engine tries to automatically optimize queries before running them. For example, a query with multiple `OR` operators can use the `IN` operator instead if it is better.&#x20;
