OTQL queries
This page provides a formal description of OTQL capabilities.
Introduction
The Object Tree Query Language (OTQL) has been designed to help you search and calculate aggregates on large collections of object trees. The object tree is defined in the schema using the @TreeIndexRoot and @TreeIndex directives.
OTQL queries help you:
Build segments
Explore data
Monitor data integration
Check the volumetry
Build data exports
...
An OTQL query looks like an SQL query.
It is composed of three parts:
A SELECT operation: it indicates what needs to be done, extracting field values or calculating aggregates
A FROM starting object type: it defines the starting object type in the evaluation process
A WHERE object tree expression: it defines a logical expression mixing boolean operators and field operators to connect different objects in the object tree
There are two kinds of operations:
Selection Operations are similar to a GraphQL operation and return a list of objects containing the required fields.
Aggregation Operations return aggregated values (count, stats, histograms, ...) calculated on the selected objects.
The fields available in queries depend entirely on how you built your schema.
Here are some examples of requests you can do with OTQL:
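As a first taste, here are two illustrative sketches: the first counts all user points, the second selects the email addresses attached to them. The link and field names (emails, email) are hypothetical and depend on your schema.

```
SELECT @count{} FROM UserPoint

SELECT { emails { email } } FROM UserPoint
```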
FROM - Starting object type
Imagine the following Object Tree:
UserPoint
---- UserActivity
-------- UserEvent
---- UserEmail
---- UserAccount
You could build queries starting from all UserPoint, all UserActivity, UserEvent, UserEmail, or UserAccount objects.
You target object types with the FROM clause. In the example, the link field is activities, but you don't write SELECT (...) FROM activities.
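To illustrate, both sketches below name an object type in the FROM clause, never a link field (the name field is hypothetical):

```
SELECT { name } FROM UserEvent

SELECT @count{} FROM UserActivity
```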
WHERE - Object tree expressions
The expression contained in the WHERE clause is composed of a list of predicates, separated by the logical operators AND, OR, and NOT. Parentheses can be used to group two predicates with a logical operator.
Examples:
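A hedged sketch of predicates combined with logical operators and parentheses (the nature and channel_id fields are hypothetical; adapt them to your schema):

```
SELECT @count{}
FROM UserActivity
WHERE nature = "SITE_VISIT" AND (channel_id = "1234" OR channel_id = "5678")
```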
Logical operators
Each predicate doesn't directly return a boolean but a score: 1 if the condition is met, else 0. In the end, the score is compared to 0, and the expression returns true if the score is higher than 0, else false. These operators keep the same precedence as their boolean counterparts. The logical operators work as below:
| Logical operator | Real operation | Priority |
| --- | --- | --- |
| NOT PredicateA | if (ScoreA > 0) then 0 else 1 | high |
| PredicateA AND PredicateB | ScoreA x ScoreB | middle |
| PredicateA OR PredicateB | max(ScoreA, ScoreB) | low |
Tree exploration
As we are querying an Object Tree, and as predicates are only possible on leaves (i.e. fields that contain a scalar value), it is natural to have a way of going from the root to each of the leaves by traversing the tree.
Brace symbols {} are used to traverse the tree through link fields. The sub-query written in the braces is evaluated on each item of the linked list.
Let's see it in action.
Let's say we built a schema corresponding to the following Object Tree.
The following query will return all the UserPoint objects that have at least one UserEvent whose name is $transaction_confirmed.
As the root of the Object Tree is the UserPoint in this example, we need to start from there, then follow the activities link to the associated UserActivity objects, and then the events link to the associated UserEvent objects.
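Putting it together, the traversal just described could look like the following sketch (link and field names follow the example tree; check them against your own schema):

```
SELECT @count{}
FROM UserPoint
WHERE activities { events { name = "$transaction_confirmed" } }
```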
Scoring operator
The previous query returns user points for which at least one event has the name $transaction_confirmed: every user that made at least one purchase during at least one visit.
If we instead want the users that bought things through at least 3 different visits (frequent buyers), we use a scoring operator.
Each time there is a pair of braces {} with a sub-query written inside, a score is implicitly calculated for the sub-query. By default, the score is the number of items matching the sub-query.
Only returns the score if the nested sub-query has a score greater than or equal to min.

| args | description |
| --- | --- |
| min: | minimum required score for the nested sub-query to be returned |
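A hedged sketch of the min argument for the frequent-buyers case: keep only user points whose activities sub-query scores at least 3, i.e. at least 3 matching visits. The exact placement of the directive is illustrative and may differ in your datamart.

```
SELECT @count{}
FROM UserPoint
WHERE activities @ScoreSum(min: 3) { events { name = "$transaction_confirmed" } }
```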
Changing the way scores are calculated
As said above, by default, score values are equal to the number of items matching a sub-query when following a link.
However, your Object Tree leaves may have number-typed fields (Int or Float). It is possible to use those values as the score of a sub-query.
Selects a specific field whose stored numeric value is used as the score.

| args | description |
| --- | --- |
| name: | The name of the field selected |
The information of which field is selected bubbles up until it is caught by a @ScoreSum, @ScoreAvg, or @ScoreMax directive.
Make sure the sub-field selected in @ScoreField exists in every targeted item. Add a condition if the query returns an error.
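A hedged sketch of @ScoreField: use a hypothetical numeric field basket_price as the score instead of the item count, so the min threshold now applies to a total amount rather than a number of visits. The directive placement is an assumption to verify against your datamart.

```
SELECT @count{}
FROM UserPoint
WHERE activities @ScoreSum(min: 100) {
  events @ScoreField(name: "basket_price") { name = "$transaction_confirmed" }
}
```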
Using this calculated score
Now that a field value can be used as a score, you may want to return the calculated score of a @ScoreSum, and not only whether the sub-query validates the condition or not. This is why a new parameter was added to @ScoreSum: result.
Only returns the score if the nested sub-query has a score greater than or equal to min.

| args | description |
| --- | --- |
| min: | minimum required score for the nested sub-query to be returned |
| result: | two possible values |
Going further
It is possible to use these two approaches at the same time, but be careful: it is currently not possible to bubble up a @ScoreField after a @ScoreSum(result: "boolean_score").
Example of a possible use case:
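For instance, a hedged sketch combining a boolean condition on a count of visits with a score taken from a hypothetical basket_price field (syntax and directive placement are assumptions):

```
SELECT @count{}
FROM UserPoint
WHERE activities @ScoreSum(min: 3, result: "boolean_score") { events { name = "$transaction_confirmed" } }
  AND activities @ScoreSum(min: 100) { events @ScoreField(name: "basket_price") { name = "$transaction_confirmed" } }
```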
However, the following use case can't be written:
Using conditional and scoring approaches in the same query is useful for many use cases that won't be detailed here. A specific page regroups examples of them if you want to go further.
Date operators
The following operators are available to work with dates:

| Date operator | description |
| --- | --- |
| >= | Greater or equal |
| > | Greater |
| <= | Lower or equal |
| < | Lower |
| = or == | Equal |
| != | Not equal |
Dates can be formatted either:
in ISO8601 format (the time part is optional): 2012-09-27, 2012-09-27T12:42:00
as a timestamp in milliseconds: 1549365498507
in a Date Math format, defining a relative date
Date Math format
The idea of the date math syntax is to define a relative date compared to an anchor date. The anchor date is either now or a date (in ISO8601 or timestamp format) followed by ||.
The expression begins with the anchor date and is followed by one or more math expressions:
+1h adds one hour
-1d subtracts one day
/d rounds down to the nearest day

Example, assuming now is 2001-01-01 12:00:00:
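A few worked values against that anchor:

```
now+1h    = 2001-01-01 13:00:00
now-1d    = 2000-12-31 12:00:00
now/d     = 2001-01-01 00:00:00
now-1d/d  = 2000-12-31 00:00:00
```

Note that rounding (/d) applies after the additions and subtractions that precede it.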
The supported units are the following:

| Date operator | description |
| --- | --- |
| y | Years |
| M | Months |
| w | Weeks |
| d | Days |
| h or H | Hours |
| m | Minutes |
| s | Seconds |
String operators
Only the indexed fields of type String are eligible. Depending on the data_type specified in the schema, the String operators behave differently.
With data_type: text: all operators are case-insensitive. Values are stored as a set of transformed words.
| String operator | description |
| --- | --- |
| match(fieldName, comparisonValue) | Returns true if a word of the text contained in fieldName matches a word contained in comparisonValue. |
| starts_with(fieldName, comparisonValue) | Returns true if a word of the text contained in fieldName starts with one of the words contained in comparisonValue. |

The same transformation that is applied to the text data before storage is also applied to the comparisonValue.
With data_type: keyword: all operators are case-sensitive. Values are considered as a single word. No transformation is made.

| String operator | description |
| --- | --- |
| starts_with(fieldName, comparisonValue) | Returns true if the exact value contained in fieldName starts with the exact value passed in comparisonValue. |
| = or == | Returns true if the exact value contained in fieldName is equal to the exact value passed in comparisonValue. |
In operator
You can use the IN operator as a shortcut to filter on multiple values of the same field.
This operator offers better performance than multiple OR operators.
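A hedged sketch of the operator (the bracketed value-list syntax is an assumption; adapt it to the form your datamart accepts):

```
SELECT @count{}
FROM UserEvent
WHERE name IN ["$transaction_confirmed", "$basket_view"]
```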
Is_defined operator
This operator evaluates a field and checks whether it is defined. The predicate can be applied to any indexed field in the schema and returns a boolean.
| Field Value | Return |
| --- | --- |
| myField = "example" | True |
| myField = ["example"] | True |
| myField = [null] | True |
| myField = "" | True |
| myField = [ ] | True |
| myField = null | False |
| myField = undefined | False |
| (NoField) | False |
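A hedged sketch of the predicate (the exact call syntax is an assumption; my_field is hypothetical):

```
SELECT @count{}
FROM UserPoint
WHERE is_defined(my_field)
```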
JOIN operations
You may want to add another list of predicates FROM various objects. To do so, use the JOIN clause to mention another object type right after the FROM/WHERE clauses. It's possible to apply multiple JOINs in the same query.
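A hedged sketch of the shape of a JOIN query (the exact syntax is an assumption to verify against your datamart; nature is a hypothetical field):

```
SELECT @count{}
FROM UserPoint
JOIN UserActivity
WHERE nature = "SITE_VISIT"
```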
LIMIT operations
In the Query Tools, 10 elements are returned by default, but you can easily override this by using the LIMIT clause followed by the number of elements required:
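For example (id as a hypothetical selected field):

```
SELECT { id } FROM UserPoint LIMIT 100
```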
It's not possible to return more than 10,000 elements with LIMIT. The query will fail if you try.
Note that the LIMIT clause is ignored when using the @count or @cardinality directives.
The LIMIT operator isn't applied during segment calculation. If you create a User Query Segment with a LIMIT, it will be ignored and all user points matching the WHERE clause will be returned.
If you're using an aggregation directive such as @map and you want to extend the number of buckets returned, you need to use the limit parameter of the aggregation directive (for instance: @map(limit:200)) and not the LIMIT clause. More details are available hereunder.
SELECT operations
SELECT operations simply select fields. Every field present in the schema can be selected.
Filters
The WHERE expression lets you create a filter to select a sublist of the objects in the FROM clause. You can also choose to filter the data selected on those objects using the @filter directive.
The following query retrieves userpoints, activities, events, and some of their fields, for each user point that has an event named "display".
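Such a query could look like the following sketch (all field names are illustrative and depend on your schema):

```
SELECT { id activities { ts events { name score } } }
FROM UserPoint
WHERE activities { events { name = "display" } }
```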
Let's assume it gives the following result :
The reader might be surprised to find events named "click" in this result. However, remember that the WHERE clause only filters the roots (i.e. the UserPoints). To retrieve only the "display" events, one needs to add an @filter clause:
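A hedged sketch of such a query (the clause string and its escaping are assumptions; field names are illustrative):

```
SELECT { id activities { ts events @filter("name == \"display\"") { name score } } }
FROM UserPoint
WHERE activities { events { name = "display" } }
```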
Assuming the same data, this query would produce the following result :
By default, the @filter predicate also filters out empty results in its scope. To illustrate this, we reduce our query to retrieve only the score fields:
Still the same data, the result is the following:
You'll notice that two display events have disappeared: since they don't have a score, they would be empty objects. This "filter empty" behavior, which is true by default, can be set using a second optional parameter of the @filter directive. If we set it to false, we have the following result:
One might think that even the previous result has a lot of noise for only one score. You can add another @filter to lighten the result. And since the clause argument also has a default value of "true", you can write the following query
and get the following result:
The empty filtering works as follows:
if an optional array is empty, it is filtered
if an object is empty and is either optional or in an array, it is filtered
if an array is mandatory, it can be filtered if the following conditions are met:
it is empty
the other mandatory selections of the parent object are only empty arrays
this parent object can be filtered (see above)
The @filter directive cannot be used at the same time as aggregation operations such as @map, @date_histogram, etc. (see hereunder).
Aggregation Operations
The aggregation operations are initiated by a directive in the SELECT clause. They take into account the filter defined in the WHERE clause; however, they are not compatible with the @filter directive that you can use in the SELECT clause.
@count
This directive is used to count the number of objects matching the query.
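For example, counting all activities (the empty braces after @count are an assumption to check in your Query Tool):

```
SELECT @count{} FROM UserActivity
```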
Metrics directives
Fields used with metrics directives should have the @TreeIndex directive in your schema.
Those directives calculate a value per bucket created in the bucket directive, or with only one bucket containing all elements if you don't use bucket directives.
@avg: average value of a specific field (only applies to numeric values)
@min: minimum value of a specific field (only applies to numeric values)
@max: maximum value of a specific field (only applies to numeric values)
@sum: sum of the values of a specific field (only applies to numeric values)
@cardinality: count of distinct values
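A hedged sketch following the @map pattern shown below, using @avg on a hypothetical numeric field:

```
SELECT { basket_price @avg } FROM UserEvent WHERE name = "$transaction_confirmed"
```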
Bucket directives
Fields used with bucket directives should have the @TreeIndex directive in your schema.
Those directives separate values into buckets.
@map: one bucket per field value. @map does not count null values.
You can use the limit argument to extend the number of elements that can be returned by the directive (default: 50, maximum: 50 000):
SELECT { name @map(limit:200) } FROM UserEvent
@histogram: aggregated count on a specific field. The interval can be adjusted to the business needs.
@date_histogram: aggregated count by period for objects associated with a date. Allowed intervals are 1M for a month and XXD for a number XX of days.
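Hedged sketches following the @map pattern above (the argument name for the @date_histogram interval is an assumption; ts is a hypothetical date field):

```
SELECT { name @map } FROM UserEvent

SELECT { ts @date_histogram(interval: "7D") } FROM UserActivity
```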
Please note that you cannot use bucket directives with metrics.
Aliases
It is possible to add an alias to the field expression. This alias is then used in the output to identify the field result.
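Assuming a GraphQL-style alias syntax (an assumption to verify, since OTQL selections resemble GraphQL selections), a sketch:

```
SELECT { eventName: name @map } FROM UserEvent
```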
Managing queries
You usually enter OTQL queries directly in tools like the navigator. However, they can also be saved and managed by code as objects. Some features will require you to link an object to an OTQL query, instead of just saving the query as text.
Features asking you to save queries and reference them usually leverage this for performance optimization.
Creating a query
POST
https://api.mediarithmics.com/v1/datamarts/:datamartId/queries
Create an OTQL query in the platform before creating an export based on this query
Path Parameters
Name | Type | Description |
---|---|---|
datamartId | integer | The ID of the datamart |
Request Body
Name | Type | Description |
---|---|---|
Body | object | Payload |
Check a query
POST
https://api.mediarithmics.com/v1/datamarts/:datamartId/query_check/otql
Check the validity of an OTQL query before using it
Path Parameters
Name | Type | Description |
---|---|---|
datamartId | integer | The ID of the datamart |
Request Body
Name | Type | Description |
---|---|---|
Body | object | Payload |
Executing queries
You can execute queries in the different tools that mediarithmics offers, or using our API.
Execute a query
POST
https://api.mediarithmics.com/v1/datamarts/:datamart_id/query_executions/otql?use_cache=true
Executes an OTQL query on the specified datamart
Path Parameters
Name | Type | Description |
---|---|---|
datamart_id | integer | The ID of the datamart |
Query Parameters
Name | Type | Description |
---|---|---|
use_cache | boolean | Optimize the response time using the cache. |
Request Body
Name | Type | Description |
---|---|---|
Body | string | OTQL query to execute |
Query cache
When setting the use_cache query parameter to TRUE, the system returns the query result from the cache if available.
To know whether the returned value comes from the cache or from a new query execution, look at the cache_hit property of the response. Its value is TRUE if the response comes from the cache and FALSE otherwise.
The cache expiration delay is 12 hours.
When the use_cache query parameter is not set, or is set to FALSE, the cache system is skipped: the query is executed and its result isn't stored. You can't use this to force a cache update.
Running the query SELECT ... FROM ... WHERE ts >= "now-1h" (with a Date Math format from Date operators) will return the same result now, in five minutes, and during the next hour if the cache is used.
Queries optimization
Our engine tries to automatically optimize queries before running them. For example, a query with multiple OR operators may be rewritten to use the IN operator instead when that performs better.