The Object Tree Query Language (OTQL) has been designed to help to search and calculate aggregates on a large collection of object trees. The object tree is defined in the schema using the @TreeIndexRoot and the @TreeIndex directives.
OTQL queries help you :
Build segments
Explore data
Monitor data integration
Check the volumetry
Build data exports
...
An OTQL query looks like an SQL query.
SELECT { id name } FROM Product WHERE price >50.0
It is composed of three parts:
A SELECTOperation: It gives indications on what needs to be done: extracting field values or calculating aggregates
A FROMstarting Object Type: It defines the starting object type in the evaluation process
A WHEREObject Tree Expression: It defines a logical expression mixing boolean operators and field operators to connect different objects in the object tree.
SELECT [Operation] FROM [Object Type] WHERE [Object Tree Expression]
There are two kinds of operations:
Selection Operations are similar to a GraphQL operation and return a list of objects containing the required fields.
Aggregation Operations return aggregated values (count, stats, histograms, ...) calculated on the selected objects.
Here are some examples of requests you can do with OTQL :
# Counts number of new users in the past 7daysSELECT @count{} FROM UserPoint WHERE creation_date >="now-7d/d"# Counts the number of transactions on a specific site (channel) 7days agoSELECT @count{} FROM UserEvent WHEREname="$transaction_confirmed"ANDdate="now-7d/d"AND channel_id =2419# Counts the number of profiles with female genderSELECT @count{} FROM UserProfile WHERE gender ="W"# Lists all categories from universes in events done on a specific channelSELECT { universe { category @map }} FROM UserEvent WHERE channel_id =2417# Lists all event names collected in the platformSELECT {name @map} FROM UserEvent# Number of users havingat least 3 events related to laptops in the past 15daysSELECT @count{} FROM UserPoint WHERE activities { events @ScoreSum(min:3) { category ="Laptop"ANDdate>="now-15d/d" }}# Number of transactions per siteand per daySELECT { channel_id @map { date @date_histogram } } FROM UserEvent WHEREname="$transaction_confirmed"# Number of users having an account but no emailsSELECT @count{} FROM UserPoint where accounts{} andnot emails{}
You could build queries starting from all UserPoint, all UserActivity, UserEvent, UserEmail or UserAccount
# Selects all names from all UserPointSELECT {name} FROM UserPoint# Selects all names from all UserActivitySELECT {name} FROM UserActivity# Equivalent ofSELECT {activities { name }} FROM UserPoint# Selects all names from all UserEventSELECT {name} FROM UserEvent# Equivalent ofSELECT { activities { events { name }}} FROM UserPoint
You target object types with the FROM. In the example, the link field is activities, but we don't do SELECT (...) FROM activities.
WHERE - Object tree expressions
The expression contained in the WHERE clause is composed of a list of predicates, separated by logical operators AND, OR, and NOT . Parenthesis can be used to group together two predicates with a logical operator.
Examples :
SELECT (...) FROM (...) WHERE (PredicateA AND PredicateB) OR PredicateCSELECT (...) FROM (...) WHERE PredicateA AND (PredicateB OR PredicateC)
SELECT (...) FROM (...) WHERE price >50.0
SELECT (...) FROM (...) WHERE price >50.0AND last_modified_date >"now-10d"
Logical operators
Each predicate doesn't return directly a boolean but a score, 1 if the condition is respected else 0. At the end the score is compared to 0 and return true if it's higher than 0 else return false. These operator keep the same priority as boolean ones. The logical operators work as below :
Logical operator
Real operation
Priority
NOT PredicateA
if ( ScoreA > 0) then 0 else 1
high
PredicateA AND PredicateB
ScoreA x ScoreB
middle
PredicateA OR PredicateB
max(ScoreA, ScoreB)
low
SELECT (...) FROM (...) WHERE price >50.0AND last_modified_date >"now-10d"# ( price >50.0 ) x ( last_modified_date >"now-10d" )SELECT (...) FROM (...) WHERE price >50.0AND last_modified_date >"now-10d"OR price >100.0AND last_modified_date >"now-20d"# (( price >50.0 ) x ( last_modified_date >"now-10d" )) + (( price >100.0 ) x ( last_modified_date >"now-20d" ))
Tree exploration
As we are querying an Object Tree, and as predicates are only possible on a leaf (e.g. fields that only contain a scalar value), it is natural to have a way of going from the root to each of the leaves by traversing the tree.
Braces symbols {} are used to traverse the tree through link fields. The sub-query written in the braces will be evaluated on each item in the linked list.
Let's see it in action.
Let's say we built a schema corresponding to the following Object Tree.
# Functional treeUserPoint└─ UserActivities └─ UserEvents# Associated schematypeUserPoint@TreeIndexRoot(index:"USER_INDEX") { # activities is a link field to UserActivity objects activities: [UserActivity]}typeUserActivity { # events is a link field to UserEvent objects events: [UserEvent]}typeUserEvent { name: String@TreeIndex(index_name: "USER_INDEX") amount: Int@TreeIndex(index_name: "USER_INDEX") date: Timestamp!}# A user point can have 0..n User Activity# Each UserActivity can have 0..m UserEvent# UserEvent has a "name" String field, an "amount" Int field and a "date" Timestamp field
The following query will return all the UserPoint that have at least one UserEvent whose name is $transaction_confirmed .
As the root of the Object Tree is the UserPoint in this example, we'll need to start from there. And then follow the activities link to the associated UserActivity and then the events link to the associated UserEvent.
SELECT (...) FROM UserPoint WHERE activities { events { name="$transaction_confirmed" } }
Scoring operator
The latest query returns items with at least one of the events have a $transaction_confirmed name. We return every user that did at least one purchase and at least one visit.
If we instead want the users that bought things through at least 3 different visits (frequent buyers), we will use a scoring operator.
Each time there is a pair of braces { } and a sub-query written in the braces, there is implicitly a score calculated for the sub-query. By default, the score will be the number of items matching the sub-query.
Only returns score if the nested sub-query has a score superior or equal to min.
args
description
min : Float
minimum required score for the nested sub-query to be returned
# Select user points that bought things through at least 3 different visitsSELECT (...) FROM UserPointWHERE activities @ScoreSum(min: 3.0){ events { name="$transaction_confirmed" } }# Select user points that have at least 1 activity that contains at least 3 $transaction_confirmed eventsSELECT (...) FROM UserPointWHERE activities { events @ScoreSum(min: 3.0) { name="$transaction_confirmed" } }
Calculate the average from the sub-query matching scores and returns true if it's superior or equal to min.
args
description
min : Float
minimum required score for the nested sub-query to be returned
By default, the score will be the number of items matching the sub-query. So using @ScoreMax or @ScoreAvg like that is useless because each score from the sub-query will be 1. It's why you should apply a modification on the score calculation.
Takes the maximum score from the sub-query matching scores and returns true if it's superior or equal to min.
args
description
min : Float
minimum required score for the nested sub-query to be returned
By default, the score will be the number of items matching the sub-query. So using @ScoreMax or @ScoreAvg like that is useless because each score from the sub-query will be 1. It's why you should apply a modification on the score calculation.
Changing the way scores are calculated
As said above, by default, score values are equal to the number of items matching a sub-query when following a link.
However, your Object Tree leaves have some number typed fields (Int or Float). It is possible to use those values as the score of a sub-query.
Select a specific field in which the numeric value used as the score is stored.
args
description
name: String
The name of the field selected
# Using @ScoreField alone is useless because it could be replaceby logical operatorSELECT (...) FROM UserPointWHERE activity { events @ScoreField(name:"amount") {name="$transaction_confirmed" }}# Can be writtenSELECT (...) FROM UserPointWHERE activity { events {name="$transaction_confirmed" }}
The information of which field is selected bubble up still it didn't catch by a @ScoreSum, @ScoreAvg or @ScoreMax .
# Select user points having spent at1000in one eventSELECT (...) FROM UserPointWHERE activity { events @ScoreField(name:"amount") @ScoreSum(min : 1000) { name="$transaction_confirmed" } }# Select user points having spent at1000in one activitySELECT (...) FROM UserPointWHERE activity @ScoreSum(min : 1000) { events @ScoreField(name:"amount") { name="$transaction_confirmed" } }
Be sure sub-field selected in @ScoreField exist in any field. Add a condition if the query return an error
SELECT (...) FROM UserPointWHERE activity @ScoreSum(min : 1000) { events @ScoreField(name:"amount") { name="$transaction_confirmed" }}# To be sure field "amount" exist SELECT (...) FROM UserPointWHERE activity @ScoreSum(min : 1000) { events @ScoreField(name:"amount") { is_defined(amount) ANDname="$transaction_confirmed" }}
Multiply the score by the factor. Can be used to boost a sub-query over another one.
args
description
factor: Float
Constant float which multiply the score
# Selet user pointhaving spent at least 1000 orders, where IT product count twice SELECT (...) FROM UserPointWHERE activity_events @ScoreSum(min : 1000) { order { order_products @ScoreBoost(factor: 2.0) @ScoreSum(result: "score_value") { category="IT" }, order_products @ScoreSum() { category!="IT" } }}
Using this calculated score
With the possibility to use score in a field, you may want to return the calculated score of a @ScoreSum and not only it a sub-query validate the condition or not. This is why we add a new parameters to @ScoreSum :result.
Only returns score if the nested sub-query has a score superior or equal to min.
args
description
min : Float
minimum required score for the nested sub-query to be returned
result : String
Two values possibles :
"boolean_value" (by default) : reduce the returned score value to 0 or 1. It has the same comportment as the previous one explain in conditional predicate.
"score_value" : return the real score value.
# Select user points having spent more than 1000€ through $transaction_confirmed events during the past yearSELECT (...) FROM UserPoint WHERE activities { events @ScoreField(name: "amount") @ScoreSum(min: 1000) { name="$transaction_confirmed"ANDdate>="now-1y/y" } }# Select user points having spent more than 1000€ incross orders withat least products which cost 10€: SELECT (...) FROM UserPointWHERE activity_events @ScoreSum(min : 1000) { order { order_products @ScoreField(name: "amount") @ScoreSum(min: 10, result:"score_value") { category="IT" } }}
Calculate the average from the sub-query matching scores and returns it if it's superior or equal to min args
args
description
min : Float
minimum required score for the nested sub-query to be returned
result : String
Two values possibles :
"boolean_value" (by default) : reduce the returned score value to 0 or 1.
"score_value" : return the real score value.
# Select user points having spent on average at least 1000 through $transaction_confirmed events during the past yearSELECT (...) FROM UserPoint WHERE activities { events @ScoreField(name: "amount") @ScoreAvg(min: 1000) { name="$transaction_confirmed"ANDdate>="now-1y/y" } }# Select user points having spent in average more than 1000€ by orders withat least products which cost 10€: SELECT (...) FROM UserPointWHERE activity_events @ScoreAvg(min : 1000) { order { order_products @ScoreField(name: "amount") @ScoreSum(min: 10, result:"score_value") { category="IT" } }}
Takes the maximum score from the sub-query matching scores and returns it if it's superior or equal to min
args
description
min : Float
minimum required score for the nested sub-query to be returned
result : String
Two values possibles :
"boolean_value" (by default) : reduce the returned score value to 0 or 1.
"score_value" : return the real score value.
# Select user points with at least 1 activity that contains at least 1 $transaction_confirmed event with amount >= 1000 during the past year
SELECT (...) FROM UserPoint WHERE activities { events @ScoreField(name: "amount") @ScoreMax(min: 1000) { name="$transaction_confirmed"ANDdate>="now-1y/y" } }
Go forward
It is possible to use these two ways at the same time but be careful, it is currently not possible to grow up a @ScoreField after a @ScoreSum(result: "boolean_score").
Example of possible use case :
# Select user points having spent in average more than 1000in one activity of events with a amount superior than 10SELECT (...) FROM UserPointWHERE activity @ScoreAvg(min : 1000) { events @ScoreField(name:"amount") @ScoreSum(min 10, result:"score_value") { name="$transaction_confirmed" } }# Select user points having spent in average in one activity, more than 1000 events with a amount superior than 10SELECT (...) FROM UserPointWHERE activity @ScoreAvg(min : 1000) { events @ScoreField(name:"amount") @ScoreSum(min: 10, result:"boolean_value") { name="$transaction_confirmed" } }
However the following use case can't be written :
# We wanted : # Count user points having spent more than 1000€ incross orders withat least 10 orders more than 100€ : SELECT (...) FROM UserPointWHERE activity_events @ScoreSum(min : 1000) { order @ScoreSum(min: 10, result: "score_value") { @ScoreSum(min: 100, result: "boolean_value") { order_products @ScoreField(name: "price") @ScoreSum(result:"score_value") { category="IT" } } }}# But the query return :# Count user points having spent more than 1000 orders withat least 10 orders more than 100€
Using conditional and scoring ways in the same query is useful for many use case whose won't be detailed here. A specific page has been created for regroup examples of them if you want to go forward.
Date operators
The following operators are available to work with dates :
>= Greater or equal
> Greater
<= Lower or equal
< Lower
= or == Equal
!= Not equal
Dates can be formatted either
in ISO8601 format (time part is optional) 2012-09-27, 2012-09-27T12:42:00
in a timestamp in milliseconds 1549365498507
in a Date Math format, defining a relative date
SELECT (...) FROM UserPoint WHERE activities {creation_ts <="2012-09-27"}SELECT (...) FROM UserPoint WHERE activities {creation_ts >"1549365498507"}SELECT (...) FROM UserPoint WHERE activities {creation_ts >"now-7d"}
Date Math format
The idea of the date match syntax is to define a relative date compared to an anchor date. The anchor date is either now or a date (ISO8601 or timestamp format) followed by ||.
The expression begins with the anchor date and is followed by one or more math expressions :
+1h adds one hour
-1d substracts one day
/d rounds down to the nearest day
Example, assuming now is 2001-01-01 12:00:00 :
now+1h // Resolves to: 2001-01-01 13:00:00now-1h // Resolves to: 2001-01-01 11:00:00now-1h/d // Resolves to: 2001-01-01 00:00:002001.02.01||+1M/d // Resolves to: 2001-03-01 00:00:00
Returns true if a word of the text contained in fieldName matches a word contained in comparisonValue.
starts_with(fieldName, comparisonValue)
Returns true if a word of the text contained in fieldName starts with one of the words contained in comparisonValue.
The same transformation is done on the text data before storage is also done on the comparisonValue.
# Doing(...) WHEREmatch(url_as_text, "Hello World!")# Will search in the textvaluesfor words matching 'hello'or'world'https://www.hello.com/https://www.world.com/https://www.hello.com/world/(...)
With data_type: keyword :
All operators are case-sensitive. Values are considered as a single word. No transformation is made.
String operator
description
starts_with(fieldName, comparisonValue)
Returns true if the exact value contained in fieldName starts with the exact value passed in comparisonValue.
= or ==
Returns true if the exact value contained in fieldName is equal with the exact value passed in comparisonValue.
(...) WHERE starts_with(mykeyword, "Hello World!")(...) WHERE mykeyword =="Hello World!"
In operator
You can use the IN operator as a shortcut to filter on multiple values of the same field.
# Total sold in events for channel IDs 2456, 5489, 1426SELECT {events {basket { amount @sum}}} FROM UserActivity WHERE channel_id IN ["2456","5489","1426"]# Equivalent of WHERE channel_id ="2456"OR channel_id ="5489"OR channel_id ="1426"
This operator offers better performances than multiple OR operators.
Is_defined operator
This is used to evaluate the value of a field and check if it is defined or not. The predicate can be applied in any indexed field in the schema and return a boolean.
# Return all user points with a profileSELECT { id } FROM UserPoint WHERE is_defined(profiles) # Return all user points with an email in their profileSELECT { id } FROM UserPoint WHERE profiles{is_defined(email)}
Field Value
Return
myField = “example”
True
myField = [“example”]
True
myField = [null]
True
myField = ""
True
myField = [ ]
True
myField = null
False
myField = undefined
False
(NoField)
False
****
JOIN operations
You may want to add another list of predicates FROM various objects. To do so, use JOIN clause to mention another object right after the FROM/WHERE clauses. It's possible to apply multiple JOIN in the same query.
# Get the activity “$transaction_confirmed” of user point of the segment id “1234“SELECT { id }FROM ActivityEvent WHEREname=="$transaction_confirmed"JOIN Userpoint WHERE segments { id="1234" }
LIMIT operations
# Get only 5or fewer activities named “$transaction_confirmed”SELECT { id }FROM ActivityEvent WHEREname=="$transaction_confirmed"LIMIT5
In the Query Tools, we return 10 elements by default but you can easily override this by using the LIMIT clause followed by the number of elements required:
SELECT { id }FROM ActivityEvent WHEREname=="$transaction_confirmed"LIMIT100# This query returns100 (or fewer) activities named “$transaction_confirmed”
It's not possible to return more than 10 000 elements with LIMIT. The query will fail if you try it.
Note that the LIMIT clause will be ignored when using @count or @cardinality directives.
SELECT @count{ }FROM ActivityEvent WHEREname=="$transaction_confirmed"LIMIT5# Return the count of all activities named "$transaction_confirmed"# Example: Return-21,866,076
LIMIT operator isn't applied during the segment calculation.
If you create a User Query Segment with a LIMIT, it will be ignored and return all user points who respect the WHERE clause.
If you're using an aggregation directive such as @map and you want to extend the number of buckets returned, you need to use the limit parameter of the aggregation directive (for instance: @map(limit:200)) and not the LIMIT clause. More details available hereunder.
SELECT operations
They are simply selecting fields. Every field present in the schema can be selected.
# Select id andnamein the rootlevelSELECT { id name } FROM UserPoint# Selectnamein UserPoint and creation_ts and id in emails linked to the user pointSELECT { name emails { creation_ts id } } FROM UserPoint
Filters
As we saw earler in this documentation, the WHERE expression gives you the ability to filter a sublist of objects at FROM level. You also have the capability to filter in or out the data returned by the query using the @filter directive in SELECT :
The user might be surprised to find "click" events in this result. However remember that the where clause only filter the roots (i.e the UserPoints). To retrieve only "display" events, the user will need to add an @filter clause as follow:
select { id { activities { id events @filter(clause: "name == \"display\"") { name score } } } }
from UserPoint
where { activities { events { name == "display" } } }
Assuming the same data, this query would produce the following result :
You'll notice that two display events have disappeared. Since they don't have a score, they would be empty object. Actually, this "filter empty" behavior can be set using a second optional parameter to the @filter directive. If we set it to false, we will obtain the following result :
One might think that the previous result still contains a lot of noise (4 events retrieved for only one score). You can add an extra @filter before the object name to lighten the result:
select @filter { { activities {
events @filter(clause: "name == \"display\"") { score } } } }
from UserPoint
where { activities { events { name == "display" } }}
The aggregation operations are initiated by a directive in the SELECT clause. They take into account the filter defined in the WHERE clause, however they are not compatible with the @filter directive that you can use in the SELECT clause.
@count
This directive is used to count the number of objects verifying the query
# Counts the number of user pointsSELECT @count {} FROM UserPoint# Counts number of new users in the past 7daysSELECT @count{} FROM UserPoint WHERE creation_date >="now-7d/d"
Metrics directives
Fields used with metrics directives should have the @TreeIndex directive in your schema.
Those directives calculate a value per bucket created in the bucket directive, or with only one bucket containing all elements if you don't use bucket directives.
@avg: average value for a specific field (only applies to numeric values)
# Average basket amount between two specific datesSELECT {basket {amount @avg}} FROM UserEvent WHERE {date>="2020-12-01"ANDdate<="2020-12-31" }
@min: minimum value for a specific field (only applies to numeric values)
# Minimum basket amount between two specific datesSELECT {basket {amount @min}} FROM UserEvent WHERE { date>="2020-06-20"ANDdate<="2020-06-25” }
@max: maximum value for a specific field (only applies to numeric values)
# Maximum basket amount between two specific dates
SELECT {basket {amount @max}} FROM UserEvent
WHERE { date >= "2020-06-20" AND date <= "2020-06-25”}
@sum: sum of value for a specific field (only applies to numeric values)
# Sum of basket amounts between two specific dates
SELECT {order{amount @sum }} FROM ActivityEvent
WHERE {date >= "2020-06-20" AND date <= "2020-06-25”}
@cardinality: count of distinct values
# Number of channels in a datamart
SELECT {channel_id @cardinality} FROM ActivityEvent
# Number of cookies associated with user points in a specific segment
SELECT {agents{id @cardinality}} FROM UserPoint
WHERE segments {id=”XXXX”}
Bucket directives
Fields used with bucket directives should have the @TreeIndex directive in your schema.
Those directives separate values into buckets
@map one bucket per field value
SELECT { channel_id @map { # map the values of channel id in several buckets
session_duration @avg # The average duration
}
} FROM UserActivity
# Data
# channel ID : 1234, count : 654987987987, session_duration: 100
# channel ID : 1235, count : 987987965465, session_duration: 1500
@map does not count null values
You can use the limit function to extend the number or elements that can be returned by the directive (default: 50, maximum: 50 000) :
SELECT { name @map(limit:200) } FROM UserEvent
@histogram aggregated count on a specific field. The interval can be modified regarding the business needs
SELECT { order { amount @histogram(interval:50)}}
FROM UserEvent WHERE date >= "now-7d"
# Data
# Key: 0, count: 97681
# Key: 50, count: 50324
# Key: 100, count: 33164
# Key: 150, count: 36528
@date_histogram aggregated count by a period of an object associated with a date. Allowed intervals are 1M for a month and XXD for a XX number of days.
# Mere use of @date_histogram directive: selecting all page_view events
# in the last 30 days
SELECT { date @date_histogram(interval:"1d") }
FROM UserEvent
WHERE name = "page_view" and date >= "now-30d/d"
# @date_histogram used together with @map directive with default interval (days)
SELECT { channel_id @map {date @date_histogram }}
FROM UserEvent
WHERE name = "$transaction_confirmed"
# Data
# Key: 2416, count: 27563351
# 2018-01-16T00:00:00.000Z 330
# 2018-01-17T00:00:00.000Z 331
# 2018-01-18T00:00:00.000Z 3332
# ...
# Key: 2417, count: 65498798
# ...
# Force an interval of one month
SELECT { channel_id @map {date @date_histogram(interval: "1M") } }
FROM UserEvent
WHERE name = "$transaction_confirmed"
Please note that you cannot use bucket directives with metrics.
Aliases
It is possible to add an alias to the field expression. This alias is then used in the output to identify the field result.
SELECT {
numberOfChannels: channel_id @cardinality # The approximate number of distinc values
averageDuration: duration @avg # The average duration
mininumDuration: duration @min # map the values of channel id in several buckets
} FROM UserEvent
Managing queries
You usually enter OTQL queries directly in tools like the navigator. However, they can be saved and managed by code as objects. Some features will require you to link an object to an OTQL query, instead of just saving the query as text.
Features asking you to save queries to reference them usually want to leverage this for performances optimisation.
Create an OTQL query in the platform before creating an export based on this query
Path Parameters
Name
Type
Description
datamartId
integer
The ID of the datamart
Request Body
Name
Type
Description
Body
object
Payload
{
"status": "ok",
"data":
{
"id": "50409", // ID of the query to retrieve for the next steps
"datamart_id": "1509",
"query_language": "OTQL",
"minor_version": null,
"major_version": null,
"query_text": "SELECT {id} FROM UserPoint",
"favorite": false
}
}
When setting the use_cache query parameter to TRUE, the system returns the query from the cache if available.
To know if the returned value is from the cache or a new query execution, look at the cache_hit property from the response. Its value is TRUE if the response comes from the cache and FALSE otherwise.
The cache expiration delay is 12 hour.
When not setting the use_cache query parameter or setting its value to FALSE, the cache system is skipped. The query will be executed and its value won't be stored. You can't use this to force a cache update .
Running the querySELECT ... FROM ... WHERE ts >= "now-1h"(with a Date Math format from Date operators) will return the same result now, in five minutes and during the next hour if using the cache.
Queries optimization
Our engine tries to automatically optimize queries before running them. For example, a query with multiple OR operators can use the IN operator instead if it is better.