Commit e2fb6d0

jerrinot, emrberk, and nwoolmer authored
Reference doc for ASOF JOIN TOLERANCE (#195)
Documents questdb/questdb#5713. Besides documenting the `TOLERANCE` keyword, it also changes the SQL HINTS page to reflect that ASOF/LT JOINs now use binary search by default.

---------

Co-authored-by: Emre Berk Kaya <75899391+emrberk@users.noreply.github.com>
Co-authored-by: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com>
1 parent 8db68e8 commit e2fb6d0

6 files changed, +397 -161 lines changed

documentation/concept/sql-optimizer-hints.md

Lines changed: 73 additions & 61 deletions
````diff
@@ -5,58 +5,67 @@ description:
 This document describes available hints and when to use them.
 ---
 
-QuestDB's query optimizer automatically selects execution plans for SQL queries based on heuristics. The default
-execution strategy should be the fastest for most datasets. You may use hints to select a specific execution strategy
-which may (or may not) outperform the default strategy. SQL hints influence the execution strategy of queries without
-changing their semantics.
+QuestDB's query optimizer automatically selects execution plans for SQL queries based on heuristics. While the default
+execution strategy should be the fastest for most scenarios, you can use hints to select a specific strategy that may
+better suit your data's characteristics. SQL hints influence the execution strategy of queries without changing their
+semantics.
 
 ## Hint Syntax
 
 In QuestDB, SQL hints are specified as SQL block comments with a plus sign after the opening comment marker. Hints must
-be placed immediately after the SELECT keyword:
+be placed immediately after the `SELECT` keyword:
 
 ```questdb-sql title="SQL hint syntax"
 SELECT /*+ HINT_NAME(parameter1 parameter2) */ columns FROM table;
 ```
 
-Hints are entirely optional and designed to be a safe optimization mechanism:
+Hints are designed to be a safe optimization mechanism:
 
-- The database will use default optimization strategies when no hints are provided
-- Syntax errors inside a hint block won't fail the entire SQL query
-- The database safely ignores unknown hints
-- Only block comment hints (`/*+ HINT */`) are supported, not line comment hints (`--+ HINT`)
+- The database uses default optimization strategies when no hints are provided.
+- Syntax errors inside a hint block won't fail the entire SQL query.
+- The database safely ignores unknown hints.
+- Only block comment hints (`/*+ HINT */`) are supported, not line comment hints (`--+ HINT`).
 
-## Available Hints
+-----
 
-### USE_ASOF_BINARY_SEARCH
+## Binary Search Optimizations and Hints
 
-The `USE_ASOF_BINARY_SEARCH` hint enables a specialized binary search optimization for
-non-keyed [ASOF joins](/reference/sql/asof-join/) when filtering is applied to the joined table. This hint requires two
-parameters that specify the table aliases participating in the join.
+Since QuestDB 9.0.0, QuestDB's optimizer defaults to using a binary search-based strategy for **`ASOF JOIN`** and
+**`LT JOIN`** (Less Than Join) queries that have a filter on the right-hand side (the joined or lookup table). This
+approach is generally faster as it avoids a full table scan.
 
-```questdb-sql title="Optimizing ASOF join with binary search"
-SELECT /*+ USE_ASOF_BINARY_SEARCH(orders md) */
+However, for some specific data distributions and filter conditions, the previous strategy of performing a parallel full
+table scan can be more performant. For these cases, QuestDB provides hints to *avoid* the default binary search.
+
+### AVOID\_ASOF\_BINARY\_SEARCH and AVOID\_LT\_BINARY\_SEARCH
+
+These hints instruct the optimizer to revert to the pre-9.0 execution strategy for `ASOF JOIN` and `LT JOIN` queries,
+respectively. This older strategy involves performing a full parallel scan on the joined table to apply filters *before*
+executing the join.
+
+- `AVOID_ASOF_BINARY_SEARCH(left_table_alias right_table_alias)`: Use for **`ASOF JOIN`** queries.
+- `AVOID_LT_BINARY_SEARCH(table_alias)`: Use for **`LT JOIN`** queries.
+
+<!-- end list -->
+
+```questdb-sql title="Avoiding binary search for an ASOF join"
+SELECT /*+ AVOID_ASOF_BINARY_SEARCH(orders md) */
 orders.ts, orders.price, md.md_ts, md.bid, md.ask
 FROM orders
 ASOF JOIN (
 SELECT ts as md_ts, bid, ask FROM market_data
-WHERE state = 'VALID' --filter on the joined table
+WHERE state = 'INVALID' -- Highly selective filter
 ) md;
 ```
 
 #### How it works
 
-By default (without this hint), QuestDB processes ASOF joins by:
-
-1. Applying filters to the joined table in parallel
-2. Joining the filtered results to the main table
-
-With the `USE_ASOF_BINARY_SEARCH` hint, QuestDB changes the execution strategy:
+The **default strategy (binary search)** works as follows:
 
-1. For each record in the main table, it uses [binary search](https://en.wikipedia.org/wiki/Binary_search) to locate
-a record with a matching timestamp in the joined table
-2. Starting from this located timestamp match, it then iterates backward through rows in the joined table, in a single
-thread, until finding a row that matches the filter condition
+1. For each record in the main table, it uses a binary search to quickly locate a record with a matching timestamp in
+the joined table.
+2. Starting from this located timestamp, it then iterates backward through rows in the joined table, in a single thread,
+evaluating the filter condition until a match is found.
 
 <Screenshot
 alt="Diagram showing execution of the USE_ASOF_BINARY_SEARCH hint"
````
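
The default strategy described above (a binary search, then a single-threaded backward scan) can be modelled in a few lines of Python. This is a conceptual sketch, not QuestDB's implementation. It assumes the joined table is an in-memory list of `(timestamp, columns)` tuples sorted by timestamp. The helper `asof_binary_search` and the sample `market_data` rows are hypothetical. `bisect_right` finds the latest row at or before each main-table timestamp, and the `while` loop is the backward scan that evaluates the filter.

```python
from bisect import bisect_right
from typing import Callable, List, Optional, Tuple

# Hypothetical row shape: (timestamp, column values). Not a QuestDB type.
Row = Tuple[int, dict]


def asof_binary_search(
    left_ts: List[int],
    right: List[Row],
    predicate: Callable[[dict], bool],
) -> List[Optional[Row]]:
    """Conceptual model of a filtered ASOF JOIN using binary search.

    `right` must be sorted by timestamp. Returns one matched right row
    (or None) per left timestamp.
    """
    right_ts = [ts for ts, _ in right]
    matches: List[Optional[Row]] = []
    for ts in left_ts:
        # Binary search: index of the last right row with timestamp <= ts (-1 if none).
        i = bisect_right(right_ts, ts) - 1
        # Single-threaded backward scan, evaluating the filter at each step.
        while i >= 0 and not predicate(right[i][1]):
            i -= 1
        matches.append(right[i] if i >= 0 else None)
    return matches


# Usage: a tiny joined table and a filter equivalent to `state = 'VALID'`.
market_data = [
    (10, {"state": "VALID", "bid": 1.0}),
    (20, {"state": "INVALID", "bid": 2.0}),
    (30, {"state": "VALID", "bid": 3.0}),
]
orders_ts = [5, 25, 30]
result = asof_binary_search(orders_ts, market_data, lambda r: r["state"] == "VALID")
# result -> [None, market_data[0], market_data[2]]
```

The order at timestamp 25 shows the backward step. The binary search lands on the row at 20, which fails the filter, so the scan continues to the row at 10.
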
````diff
@@ -65,36 +74,39 @@ src="images/docs/concepts/asof-join-binary-search-strategy.svg"
 width={745}
 />
 
-#### When to use
+The **hinted strategy (`AVOID_..._BINARY_SEARCH`)** forces this plan:
 
-This optimization is particularly beneficial when:
+1. Apply the filter to the *entire* joined table in parallel.
+2. Join the filtered (and now much smaller) result set to the main table.
 
-- The joined table is significantly larger than the main table
-- The filter on the joined table has low selectivity (meaning it doesn't eliminate many rows)
-- The joined table data is likely to be "cold" (not cached in memory)
+#### When to use the AVOID hints
 
-When joined table data is cold, the default strategy must read all rows from disk to evaluate the filter. This becomes
-especially expensive on slower I/O systems like EBS (Elastic Block Storage). The binary search approach significantly
-reduces I/O operations by reading only the specific portions of data needed for each join operation.
+You should only need these hints in a specific scenario: when the filter on your joined table is **highly selective**.
 
-However, when a filter is highly selective (eliminates most rows), the binary search strategy may be less efficient.
-This happens because after finding a timestamp match, the strategy must iterate backward in a single thread, evaluating
-the filter condition at each step until it finds a matching row. With highly selective filters, this sequential search
-may need to examine many rows before finding a match.
+A filter is considered highly selective if it eliminates a very large percentage of rows (e.g., more than 95%). In this
+situation, the hinted strategy can be faster because:
 
-As a rule of thumb, the binary search strategy tends to outperform the default strategy when the filter eliminates less
-than 5% of rows from the joined table. However, optimal performance also depends on other factors such as the ratio
-between main and joined table sizes, available hardware resources, disk I/O performance, and data distribution.
+- The parallel pre-filtering step rapidly reduces the joined table to a very small size.
+- The subsequent join operation is then very fast.
 
-In contrast, the default strategy processes and filters the joined table in parallel, which can be much faster for
-highly selective filters despite requiring an initial full table scan.
+Conversely, the default binary search can be slower with highly selective filters because its single-threaded backward
+scan may have to check many rows before finding one that satisfies the filter condition.
 
-#### Execution Plan Observation
-You can verify how QuestDB executes your query by examining its execution plan
-with the [`EXPLAIN` statement](/reference/sql/explain/):
+For most other cases, especially with filters that have low selectivity or when the joined table data is not in
+memory ("cold"), the default binary search is significantly faster as it minimizes I/O operations.
 
-```questdb-sql title="Observing execution plan with USE_ASOF_BINARY_SEARCH"
-EXPLAIN SELECT /*+ USE_ASOF_BINARY_SEARCH(orders md) */
+-----
+
+### Execution Plan Observation
+
+You can verify how QuestDB executes your query by examining its execution plan with the `EXPLAIN` statement.
+
+#### Default Execution Plan (Binary Search)
+
+Without any hints, a filtered `ASOF JOIN` will use the binary search strategy.
+
+```questdb-sql title="Observing the default execution plan"
+EXPLAIN SELECT
 orders.ts, orders.price, md.md_ts, md.bid, md.ask
 FROM orders
 ASOF JOIN (
@@ -103,18 +115,20 @@ ASOF JOIN (
 ) md;
 ```
 
-When the hint is applied successfully, the execution plan will show a Filtered AsOf Join Fast Scan operator,
-confirming that the binary search strategy is being used:
+The execution plan will show a `Filtered AsOf Join Fast Scan` operator, confirming the binary search strategy is being
+used.
 
 <Screenshot
-alt="Screen capture of the EXPLAIN output for USE_ASOF_BINARY_SEARCH"
+alt="Screen capture of the EXPLAIN output showing the default Filtered AsOf Join Fast Scan"
 src="images/docs/concepts/filtered-asof-plan-example.png"
 />
 
-For comparison, here's what happens without the hint:
+#### Hinted Execution Plan (Full Scan)
+
+When you use the `AVOID_ASOF_BINARY_SEARCH` hint, the plan changes.
 
-```questdb-sql title="Observing execution plan without USE_ASOF_BINARY_SEARCH"
-EXPLAIN SELECT
+```questdb-sql title="Observing execution plan with the AVOID hint"
+EXPLAIN SELECT /*+ AVOID_ASOF_BINARY_SEARCH(orders md) */
 orders.ts, orders.price, md.md_ts, md.bid, md.ask
 FROM orders
 ASOF JOIN (
@@ -123,12 +137,10 @@ ASOF JOIN (
 ) md;
 ```
 
-The execution plan will show:
-
-- A standard `AsOf Join` operator instead of `Filtered AsOf Join Fast Scan`
-- A separate filtering step that processes the joined table in parallel first
+The execution plan will now show a standard `AsOf Join` operator and a separate, preceding filtering step on the joined
+table.
 
 <Screenshot
-alt="Screen capture of the EXPLAIN output for default ASOF join"
+alt="Screen capture of the EXPLAIN output for the hinted ASOF join, showing a separate filter"
 src="images/docs/concepts/default-asof-plan-example.png"
 />
````
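
To make the selectivity trade-off concrete, the following Python sketch implements both plans over the same in-memory data and counts how many rows the default plan's backward scan inspects. It is a simplified model, not QuestDB code, and it ignores I/O, caching, and parallelism. It assumes integer timestamps and a joined table where only every 100th row has `state = 'INVALID'`. All function and variable names are hypothetical. Both plans return identical matches, which reflects the documented point that a hint changes the execution strategy, not the query's semantics.

```python
from bisect import bisect_right


def asof_default(left_ts, right, predicate):
    """Binary search plus backward scan. Returns (matches, rows_inspected)."""
    right_ts = [ts for ts, _ in right]
    matches, inspected = [], 0
    for ts in left_ts:
        i = bisect_right(right_ts, ts) - 1
        while i >= 0:
            inspected += 1  # each step evaluates the filter on one row
            if predicate(right[i][1]):
                break
            i -= 1
        matches.append(right[i] if i >= 0 else None)
    return matches, inspected


def asof_prefilter(left_ts, right, predicate):
    """Model of the AVOID_ASOF_BINARY_SEARCH plan: filter first, then join."""
    # Step 1: filter the whole joined table (QuestDB does this in parallel; this sketch does not).
    filtered = [row for row in right if predicate(row[1])]
    filtered_ts = [ts for ts, _ in filtered]
    # Step 2: join each left row to the latest surviving row at or before it.
    matches = []
    for ts in left_ts:
        i = bisect_right(filtered_ts, ts) - 1
        matches.append(filtered[i] if i >= 0 else None)
    return matches


# Hypothetical data: 1,000 joined rows; only every 100th row is 'INVALID'.
right = [(t, {"state": "INVALID" if t % 100 == 0 else "VALID"}) for t in range(1000)]
left = list(range(50, 1000, 100))  # 10 main-table timestamps: 50, 150, ..., 950


def is_invalid(cols):
    return cols["state"] == "INVALID"  # highly selective: keeps 1% of rows


def is_valid(cols):
    return cols["state"] == "VALID"  # low selectivity: keeps 99% of rows


selective_matches, selective_steps = asof_default(left, right, is_invalid)
broad_matches, broad_steps = asof_default(left, right, is_valid)
print(selective_steps, broad_steps)  # 510 10
```

With the selective filter, each main-table row costs 51 sequential filter evaluations. With the broad filter, it costs one. This mirrors the guidance above, although real performance also depends on factors this model leaves out.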

documentation/reference/sql/asof-join.md

Lines changed: 73 additions & 15 deletions
````diff
@@ -34,14 +34,14 @@ Visualized, a JOIN operation looks like this:
 for more information.
 
 - `joinClause` `ASOF JOIN` with an optional `ON` clause which allows only the
-`=` predicate:
+`=` predicate and an optional `TOLERANCE` clause:
 
-![Flow chart showing the syntax of the ASOF, LT, and SPLICE JOIN keyword](/images/docs/diagrams/AsofLtSpliceJoin.svg)
+![Flow chart showing the syntax of the ASOF JOIN keyword](/images/docs/diagrams/AsofJoin.svg)
 
 - `whereClause` - see the [WHERE](/docs/reference/sql/where/) reference docs for
 more information.
 
-In addition, the following are items of import:
+In addition, the following are items of importance:
 
 - Columns from joined tables are combined in a single row.
 
````
````diff
@@ -67,6 +67,9 @@ logic: for each row in the first time-series,
 1. consider all timestamps in the second time-series **earlier or equal to**
 the first one
 2. choose **the latest** such timestamp
+3. if the optional `TOLERANCE` clause is specified, apply an additional condition:
+the chosen record from t2 must satisfy `t1.ts - t2.ts <= tolerance_value`. If no record
+from t2 meets this condition (along with `t2.ts <= t1.ts`), the row from t1 has no match and its t2 columns are `NULL`.
 
 ### Example
 
````
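
Rules 1 to 3 can be expressed as a small Python function. This is an illustrative sketch under assumptions the docs do not state. Timestamps are plain integers in a single unit, and the right-hand table `t2` is reduced to its sorted timestamps. `asof_match` is a hypothetical helper, not part of QuestDB.

```python
from bisect import bisect_right
from typing import List, Optional


def asof_match(t1_ts: int, t2_ts: List[int], tolerance: Optional[int] = None) -> Optional[int]:
    """Index of the t2 row an ASOF JOIN would pick for t1_ts, or None.

    Hypothetical helper: t2_ts must be sorted ascending, and all values
    share one integer time unit.
    """
    # Rules 1 and 2: the latest t2 timestamp earlier than or equal to t1_ts.
    i = bisect_right(t2_ts, t1_ts) - 1
    if i < 0:
        return None
    # Rule 3: the chosen row must also lie within the tolerance window.
    # Earlier rows are even further away, so there is no need to look past it.
    if tolerance is not None and t1_ts - t2_ts[i] > tolerance:
        return None
    return i


t2_ts = [100, 200, 300]
print(asof_match(350, t2_ts))                 # 2
print(asof_match(350, t2_ts, tolerance=40))   # None: 350 - 300 = 50 > 40
print(asof_match(350, t2_ts, tolerance=50))   # 2: the bound is inclusive (<=)
print(asof_match(50, t2_ts, tolerance=1000))  # None: no t2 row at or before 50
```
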
````diff
@@ -151,7 +154,7 @@ Let's use an example with two tables:
 We want to join each trade event to the relevant order book snapshot. All
 we have to write is
 
-```questdb-sql title="A basic ASOF JOIN example" demo
+```questdb-sql title="A basic ASOF JOIN example"
 trades ASOF JOIN order_book
 ```
 
````
````diff
@@ -411,21 +414,76 @@ To summarize:
 3. Use the `timestamp()` syntax as an expert-level hint to avoid a sort on a table with no designated timestamp, if and
 only if you are certain the data is already sorted.
 
-### SQL performance hints for ASOF JOIN
+### TOLERANCE clause
 
-QuestDB supports SQL hints that can optimize non-keyed ASOF join performance when filters are applied to the joined table:
+The `TOLERANCE` clause enhances `ASOF` and `LT` joins by limiting how far back in time the join should look for a match in the right
+table. The `TOLERANCE` parameter accepts a time interval value (e.g., `2s`, `100ms`, `1d`).
 
-```questdb-sql title="ASOF JOIN with optimization hint"
-SELECT /*+ USE_ASOF_BINARY_SEARCH(trades order_book) */ *
-FROM trades
-ASOF JOIN (
-SELECT * FROM order_book
-WHERE state = 'VALID'
-) order_book;
+When specified, a record from the left table t1 at t1.ts will only be joined with a record from the right table t2 at
+t2.ts if both conditions are met: `t2.ts <= t1.ts` and `t1.ts - t2.ts <= tolerance_value`.
+
+This ensures that the matched record from the right table is not only the latest one on or before t1.ts, but also within
+the specified time window.
+
+```questdb-sql title="ASOF JOIN with a TOLERANCE parameter"
+SELECT ...
+FROM table1
+ASOF JOIN table2 TOLERANCE 10s
+[WHERE ...]
+```
+
+`TOLERANCE` also works together with the `ON` clause:
+```questdb-sql title="ASOF JOIN with keys and a TOLERANCE parameter"
+SELECT ...
+FROM table1
+ASOF JOIN table2 ON (key_column) TOLERANCE 1m
+[WHERE ...]
+```
+
+The `interval_literal` must be a valid QuestDB interval string, like '5s' (5 seconds), '100ms' (100 milliseconds),
+'2m' (2 minutes), '3h' (3 hours), or '1d' (1 day).
+
+
+#### Example using TOLERANCE
+
+Consider the `trades` and `order_book` tables from the previous examples. To join trades only to order book snapshots
+that occurred no more than 1 second before the trade, write:
+
+```questdb-sql title="TOLERANCE example"
+SELECT t.timestamp, t.price, t.size, ob.timestamp AS ob_ts, ob.bid_price, ob.bid_size
+FROM trades t
+ASOF JOIN order_book ob TOLERANCE 1s;
 ```
 
-For more information on when and how to use these optimization hints, see the [SQL Hints](/concept/sql-optimizer-hints/)
-documentation.
+Let's analyze a specific trade: the trade at `08:00:01.146931`.
+Without `TOLERANCE`, it joins with `order_book` at `08:00:01`. The time difference is 0.146931s.
+If we set `TOLERANCE` to '100ms', this trade would not find a match, because 0.146931s (146.931ms) is greater than 100ms. The
+previous `order_book` entry at `08:00:00` would be even further away (1.146931s).
+
+Another trade: the trade at `08:00:00.007140`.
+Without `TOLERANCE`, it joins with `order_book` at `08:00:00`. The time difference is 0.007140s (7.14ms).
+If we set `TOLERANCE` to '5ms', this trade would not find a match because 7.14ms > 5ms.
+
+#### Supported Units for `interval_literal`
+The `TOLERANCE` interval literal supports the following time unit qualifiers:
+- U: Microseconds
+- T: Milliseconds
+- s: Seconds
+- m: Minutes
+- h: Hours
+- d: Days
+- w: Weeks
+
+For example, '100U' is 100 microseconds, '50T' is 50 milliseconds, '2s' is 2 seconds, '30m' is 30 minutes,
+'1h' is 1 hour, '7d' is 7 days, and '2w' is 2 weeks. Note that months (M) and years (Y) are not supported as
+units for the `TOLERANCE` clause.
+
+#### Performance impact of TOLERANCE
+
+Specifying `TOLERANCE` can also improve performance. `ASOF JOIN` execution plans often scan backward in time on the right
+table to find a matching entry for each left-table row. `TOLERANCE` allows these scans to terminate early, once a
+right-table record is older than the left-table record by more than the specified tolerance, thus avoiding unnecessary
+processing of more distant records.
 
 ## SPLICE JOIN
 
````
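
The worked example and the early-termination argument above can be checked with a short Python model. It is a sketch, not QuestDB's executor, and it rests on three assumptions. Timestamps are integer microseconds counted from `08:00:00`. Each right-table row's filter result is a boolean in a hypothetical `right_ok` list. The hypothetical `UNIT_MICROS` table simply restates the supported units listed above as microsecond multipliers. `scan_with_tolerance` walks backward from the binary-search position and stops as soon as a row falls outside the tolerance window.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical: the unit qualifiers listed above, as microsecond multipliers.
UNIT_MICROS = {
    "U": 1,
    "T": 1_000,
    "s": 1_000_000,
    "m": 60 * 1_000_000,
    "h": 3_600 * 1_000_000,
    "d": 86_400 * 1_000_000,
    "w": 7 * 86_400 * 1_000_000,
}


def scan_with_tolerance(
    left_ts: int,
    right_ts: List[int],
    right_ok: List[bool],
    tolerance: Optional[int],
) -> Tuple[Optional[int], int]:
    """Backward scan with early termination (hypothetical model).

    Returns (index of the matched right row or None, filter evaluations).
    A tolerance of None means no limit.
    """
    i = bisect_right(right_ts, left_ts) - 1
    checks = 0
    while i >= 0:
        if tolerance is not None and left_ts - right_ts[i] > tolerance:
            break  # every earlier row is older still, so stop scanning
        checks += 1
        if right_ok[i]:
            return i, checks
        i -= 1
    return None, checks


# Worked example: order book snapshots at 08:00:00 and 08:00:01 (no filter).
book_ts, book_ok = [0, 1_000_000], [True, True]
print(scan_with_tolerance(1_146_931, book_ts, book_ok, 100 * UNIT_MICROS["T"]))  # (None, 0)
print(scan_with_tolerance(1_146_931, book_ts, book_ok, 1 * UNIT_MICROS["s"]))    # (1, 1)
print(scan_with_tolerance(7_140, book_ts, book_ok, 5 * UNIT_MICROS["T"]))        # (None, 0)

# Early termination: 1,000 rows one millisecond apart, none passing the filter.
dense_ts = list(range(0, 1_000_000, 1_000))
dense_ok = [False] * len(dense_ts)
print(scan_with_tolerance(999_500, dense_ts, dense_ok, 10 * UNIT_MICROS["T"]))  # (None, 10)
print(scan_with_tolerance(999_500, dense_ts, dense_ok, None))                   # (None, 1000)
```

In the dense case, a 10 ms tolerance limits the scan to the 10 rows inside the window. Without a tolerance, the scan inspects all 1,000 rows before giving up.
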
documentation/reference/sql/join.md

Lines changed: 39 additions & 2 deletions
````diff
@@ -33,7 +33,8 @@ High-level overview:
 ![Flow chart showing the syntax of the INNER, LEFT JOIN keyword](/images/docs/diagrams/InnerLeftJoin.svg)
 
 - `ASOF`, `LT`, and `SPLICE` `JOIN` has optional `ON` clause allowing only the
-`=` predicate:
+`=` predicate.
+- `ASOF` and `LT` joins additionally allow an optional `TOLERANCE` clause:
 
 ![Flow chart showing the syntax of the ASOF, LT, and SPLICE JOIN keyword](/images/docs/diagrams/AsofLtSpliceJoin.svg)
 
````
````diff
@@ -311,7 +312,7 @@ WHERE t.timestamp < t2.timestamp
 
 ## LT JOIN
 
-Similar to `ASOF JOIN`, `LT JOIN` joins two different time-series measured. For
+Similar to [`ASOF JOIN`](/docs/reference/sql/asof-join/), `LT JOIN` joins two different time series. For
 each row in the first time-series, the `LT JOIN` takes from the second
 time-series a timestamp that meets both of the following criteria:
 
````
````diff
@@ -394,6 +395,42 @@ order to get preceding values for every row.
 The `ON` clause can also be used in combination with `LT JOIN` to join both by
 timestamp and column values.
 
+### TOLERANCE clause
+The `TOLERANCE` clause enhances `LT JOIN` by limiting how far back in time the join should look for a match in the right
+table. The `TOLERANCE` parameter accepts a time interval value (e.g., `2s`, `100ms`, `1d`).
+
+When specified, a record from the left table t1 at t1.ts will only be joined with a record from the right table t2 at
+t2.ts if both conditions are met: `t2.ts < t1.ts` and `t1.ts - t2.ts <= tolerance_value`.
+
+This ensures that the matched record from the right table is not only the latest one strictly before t1.ts, but also within
+the specified time window.
+
+```questdb-sql title="LT JOIN with a TOLERANCE parameter"
+SELECT ...
+FROM table1
+LT JOIN table2 TOLERANCE 10s
+[WHERE ...]
+```
+
+The `interval_literal` must be a valid QuestDB interval string, like '5s' (5 seconds), '100ms' (100 milliseconds),
+'2m' (2 minutes), '3h' (3 hours), or '1d' (1 day).
+
+#### Supported Units for `interval_literal`
+The `TOLERANCE` interval literal supports the following time unit qualifiers:
+- U: Microseconds
+- T: Milliseconds
+- s: Seconds
+- m: Minutes
+- h: Hours
+- d: Days
+- w: Weeks
+
+For example, '100U' is 100 microseconds, '50T' is 50 milliseconds, '2s' is 2 seconds, '30m' is 30 minutes,
+'1h' is 1 hour, '7d' is 7 days, and '2w' is 2 weeks. Note that months (M) and years (Y) are not supported as
+units for the `TOLERANCE` clause.
+
+See the [`ASOF JOIN` documentation](/docs/reference/sql/asof-join#tolerance-clause) for more examples with the `TOLERANCE` clause.
+
 ## SPLICE JOIN
 
 `SPLICE JOIN` is a full `ASOF JOIN`. It will return all the records from both
````
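
The only difference between the `LT JOIN` and `ASOF JOIN` tolerance rules is whether an equal timestamp may match. The Python sketch below illustrates this under simplifying assumptions. Timestamps are integers in one unit, the right-hand table is reduced to its sorted timestamps, and `latest_match` is a hypothetical helper. It models the documented conditions, not QuestDB's implementation.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional


def latest_match(t1_ts: int, t2_ts: List[int], strict: bool,
                 tolerance: Optional[int] = None) -> Optional[int]:
    """Index of the matched t2 row, or None (hypothetical helper).

    strict=True models LT JOIN (t2.ts < t1.ts);
    strict=False models ASOF JOIN (t2.ts <= t1.ts).
    """
    pos = bisect_left(t2_ts, t1_ts) if strict else bisect_right(t2_ts, t1_ts)
    i = pos - 1  # the latest row satisfying the strict or non-strict bound
    if i < 0:
        return None
    if tolerance is not None and t1_ts - t2_ts[i] > tolerance:
        return None  # outside the TOLERANCE window
    return i


t2_ts = [100, 200, 300]
print(latest_match(300, t2_ts, strict=False))                # 2: ASOF accepts equal timestamps
print(latest_match(300, t2_ts, strict=True))                 # 1: LT requires strictly earlier
print(latest_match(300, t2_ts, strict=True, tolerance=50))   # None: 300 - 200 = 100 > 50
print(latest_match(300, t2_ts, strict=False, tolerance=50))  # 2: difference is 0
```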
