Commit 1b80b96

author
Paweł Kędzia
committed
Merge branch 'features/lb'
2 parents e3122c7 + 500186e commit 1b80b96

1 file changed

+53
-7
lines changed

llm_router_api/LB_STRATEGIES.md

Lines changed: 53 additions & 7 deletions
@@ -4,6 +4,8 @@ The `llm-router` supports various strategies for selecting the most suitable pro
when multiple options exist for a given model. This ensures efficient
and reliable routing of requests. The available strategies are:

+---
+
### 1. `balanced` (Default)

* **Description:** This is the default strategy. It aims to distribute requests
@@ -13,6 +15,8 @@ and reliable routing of requests. The available strategies are:
in terms of capacity and performance. It provides a simple and effective way to balance the load.
* **Implementation:** Implemented in `llm_router_api.core.lb.balanced.LoadBalancedStrategy`.

+---
+
### 2. `weighted`

* **Description:** This strategy allows you to assign static weights to providers.
@@ -22,6 +26,8 @@ and reliable routing of requests. The available strategies are:
characteristics, and you want to prioritize certain providers without needing dynamic adjustments.
* **Implementation:** Implemented in `llm_router_api.core.lb.weighted.WeightedStrategy`.
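A minimal sketch of static weighted selection, for illustration only. It assumes a weighted random draw; the actual `WeightedStrategy` may use a different scheme (for example a deterministic weighted rotation), and the function name `pick_weighted` and the weight mapping are hypothetical.

```python
import random


def pick_weighted(weights, rng=random):
    """Pick one provider name with probability proportional to its static weight.

    `weights` maps a provider name to a non-negative number (hypothetical shape).
    `rng` is the `random` module or a `random.Random` instance.
    """
    # Providers with a zero weight are never selected.
    names = [name for name, weight in weights.items() if weight > 0]
    if not names:
        raise ValueError("no provider has a positive weight")
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    # With weights 3:1, "fast" should be chosen roughly three times as often.
    print(pick_weighted({"fast": 3, "slow": 1}))
```

Because the weights are static, the distribution does not react to latency or failures; that is the gap the `dynamic_weighted` strategy is described as addressing.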

+---
+
### 3. `dynamic_weighted` (beta)

* **Description:** An extension of the `weighted` strategy. It not only uses weights
@@ -33,6 +39,8 @@ and reliable routing of requests. The available strategies are:
configured weights and real-time performance metrics (latency).
* **Implementation:** Implemented in `llm_router_api.core.lb.weighted.DynamicWeightedStrategy`.

+---
+
### 4. `first_available`

* **Description:** This strategy selects the very first provider that is available.
@@ -47,16 +55,54 @@ and reliable routing of requests. The available strategies are:
**When using the** `first_available` load balancing strategy, a **Redis server is required**
for coordinating provider availability across multiple workers.

-### 4. `first_available_optim`
-**UNDER DEVELOPMENT, DESCRIPTION WILL BE SOON**
+---
+
+### 5. `first_available_optim`
+
+**What it is**
+`first_available_optim` is an enhanced version of the plain *first‑available* load‑balancing strategy. It uses Redis to
+coordinate across multiple workers and tries to reuse a host that has already been used for the requested model before
+falling back to the classic “pick the first free provider” logic.
+
+**How it works**
+
+| Step                                      | Purpose                                                                                        | Behaviour                                                                                                                                                                                                |
+|-------------------------------------------|------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **1️⃣ Re‑use the last host**              | If the model was previously run on a specific host and that host is currently free, select it. | The host identifier is stored in a Redis key `:last_host`. The strategy checks that the host is not occupied by another model and attempts an atomic acquisition of a provider on that host.             |
+| **2️⃣ Re‑use any known host**             | Prefer any host that already has the model loaded.                                             | A Redis set `:hosts` tracks all hosts where the model is currently loaded. The strategy scans the provider list, picks a free provider on one of those hosts, and locks it atomically.                   |
+| **3️⃣ Pick an unused host**               | Spread the load to a fresh host when no suitable “known” host is available.                    | It looks for a provider whose host is **not** present in the `:hosts` set and is not occupied, then acquires it.                                                                                         |
+| **4️⃣ Fallback to plain first‑available** | Guarantees a result even if the optimisation steps fail.                                       | If none of the previous steps succeed, the strategy delegates to the base `FirstAvailableStrategy`, which simply selects the first free provider.                                                        |
+| **5️⃣ Book‑keeping**                      | Keep the optimisation data up‑to‑date for future requests.                                     | After a provider is successfully acquired, the host is recorded as the *last host* (`:last_host`), added to the model‑specific host set (`:hosts`), and marked as occupied in a Redis hash `:occupancy`. |
+
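The selection order in the table can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code: Redis is replaced by an in-memory `State` object, so nothing here is atomic or shared between workers, and the names `Provider`, `State`, and `choose` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    name: str
    host: str
    busy: bool = False  # stand-in for the per-provider lock held in Redis


@dataclass
class State:
    """In-memory stand-in for the Redis keys named in the table (assumption)."""
    last_host: dict = field(default_factory=dict)  # model -> host   (`:last_host`)
    hosts: dict = field(default_factory=dict)      # model -> {host} (`:hosts`)
    occupancy: dict = field(default_factory=dict)  # host -> model   (`:occupancy`)


def _first_free(providers, host_ok):
    """Return the first non-busy provider whose host passes `host_ok`, else None."""
    for provider in providers:
        if not provider.busy and host_ok(provider.host):
            return provider
    return None


def choose(model, providers, state):
    known = state.hosts.get(model, set())
    last = state.last_host.get(model)
    chosen = None

    # Step 1: reuse the last host unless another model occupies it.
    if last is not None and state.occupancy.get(last) in (None, model):
        chosen = _first_free(providers, lambda h: h == last)
    # Step 2: reuse any host already known to have the model loaded.
    if chosen is None:
        chosen = _first_free(providers, lambda h: h in known)
    # Step 3: pick a host that is neither known for this model nor occupied.
    if chosen is None:
        chosen = _first_free(
            providers, lambda h: h not in known and h not in state.occupancy
        )
    # Step 4: fall back to plain first-available.
    if chosen is None:
        chosen = _first_free(providers, lambda h: True)
    if chosen is None:
        return None  # every provider is busy

    # Step 5: book-keeping for future requests.
    chosen.busy = True
    state.last_host[model] = chosen.host
    state.hosts.setdefault(model, set()).add(chosen.host)
    state.occupancy[chosen.host] = model
    return chosen
```

Releasing a provider and clearing the occupancy entry are not described in the table, so the sketch leaves them out. In a multi-worker deployment each acquisition would need to be an atomic Redis operation, which a plain dictionary cannot provide.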
+**When to use it**
+
+* **Low‑latency / high‑throughput workloads** – Re‑using the same host avoids the overhead of re‑loading a large model,
+resulting in faster responses.
+* **Environments with a limited number of hosts** – The strategy maximises the utilization of already‑occupied hosts
+while still allowing the load to be spread when needed.
+* **Multi‑worker deployments** – Because all state is stored in Redis, many processes (or even different machines) can
+safely share the optimisation logic without race conditions.
+
+**Summary**
+`first_available_optim` combines the simplicity of the *first‑available* approach with smart host reuse, reducing
+model‑loading latency and improving overall throughput while still providing a reliable fallback mechanism. All
+coordination is performed via Redis, ensuring safe concurrent operation across multiple workers.
+
+---
+
+## Environments and Redis installation

The connection details for Redis can be configured using environment variables:

-```shell
-LLM_ROUTER_BALANCE_STRATEGY="first_available" \
-LLM_ROUTER_REDIS_HOST="your.machine.redis.host" \
-LLM_ROUTER_REDIS_PORT=redis_port \
-```
+| Environment variable          | Default           | Description                               |
+|-------------------------------|-------------------|-------------------------------------------|
+| `LLM_ROUTER_BALANCE_STRATEGY` | `first_available` | Enables this strategy.                    |
+| `LLM_ROUTER_REDIS_HOST`       || Hostname of the Redis server (mandatory). |
+| `LLM_ROUTER_REDIS_PORT`       || Port of the Redis server (mandatory).     |
+| `LLM_ROUTER_REDIS_DB`         | `0`               | Optional Redis database index.            |
+| `LLM_ROUTER_REDIS_TIMEOUT`    | `60`              | Connection timeout in seconds.            |
+
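As an illustration of how the Redis variables in the table could be consumed, here is a minimal sketch. It is not how `llm-router` itself parses its configuration: `RedisSettings` and `load_redis_settings` are hypothetical names, and only the variable names and the defaults for the database index and timeout come from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RedisSettings:
    host: str
    port: int
    db: int = 0        # default taken from the table
    timeout: int = 60  # seconds, default taken from the table


def load_redis_settings(env=os.environ):
    """Build RedisSettings from a mapping of environment variables."""
    required = ("LLM_ROUTER_REDIS_HOST", "LLM_ROUTER_REDIS_PORT")
    # Host and port are listed as mandatory, so fail early when either is unset.
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return RedisSettings(
        host=env["LLM_ROUTER_REDIS_HOST"],
        port=int(env["LLM_ROUTER_REDIS_PORT"]),
        db=int(env.get("LLM_ROUTER_REDIS_DB", "0")),
        timeout=int(env.get("LLM_ROUTER_REDIS_TIMEOUT", "60")),
    )


if __name__ == "__main__":
    example = {
        "LLM_ROUTER_REDIS_HOST": "your.machine.redis.host",
        "LLM_ROUTER_REDIS_PORT": "6379",
    }
    print(load_redis_settings(example))
```

Passing the mapping as a parameter keeps the function testable without touching the real process environment.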
+---

**Installing Redis on Ubuntu**
