
Commit 0d72da4

Author: Paweł Kędzia (committed)
Merge branch 'features/quickstart'
2 parents 0c4e6c1 + 93e0cff, commit 0d72da4

File tree

10 files changed: +870 -0 lines changed

examples/README.md

Lines changed: 25 additions & 0 deletions
@@ -3,6 +3,8 @@
This directory contains example boilerplates that demonstrate how easy it is to integrate popular LLM libraries with the
router by simply switching the host.

---

## Available Examples

- **[LlamaIndex](llamaindex_example.py)** – Integration with LlamaIndex (GPT Index)

@@ -24,6 +26,8 @@ router will automatically:
4. ✅ Supply monitoring and metrics
5. ✅ Handle streaming and non‑streaming responses

---

## Quick Start

Each example can be run directly:

@@ -45,6 +49,8 @@ python examples/litellm_example.py
python examples/haystack_example.py
```

---

## Example Structure

Each example includes:

@@ -54,6 +60,25 @@ Each example includes:
3. **Non‑streaming** – handling full responses
4. **Error handling** – managing errors

---

## Full Stack with Local Models

The quick‑start guides for running the full stack with **local models** are included in the repository:

- **Gemma 3 12B‑IT** – [README](quickstart/google-gemma3-12b-it/README.md)
- **Bielik 11B‑v2.3‑Instruct‑FP8** – [README](quickstart/speakleash-bielik-11b-v2_3-Instruct/README.md)

These guides walk you through:

1. Installing **vLLM** and the respective model.
2. Setting up **LLM‑Router** with the provided `models-config.json`.
3. Testing the end‑to‑end flow (router → vLLM).

Follow the linked README files for step‑by‑step instructions to launch a complete stack locally.

---

## Additional Information

Learn more about the router:

examples/quickstart/README.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
## Full Stack with Local Models

The quick‑start guides for running the full stack with **local models** are included in the repository:

- **Gemma 3 12B‑IT** – [README](google-gemma3-12b-it/README.md)
- **Bielik 11B‑v2.3‑Instruct‑FP8** – [README](speakleash-bielik-11b-v2_3-Instruct/README.md)
examples/quickstart/google-gemma3-12b-it/README.md

Lines changed: 239 additions & 0 deletions
@@ -0,0 +1,239 @@
# 🚀 Quick‑Start Guide for `google/gemma-3-12b‑it` with **vLLM** & **LLM‑Router**

This guide walks you through:

1. **Installing vLLM** and the `google/gemma‑3‑12b‑it` model.
2. **Installing LLM‑Router** (the API gateway).
3. **Running the router** with the model configuration provided in `models-config.json`.

All commands assume you are working on a Unix‑like system (Linux/macOS) with **Python 3.10.6** and `virtualenv` available.

---

## 📋 Prerequisites

| Requirement | Details |
|-------------|---------|
| **OS** | Ubuntu 20.04+ (or any recent Linux/macOS) |
| **Python** | 3.10.6 (project’s default) |
| **GPU** | CUDA 11.8+ (≥ 24 GB VRAM) **or** CPU‑only setup |
| **Tools** | `git`, `curl`, `jq` (optional but handy for testing) |
| **Network** | Ability to pull Docker images / PyPI packages and download the model from Hugging Face |
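
A quick, optional sanity check of the prerequisites above (a minimal sketch; `nvidia-smi` applies only to the GPU setup):

```shell script
# Verify interpreter and tooling versions
python3 --version        # expect 3.10.x
git --version
curl --version | head -n 1
jq --version             # optional, only used to pretty-print JSON responses

# GPU setup only: confirm the driver is visible and check available VRAM
nvidia-smi
```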

---

## 1️⃣ Set up a virtual environment

```shell script
# Create a directory for the whole demo (optional)
mkdir -p ~/gemma3-demo && cd $_

# Initialise the venv
python3 -m venv .venv
source .venv/bin/activate

# Upgrade pip (always a good idea)
pip install --upgrade pip
```

---

## 2️⃣ Install **vLLM** and download the Gemma 3 model

> **See the full step‑by‑step instructions in** [`VLLM.md`](./VLLM.md).
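
[`VLLM.md`](./VLLM.md) is the authoritative reference; as a rough orientation only, the installation usually boils down to something like the sketch below (the package name and the gated‑model login are assumptions to verify against `VLLM.md`):

```shell script
# Inside the same virtualenv as above
pip install vllm

# google/gemma-3-12b-it is a gated model on Hugging Face:
# accept the license on the model page, then authenticate
huggingface-cli login

# (Optional) pre-download the weights so the first server start is faster
huggingface-cli download google/gemma-3-12b-it
```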

---

## 3️⃣ Run the **vLLM** server

Copy the helper script (or run the command manually) inside the demo directory:

```shell script
# If you have the script `run-gemma-3-12b-it-vllm.sh` in the repo:
cp path/to/llm-router/examples/quickstart/google-gemma3-12b-it/run-gemma-3-12b-it-vllm.sh .
chmod +x run-gemma-3-12b-it-vllm.sh

# Start the server (you may want to use tmux/screen)
./run-gemma-3-12b-it-vllm.sh
```

The server will listen on **`http://0.0.0.0:7000`** and expose an OpenAI‑compatible endpoint at `/v1/chat/completions`.
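
The helper script’s exact contents are not reproduced here; if you prefer to start the server by hand, a minimal equivalent (flags assumed, matching the port used above) would look roughly like:

```shell script
# Rough manual equivalent of the helper script; tune GPU/memory flags to your hardware
vllm serve google/gemma-3-12b-it \
  --host 0.0.0.0 \
  --port 7000
```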

You can quickly test it:

```shell script
curl http://localhost:7000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "google/gemma-3-12b-it",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "max_tokens": 100
      }' | jq
```

You should receive a JSON payload with the model’s generated text.
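
If you only need the generated text, you can filter the payload with `jq` (assuming the standard OpenAI‑style response shape that vLLM returns):

```shell script
# Same request as above, keeping only the assistant's reply
curl -s http://localhost:7000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemma-3-12b-it", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}' \
  | jq -r '.choices[0].message.content'
```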

---

## 4️⃣ Install **LLM‑Router**

### Local install

```shell script
# Clone the router repository (if you haven’t already)
git clone https://github.com/radlab-dev-group/llm-router.git
cd llm-router

# Install the core library + API wrapper (includes the REST server)
pip install .[api]

# (Optional) Install Prometheus metrics support
pip install .[api,metrics]
```

> **Note:** The router uses the same virtual environment you created earlier, so all dependencies stay isolated.

[//]: # (### Docker based install)

---

## 5️⃣ Prepare the router configuration

The example repository already ships a [`models-config.json`](./models-config.json) that points to the locally running vLLM instance:

```json
{
  "google_models": {
    "google/gemma-3-12b-it": {
      "providers": [
        {
          "id": "gemma3_12b-vllm-local:7000",
          "api_host": "http://localhost:7000/",
          "api_type": "vllm",
          "input_size": 56000,
          "weight": 1.0
        }
      ]
    }
  },
  "active_models": {
    "google_models": [
      "google/gemma-3-12b-it"
    ]
  }
}
```

Copy it (or edit the path) to the router’s `resources/configs/` directory:

```shell script
mkdir -p resources/configs
cp path/to/google-gemma3-12b-it/models-config.json resources/configs/
```
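
A quick way to confirm the copied file is well‑formed JSON before starting the router (nothing router‑specific, just `jq`):

```shell script
# Fails loudly if the config is not valid JSON
jq . resources/configs/models-config.json > /dev/null && echo "models-config.json OK"
```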

---

## 6️⃣ Run the **LLM‑Router**

### Local Gunicorn

The helper script `run-rest-api-gunicorn.sh` sets a sensible default environment. You can use it directly or export the variables yourself.

```shell script
# Make the script executable (if needed)
chmod +x path/to/run-rest-api-gunicorn.sh

# Run the router
./run-rest-api-gunicorn.sh
```

Key environment variables (already defined in the script) you may want to adjust:

| Variable | Default | Meaning |
|----------|---------|---------|
| `LLM_ROUTER_SERVER_TYPE` | `gunicorn` | Server backend (gunicorn, flask, waitress) |
| `LLM_ROUTER_SERVER_PORT` | `8080` | Port on which the router listens |
| `LLM_ROUTER_MODELS_CONFIG` | `resources/configs/models-config.json` | Path to the JSON file above |
| `LLM_ROUTER_PROMPTS_DIR` | `resources/prompts` | Prompt‑template directory (optional) |
| `LLM_ROUTER_BALANCE_STRATEGY` | `first_available` | Load‑balancing strategy |
| `LLM_ROUTER_USE_PROMETHEUS` | `1` (if you installed metrics) | Enable `/api/metrics` endpoint |
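
If you prefer to set the variables yourself rather than rely on the script’s defaults, a minimal sketch (values taken from the table above; this assumes the script keeps any values already present in the environment):

```shell script
# Override selected defaults before launching the router
export LLM_ROUTER_SERVER_PORT="8080"
export LLM_ROUTER_MODELS_CONFIG="resources/configs/models-config.json"
export LLM_ROUTER_BALANCE_STRATEGY="first_available"
export LLM_ROUTER_USE_PROMETHEUS="1"   # only if you installed the metrics extra

./run-rest-api-gunicorn.sh
```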

After the script starts, the router will be reachable at **`http://0.0.0.0:8080/api`**.
A full list of available environment variables can be found in the [environment description](../../../llm_router_api/README.md#environment-variables).

---

## 7️⃣ Test the full stack (router → vLLM)

```shell script
curl http://localhost:8080/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "google/gemma-3-12b-it",
        "messages": [{"role": "user", "content": "Tell me a short joke."}],
        "max_tokens": 80
      }' | jq
```

The request goes through **LLM‑Router**, which forwards it to the local vLLM server, and you receive the generated response.
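
Streaming also goes through the router (the examples README lists streaming among the supported behaviours); a quick check, assuming the OpenAI‑compatible `stream` flag is passed through unchanged:

```shell script
# -N disables curl buffering so streamed chunks are printed as they arrive
curl -N http://localhost:8080/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "google/gemma-3-12b-it",
        "messages": [{"role": "user", "content": "Count from 1 to 5."}],
        "max_tokens": 40,
        "stream": true
      }'
```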

---

## 🚀 Running the examples

The [`examples/`](../../../examples) folder already contains detailed README files and individual script doc‑strings that explain how each library (LangChain, LlamaIndex, OpenAI SDK, LiteLLM, Haystack) works with the LLM‑Router.

**What you need to do**

1. **Set the router address** – export `LLM_ROUTER_HOST` in the environment (or edit `examples/constants.py`) so that

   ```python
   HOST = "http://localhost:8080/api"
   ```

   matches the URL where you started the router (`run-rest-api-gunicorn.sh`).

2. **(Optional) Synchronise model names** – ensure the `MODELS` list in `constants.py` reflects the logical model identifiers you defined in `resources/configs/models-config.json`.

3. **Install the example dependencies**

   ```shell script
   pip install -r examples/requirements.txt
   ```

4. **Run the examples** – each script can be executed directly, e.g. (a combined sketch of steps 1–4 follows this list):

   ```shell script
   python examples/langchain_example.py
   python examples/llamaindex_example.py
   python examples/openai_example.py
   python examples/litellm_example.py
   python examples/haystack_example.py
   ```
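
Putting steps 1–4 together, a minimal end‑to‑end session might look like this (assuming the router from step 6️⃣ is listening on port 8080 as configured above):

```shell script
# Point the examples at the locally running router
export LLM_ROUTER_HOST="http://localhost:8080/api"

# Install the example dependencies and run one of the boilerplates
pip install -r examples/requirements.txt
python examples/openai_example.py
```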

All other configuration details (prompt handling, streaming, multi‑model usage, error handling, etc.) are documented inside the individual example files and the [`examples/README.md`](../../README.md) / [`examples/README_LLAMAINDEX.md`](../../README_LLAMAINDEX.md) files. Adjust only the `HOST` (and optionally `MODELS`) and the examples will automatically route their requests through the running LLM‑Router.

---

## 🎉 What’s next?

- **Prometheus**: If you enabled metrics, add the router’s `/api/metrics` endpoint to your Prometheus scrape config.
- **Guardrails & Masking**: Set `LLM_ROUTER_FORCE_MASKING`, `LLM_ROUTER_FORCE_GUARDRAIL_REQUEST`, etc., to activate the data‑protection plugins.
- **Multiple providers**: Extend `models-config.json` with additional providers (e.g., Ollama, OpenAI) and experiment with different load‑balancing strategies.

---

Enjoy your local `gemma-3-12b‑it` deployment powered by vLLM and LLM‑Router!

---
