Commit c4d6d1d

Merge pull request #107 from owndev/copilot/add-thinking-levels-budgets

Add thinking levels and budgets support for Gemini models

2 parents: d2d3c13 + 3e30764

File tree

3 files changed: +303 −18 lines

docs/google-gemini-integration.md

Lines changed: 136 additions & 1 deletion
@@ -26,7 +26,10 @@ This integration enables **Open WebUI** to interact with **Google Gemini** model

> Streaming is automatically disabled for image generation models to prevent chunk size issues.

- **Thinking Support**
  Supports reasoning and thinking steps, allowing models to break down complex tasks. Includes configurable thinking levels for Gemini 3 Pro ("low"/"high") and thinking budgets (0-32768 tokens) for other thinking-capable models.

  > [!NOTE]
  > **Thinking Levels vs Thinking Budgets**: Gemini 3 Pro models use `thinking_level` ("low" or "high"), while other models like Gemini 2.5 use `thinking_budget` (token count). See the [Gemini Thinking Documentation](https://ai.google.dev/gemini-api/docs/thinking) for details.

- **Multimodal Input Support**
  Accepts both text and image data for more expressive interactions with configurable image optimization.
@@ -123,6 +126,20 @@ GOOGLE_IMAGE_UPLOAD_FALLBACK=true

# Default: true
GOOGLE_INCLUDE_THOUGHTS=true

# Thinking budget for Gemini 2.5 models (not used for Gemini 3 models)
# -1 = dynamic (model decides), 0 = disabled, 1-32768 = fixed token limit
# Default: -1 (dynamic)
# Note: Gemini 3 models use GOOGLE_THINKING_LEVEL instead
GOOGLE_THINKING_BUDGET=-1

# Thinking level for Gemini 3 models only
# Valid values: "low", "high", or empty string for model default
# - "low": Minimizes latency and cost, suitable for simple tasks
# - "high": Maximizes reasoning depth, ideal for complex problem-solving
# Default: "" (empty, uses model default)
# Note: This setting is ignored for non-Gemini 3 models
GOOGLE_THINKING_LEVEL=""

# Enable streaming responses globally
# Default: true
GOOGLE_STREAMING_ENABLED=true
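As an illustration only (the helper name and return shape below are assumptions, not part of the pipeline's actual code), the two thinking variables above could be resolved into per-model settings along these lines:

```python
import os


def resolve_thinking_settings(model: str) -> dict:
    """Hypothetical sketch: pick the right thinking setting per model family.

    Mirrors the rules documented above: Gemini 3 models use
    GOOGLE_THINKING_LEVEL, other thinking-capable models use
    GOOGLE_THINKING_BUDGET.
    """
    if model.startswith("gemini-3"):
        level = os.getenv("GOOGLE_THINKING_LEVEL", "")
        # An empty string means "use the model default", so send nothing.
        return {"thinking_level": level} if level in ("low", "high") else {}
    budget = int(os.getenv("GOOGLE_THINKING_BUDGET", "-1"))
    # Clamp to the documented range: -1 (dynamic), 0 (off), 1-32768 (fixed).
    budget = max(-1, min(budget, 32768))
    return {"thinking_budget": budget}
```

Note how the empty-string default for `GOOGLE_THINKING_LEVEL` falls through to an empty config, leaving the choice to the model.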
@@ -227,3 +244,121 @@ To use this filter, ensure it's enabled in your Open WebUI configuration. Then,

## Native tool calling support

Native tool calling is enabled/disabled via the standard 'Function calling' Open WebUI toggle.

## Thinking Configuration

The Google Gemini pipeline supports advanced thinking configuration to control how much reasoning and computation is applied by the model.

> [!NOTE]
> For detailed information about thinking capabilities, see the [Google Gemini Thinking Documentation](https://ai.google.dev/gemini-api/docs/thinking).

### Thinking Levels (Gemini 3 models)

Gemini 3 models support the `thinking_level` parameter, which controls the depth of reasoning:

- **`"low"`**: Minimizes latency and cost; suitable for simple tasks, chat, or high-throughput APIs.
- **`"high"`**: Maximizes reasoning depth; ideal for complex problem-solving, code analysis, and agentic workflows.

> [!NOTE]
> Gemini 3 models use `thinking_level` and do **not** use `thinking_budget`. The thinking budget setting is ignored for Gemini 3 models.

Set via environment variable:
```bash
# Use low thinking level for faster responses
GOOGLE_THINKING_LEVEL="low"

# Use high thinking level for complex reasoning
GOOGLE_THINKING_LEVEL="high"
```

**Example API Usage:**

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Provide a list of 3 famous physicists and their key contributions",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="low")
    ),
)

print(response.text)
```

### Thinking Budget (Gemini 2.5 models)

For Gemini 2.5 models, you can control the maximum number of tokens used during internal reasoning via `thinking_budget`:

- **`0`**: Disables thinking entirely for the fastest responses
- **`-1`**: Dynamic thinking (model decides based on query complexity); the default
- **`1-32768`**: Fixed token limit for reasoning

> [!NOTE]
> Gemini 3 models do **not** use `thinking_budget`. Use `GOOGLE_THINKING_LEVEL` for Gemini 3 models instead.

Set via environment variable:

```bash
# Disable thinking for fastest responses
GOOGLE_THINKING_BUDGET=0

# Use dynamic thinking (model decides)
GOOGLE_THINKING_BUDGET=-1

# Set a specific token budget for reasoning
GOOGLE_THINKING_BUDGET=1024
```
**Example API Usage:**

```python
from google import genai
from google.genai import types

client = genai.Client()

# Example with a specific thinking budget
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Provide a list of 3 famous physicists and their key contributions",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)
print(response.text)

# Turn off thinking entirely
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="What is 2+2?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)
print(response.text)

# Use dynamic thinking (model decides based on query complexity)
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Explain quantum computing",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=-1)
    ),
)
print(response.text)
```

### Model Compatibility

| Model | thinking_level | thinking_budget |
|-------|----------------|-----------------|
| gemini-3-* | ✅ Supported ("low", "high") | ❌ Not used |
| gemini-2.5-* | ❌ Not used | ✅ Supported (0-32768) |
| gemini-2.5-flash-image-* | ❌ Not supported | ❌ Not supported |
| Other models | ❌ Not used | ✅ May be supported |

filters/vertex_ai_search_tool.py

Lines changed: 0 additions & 1 deletion (trailing blank line removed at end of file)

@@ -41,4 +41,3 @@ def inlet(self, body: dict) -> dict:
        "vertex_ai_search enabled but vertex_rag_store not provided in params or VERTEX_AI_RAG_STORE env var"
    )
    return body
0 commit comments