docs/google-gemini-integration.md
Lines changed: 136 additions & 1 deletion
@@ -26,7 +26,10 @@ This integration enables **Open WebUI** to interact with **Google Gemini** model
 > Streaming is automatically disabled for image generation models to prevent chunk size issues.
 
 - **Thinking Support**
-Support reasoning and thinking steps, allowing models to break down complex tasks.
+Support reasoning and thinking steps, allowing models to break down complex tasks. Includes configurable thinking levels for Gemini 3 Pro ("low"/"high") and thinking budgets (0-32768 tokens) for other thinking-capable models.
+
+> [!Note]
+> **Thinking Levels vs Thinking Budgets**: Gemini 3 Pro models use `thinking_level` ("low" or "high"), while other models like Gemini 2.5 use `thinking_budget` (token count). See [Gemini Thinking Documentation](https://ai.google.dev/gemini-api/docs/thinking) for details.
 
 - **Multimodal Input Support**
 Accepts both text and image data for more expressive interactions with configurable image optimization.
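
Reviewer sketch: the split between thinking levels and thinking budgets described in this hunk could be expressed as one small helper that returns exactly one setting per model. This is an illustration under stated assumptions, not the pipeline's actual code: the function name `build_thinking_config`, the `"gemini-3"` substring check, and the returned dict shape are all hypothetical. Only the `thinking_level` and `thinking_budget` names, the "low"/"high" values, and the 0-32768 range come from the diff. Accepting `-1` is also an assumption, based on the sample value `GOOGLE_THINKING_BUDGET=-1` elsewhere in this change.

```python
from typing import Optional

VALID_LEVELS = {"low", "high"}
MAX_BUDGET = 32768  # upper bound stated in the documentation text


def build_thinking_config(model: str, level: str = "", budget: Optional[int] = None) -> dict:
    """Return the thinking settings that apply to `model` (hypothetical helper)."""
    # Assumption: Gemini 3 models can be recognised by "gemini-3" in the model ID.
    if "gemini-3" in model.lower():
        level = level.strip().lower()
        if not level:
            return {}  # empty level: leave the choice to the model default
        if level not in VALID_LEVELS:
            raise ValueError(f"thinking level must be 'low' or 'high', got {level!r}")
        return {"thinking_level": level}  # any budget is ignored for Gemini 3

    # Other thinking-capable models: only the token budget applies.
    if budget is None:
        return {}
    # Assumption: -1 is accepted because the sample config uses it.
    if budget != -1 and not 0 <= budget <= MAX_BUDGET:
        raise ValueError(f"thinking budget must be -1 or 0..{MAX_BUDGET}, got {budget}")
    return {"thinking_budget": budget}
```

Returning a dict with at most one key keeps the "level or budget, never both" rule in a single place, so callers do not need their own model checks.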
+# Note: Gemini 3 models use GOOGLE_THINKING_LEVEL instead
+GOOGLE_THINKING_BUDGET=-1
+
+# Thinking level for Gemini 3 models only
+# Valid values: "low", "high", or empty string for model default
+# - "low": Minimizes latency and cost, suitable for simple tasks
+# - "high": Maximizes reasoning depth, ideal for complex problem-solving
+# Default: "" (empty, uses model default)
+# Note: This setting is ignored for non-Gemini 3 models
+GOOGLE_THINKING_LEVEL=""
+
 # Enable streaming responses globally
 # Default: true
 GOOGLE_STREAMING_ENABLED=true
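
Reviewer sketch: one way the two environment variables in this hunk might be read and validated at startup. The helper name `read_thinking_env` is hypothetical and this is not the pipeline's own loader. The fallback of `-1` for the budget simply mirrors the sample value above, since the documented default is not visible in this hunk. Stripping surrounding quotes is an extra assumption for values copied verbatim from an env file.

```python
import os
from typing import Mapping, Optional, Tuple


def read_thinking_env(env: Mapping[str, str] = os.environ) -> Tuple[int, Optional[str]]:
    """Parse GOOGLE_THINKING_BUDGET and GOOGLE_THINKING_LEVEL (hypothetical helper)."""
    # Assumption: fall back to -1, the sample value shown in the config above.
    raw_budget = env.get("GOOGLE_THINKING_BUDGET", "-1").strip()
    try:
        budget = int(raw_budget)
    except ValueError:
        raise ValueError(
            f"GOOGLE_THINKING_BUDGET must be an integer, got {raw_budget!r}"
        ) from None

    # An empty string means "use the model default", reported here as None.
    raw_level = env.get("GOOGLE_THINKING_LEVEL", "").strip().strip('"').lower()
    if raw_level not in ("", "low", "high"):
        raise ValueError(
            f"GOOGLE_THINKING_LEVEL must be 'low', 'high', or empty, got {raw_level!r}"
        )
    return budget, (raw_level or None)
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dict, for example `read_thinking_env({"GOOGLE_THINKING_LEVEL": "low"})`.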
@@ -227,3 +244,121 @@ To use this filter, ensure it's enabled in your Open WebUI configuration. Then,
 ## Native tool calling support
 
 Native tool calling is enabled/disabled via the standard 'Function calling' Open Web UI toggle.
+
+## Thinking Configuration
+
+The Google Gemini pipeline supports advanced thinking configuration to control how much reasoning and computation is applied by the model.
+
+> [!Note]
+> For detailed information about thinking capabilities, see the [Google Gemini Thinking Documentation](https://ai.google.dev/gemini-api/docs/thinking).
+
+### Thinking Levels (Gemini 3 models)
+
+Gemini 3 models support the `thinking_level` parameter, which controls the depth of reasoning:
+
+- **`"low"`**: Minimizes latency and cost, suitable for simple tasks, chat, or high-throughput APIs.
+- **`"high"`**: Maximizes reasoning depth, ideal for complex problem-solving, code analysis, and agentic workflows.
+
+> [!Note]
+> Gemini 3 models use `thinking_level` and do **not** use `thinking_budget`. The thinking budget setting is ignored for Gemini 3 models.
+
+Set via environment variable:
+
+```bash
+# Use low thinking level for faster responses
+GOOGLE_THINKING_LEVEL="low"
+
+# Use high thinking level for complex reasoning
+GOOGLE_THINKING_LEVEL="high"
+```
+
+**Example API Usage:**
+
+```python
+from google import genai
+from google.genai import types
+
+client = genai.Client()
+
+response = client.models.generate_content(
+    model="gemini-3-pro-preview",
+    contents="Provide a list of 3 famous physicists and their key contributions",