Merge branch 'features/quickstart'

Paweł Kędzia · Paweł Kędzia · commit 05b60f1ab51c · 2025-12-01T02:52:08.000+01:00
diff --git a/README.md b/README.md
@@ -103,7 +103,7 @@ metrics for monitoring and alerting.
 The quick‑start guides for running the full stack with **local models** are included in the repository:
 
 - **Gemma 3 12B‑IT** – [README](examples/quickstart/google-gemma3-12b-it/README.md)
-- **Bielik 11B‑v2.3‑Instruct‑FP8** – [README](examples/quickstart/speakleash-bielik-11b-v2_3-Instruct/README.md)
+- **Bielik 11B‑v2.3‑Instruct** – [README](examples/quickstart/speakleash-bielik-11b-v2_3-Instruct/README.md)
 
 ### 2️⃣ Minimum required environment variable
 
diff --git a/examples/README.md b/examples/README.md
@@ -67,7 +67,7 @@ Each example includes:
 The quick‑start guides for running the full stack with **local models** are included in the repository:
 
 - **Gemma 3 12B‑IT** – [README](quickstart/google-gemma3-12b-it/README.md)
-- **Bielik 11B‑v2.3‑Instruct‑FP8** – [README](quickstart/speakleash-bielik-11b-v2_3-Instruct/README.md)
+- **Bielik 11B‑v2.3‑Instruct** – [README](quickstart/speakleash-bielik-11b-v2_3-Instruct/README.md)
 
 These guides walk you through:
 
diff --git a/examples/quickstart/README.md b/examples/quickstart/README.md
@@ -3,4 +3,4 @@
 The quick‑start guides for running the full stack with **local models** are included in the repository:
 
 - **Gemma 3 12B‑IT** – [README](google-gemma3-12b-it/README.md)
-- **Bielik 11B‑v2.3‑Instruct‑FP8** – [README](speakleash-bielik-11b-v2_3-Instruct/README.md)
+- **Bielik 11B‑v2.3‑Instruct** – [README](speakleash-bielik-11b-v2_3-Instruct/README.md)
diff --git a/examples/quickstart/speakleash-bielik-11b-v2_3-Instruct/README.md b/examples/quickstart/speakleash-bielik-11b-v2_3-Instruct/README.md
@@ -1,8 +1,8 @@
-# 🚀 **Przewodnik Szybkiego Startu** dla `speakleash/Bielik-11B-v2.3-Instruct-FP8` z **vLLM** & **LLM‑Router**
+# 🚀 **Przewodnik Szybkiego Startu** dla `speakleash/Bielik-11B-v2.3-Instruct` z **vLLM** & **LLM‑Router**
 
 Ten przewodnik prowadzi Cię krok po kroku przez:
 
-1. **Instalację vLLM** i modelu `speakleash/Bielik-11B-v2.3-Instruct-FP8`.
+1. **Instalację vLLM** i modelu `speakleash/Bielik-11B-v2.3-Instruct`.
 2. **Instalację LLM‑Router** (bramki API).
 3. **Uruchomienie routera** z konfiguracją modeli dostarczoną w `models-config.json`.
 
@@ -65,7 +65,7 @@ Możesz szybko go przetestować:
 curl http://localhost:7000/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
-        "model": "speakleash/Bielik-11B-v2.3-Instruct-FP8",
+        "model": "speakleash/Bielik-11B-v2.3-Instruct",
         "messages": [{"role": "user", "content": "Cześć, jak się masz?"}],
         "max_tokens": 100
       }' | jq
@@ -101,7 +101,7 @@ Plik `models-config.json` znajdujący się w katalogu **speakleash‑bielik** ju
 ```json
 {
   "speakleash_models": {
-    "speakleash/Bielik-11B-v2.3-Instruct-FP8": {
+    "speakleash/Bielik-11B-v2.3-Instruct": {
       "providers": [
         {
           "id": "bielik-11B_v2_3-vllm-local:7000",
@@ -115,7 +115,7 @@ Plik `models-config.json` znajdujący się w katalogu **speakleash‑bielik** ju
   },
   "active_models": {
     "speakleash_models": [
-      "speakleash/Bielik-11B-v2.3-Instruct-FP8"
+      "speakleash/Bielik-11B-v2.3-Instruct"
     ]
   }
 }
@@ -163,7 +163,7 @@ Pełna lista dostępnych zmiennych środowiskowych znajduje się w
 curl http://localhost:8080/api/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
-        "model": "speakleash/Bielik-11B-v2.3-Instruct-FP8",
+        "model": "speakleash/Bielik-11B-v2.3-Instruct",
         "messages": [{"role": "user", "content": "Opowiedz krótki żart."}],
         "max_tokens": 80
       }' | jq
diff --git a/examples/quickstart/speakleash-bielik-11b-v2_3-Instruct/VLLM.md b/examples/quickstart/speakleash-bielik-11b-v2_3-Instruct/VLLM.md
@@ -1,4 +1,4 @@
-# vLLM + `speakleash/Bielik-11B-v2.3-Instruct-FP8` – Przewodnik Szybkiego Startu (Ubuntu)
+# vLLM + `speakleash/Bielik-11B-v2.3-Instruct` – Przewodnik Szybkiego Startu (Ubuntu)
 
 > **Wymagania wstępne**
 > - Ubuntu 20.04 lub nowszy
@@ -51,12 +51,12 @@ pip install huggingface_hub
 
 ---  
 
-## 6️⃣ Pobierz model `speakleash/Bielik-11B-v2.3-Instruct-FP8`
+## 6️⃣ Pobierz model `speakleash/Bielik-11B-v2.3-Instruct`
 
 ```
-mkdir -p ./speakleash/Bielik-11B-v2.3-Instruct-FP8
-hf download speakleash/Bielik-11B-v2.3-Instruct-FP8 \
-    --local-dir ./speakleash/Bielik-11B-v2.3-Instruct-FP8
+mkdir -p ./speakleash/Bielik-11B-v2.3-Instruct
+hf download speakleash/Bielik-11B-v2.3-Instruct \
+    --local-dir ./speakleash/Bielik-11B-v2.3-Instruct
 ```
 
 > Model zostanie pobrany do wskazanego katalogu. Pliki będą także buforowane domyślnie w `~/.cache/huggingface/hub`.
@@ -91,12 +91,11 @@ bash run-bielik-11b-v2_3-vllm.sh
 
 > > **INFO**: `curl` i `jq` to narzędzia systemowe.
 
-
 ```
 curl http://localhost:7000/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
-        "model": "speakleash/Bielik-11B-v2.3-Instruct-FP8",
+        "model": "speakleash/Bielik-11B-v2.3-Instruct",
         "messages": [{"role": "user", "content": "Cześć, jak się masz?"}],
         "max_tokens": 100
       }' | jq
@@ -109,7 +108,7 @@ Powinieneś otrzymać odpowiedź w formacie JSON, np.:
   "id": "chatcmpl-xxxx",
   "object": "chat.completion",
   "created": 1764516430,
-  "model": "speakleash/Bielik-11B-v2.3-Instruct-FP8",
+  "model": "speakleash/Bielik-11B-v2.3-Instruct",
   "choices": [
     {
       "index": 0,
@@ -132,18 +131,18 @@ Powinieneś otrzymać odpowiedź w formacie JSON, np.:
 
 ## 9️⃣ Przydatne wskazówki
 
-| Temat                       | Rekomendacja                                                                                                                              |
-|-----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
-| **Pamięć**                  | `speakleash/Bielik-11B-v2.3-Instruct-FP8` potrzebuje ok. 12GB VRAM. Użyj `--cpu-offload` (jeśli wspierane) przy ograniczonej pamięci GPU. |
-| **Lokalizacja cache**       | Ustaw `HF_HOME=$PWD/.cache/huggingface`, aby wszystkie pliki modelu znajdowały się w katalogu projektu.                                   |
-| **Równoległość tokenizera** | `export TOKENIZERS_PARALLELISM=false` wyciszy ostrzeżenia tokenizera.                                                                     |
-| **Wybór GPU**               | `export CUDA_VISIBLE_DEVICES=0` (lub inny indeks) przy wielu kartach GPU.                                                                 |
-| **Aktualizacja**            | `pip install -U vllm` odświeża bibliotekę; przy następnym uruchomieniu serwera zostaną pobrane nowsze pliki modelu, jeśli są dostępne.    |
-| **Dezaktywacja**            | Po zakończeniu pracy wystarczy wpisać `deactivate`, aby opuścić wirtualne środowisko.                                                     |
+| Temat                       | Rekomendacja                                                                                                                           |
+|-----------------------------|----------------------------------------------------------------------------------------------------------------------------------------|
+| **Pamięć**                  | `speakleash/Bielik-11B-v2.3-Instruct` potrzebuje ok. 24GB VRAM. Użyj `--cpu-offload` (jeśli wspierane) przy ograniczonej pamięci GPU.  |
+| **Lokalizacja cache**       | Ustaw `HF_HOME=$PWD/.cache/huggingface`, aby wszystkie pliki modelu znajdowały się w katalogu projektu.                                |
+| **Równoległość tokenizera** | `export TOKENIZERS_PARALLELISM=false` wyciszy ostrzeżenia tokenizera.                                                                  |
+| **Wybór GPU**               | `export CUDA_VISIBLE_DEVICES=0` (lub inny indeks) przy wielu kartach GPU.                                                              |
+| **Aktualizacja**            | `pip install -U vllm` odświeża bibliotekę; przy następnym uruchomieniu serwera zostaną pobrane nowsze pliki modelu, jeśli są dostępne. |
+| **Dezaktywacja**            | Po zakończeniu pracy wystarczy wpisać `deactivate`, aby opuścić wirtualne środowisko.                                                  |
 
 ---  
 
 ## 🎉 Gotowe!
 
 Masz już w pełni działające API kompatybilne z OpenAI, oparte na **vLLM** i modelu
-**speakleash/Bielik-11B-v2.3-Instruct-FP8**.
+**speakleash/Bielik-11B-v2.3-Instruct**.
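
Review note on the VRAM change in the tips table above (about 12GB for the FP8 checkpoint, about 24GB after the rename): the doubling is consistent with a simple weight-size estimate. The sketch below is only a back-of-the-envelope check. It assumes roughly 11 billion parameters (taken from the "11B" in the model name), 1 byte per FP8 weight, and 2 bytes per weight for the non-FP8 checkpoint; none of these figures are stated in the README itself.

```python
# Rough weight-memory estimate. Assumptions (not stated in the README):
# about 11e9 parameters, 1 byte per FP8 weight, 2 bytes per 16-bit weight.
PARAMS = 11e9  # assumed from the "11B" in the model name

BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_gb(dtype: str, params: float = PARAMS) -> float:
    """Return the approximate size of the weights alone, in GB (1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp8", "bf16"):
    print(f"{dtype}: ~{weight_gb(dtype):.0f} GB for weights")
```

The gap between these weight-only numbers and the figures quoted in the table would be taken up by the KV cache, activations and runtime overhead, which depend on the serving configuration.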
diff --git a/examples/quickstart/speakleash-bielik-11b-v2_3-Instruct/models-config.json b/examples/quickstart/speakleash-bielik-11b-v2_3-Instruct/models-config.json
@@ -1,6 +1,6 @@
 {
   "speakleash_models": {
-    "speakleash/Bielik-11B-v2.3-Instruct-FP8": {
+    "speakleash/Bielik-11B-v2.3-Instruct": {
       "providers": [
         {
           "id": "bielik-11B_v2_3-vllm-local:7000",
@@ -16,7 +16,7 @@
   },
   "active_models": {
     "speakleash_models": [
-      "speakleash/Bielik-11B-v2.3-Instruct-FP8"
+      "speakleash/Bielik-11B-v2.3-Instruct"
     ]
   }
 }
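
Review note on the rename in `models-config.json`: the model name appears twice, once as a key under `speakleash_models` and once in the `active_models` list, and this commit updates both. A small check like the one below could catch a rename that touches only one of them. It is a minimal sketch: the helper name `missing_active_models` is hypothetical, the provider details are left empty because the diff elides them, and whether LLM-Router performs such validation itself is not shown in this commit.

```python
import json

# Minimal stand-in for models-config.json. Only the keys visible in the diff
# are reproduced; the provider details are elided there, so the list is left empty.
CONFIG_TEXT = """
{
  "speakleash_models": {
    "speakleash/Bielik-11B-v2.3-Instruct": {"providers": []}
  },
  "active_models": {
    "speakleash_models": ["speakleash/Bielik-11B-v2.3-Instruct"]
  }
}
"""


def missing_active_models(config: dict) -> list[str]:
    """Return 'group: model' entries listed as active but not defined in their group."""
    missing = []
    for group, names in config.get("active_models", {}).items():
        defined = config.get(group, {})
        for name in names:
            if name not in defined:
                missing.append(f"{group}: {name}")
    return missing


config = json.loads(CONFIG_TEXT)
print(missing_active_models(config))  # an empty list means the names agree
```

If only the key had been renamed and the `active_models` entry still carried the `-FP8` suffix, the function would report that stale entry instead of returning an empty list.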
diff --git a/examples/quickstart/speakleash-bielik-11b-v2_3-Instruct/run-bielik-11b-v2_3-vllm.sh b/examples/quickstart/speakleash-bielik-11b-v2_3-Instruct/run-bielik-11b-v2_3-vllm.sh
@@ -2,7 +2,7 @@
 
 export CUDA_VISIBLE_DEVICES=0
 
-MODEL_PATH=speakleash/Bielik-11B-v2.3-Instruct-FP8
+MODEL_PATH=speakleash/Bielik-11B-v2.3-Instruct
 
 vllm serve \
 	"${MODEL_PATH}" \
