@@ -24,7 +24,6 @@ Redis Test Apps → OpenTelemetry Collector → Prometheus → Grafana
 | `redis_operations_total` | Counter | `1` | Total number of Redis operations executed | `operation`, `status`, `app_name`, `instance_id`, `version`, `error_type` | Records every Redis command |
 | `redis_operation_duration` | Histogram | `ms` | Duration of Redis operations in milliseconds | `operation`, `status`, `app_name`, `instance_id`, `version` | Buckets: `[0.1, 0.5, 1, 2, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000]` |
 | `redis_connections_total` | Counter | `1` | Total number of Redis connection attempts | `status`, `app_name`, `instance_id`, `version` | Tracks connection success/failure |
-| `redis_active_connections` | Gauge | `1` | Current number of active Redis connections | `app_name`, `instance_id`, `version` | Current state, not cumulative |
 | `redis_reconnection_duration_ms` | Histogram | `ms` | Duration of Redis reconnection attempts | `app_name`, `instance_id`, `version` | Buckets: `[100, 500, 1000, 2000, 5000, 10000, 30000, 60000]` |
 
 ### Label Definitions
@@ -45,46 +44,11 @@ Redis Test Apps → OpenTelemetry Collector → Prometheus → Grafana
 redis_operations_total{operation="SET", status="success", app_name="python-basic-rw", instance_id="abc123", version="1.0.0", error_type="none"} 1500
 redis_operations_total{operation="GET", status="error", app_name="python-basic-rw", instance_id="abc123", version="1.0.0", error_type="timeout"} 5
 
-# Active connections gauge
-redis_active_connections{app_name="python-basic-rw", instance_id="abc123", version="1.0.0"} 5
-
 # Connection attempts counter
 redis_connections_total{status="success", app_name="python-basic-rw", instance_id="abc123", version="1.0.0"} 10
 redis_connections_total{status="error", app_name="python-basic-rw", instance_id="abc123", version="1.0.0"} 2
 ```
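As a rough illustration of the label requirements in the metric table, the sketch below checks a series' labels against the required set for each of the four metrics that remain after this change. The `REQUIRED_LABELS` mapping and the `missing_labels` helper are hypothetical names introduced here, not part of the test apps.

```python
# Labels shared by every metric in the table.
BASE_LABELS = {"app_name", "instance_id", "version"}

# Required label sets per metric, copied from the metric table.
REQUIRED_LABELS = {
    "redis_operations_total": BASE_LABELS | {"operation", "status", "error_type"},
    "redis_operation_duration": BASE_LABELS | {"operation", "status"},
    "redis_connections_total": BASE_LABELS | {"status"},
    "redis_reconnection_duration_ms": BASE_LABELS,
}


def missing_labels(metric_name, labels):
    """Return the sorted required labels that are absent from `labels`.

    Raises KeyError if `metric_name` is not one of the four metrics.
    """
    return sorted(REQUIRED_LABELS[metric_name] - set(labels))


# Label set taken from the first sample series shown above.
sample = {
    "operation": "SET",
    "status": "success",
    "app_name": "python-basic-rw",
    "instance_id": "abc123",
    "version": "1.0.0",
    "error_type": "none",
}
print(missing_labels("redis_operations_total", sample))
```

A check like this could run in an app's test suite before metrics are exported, so that a missing label shows up as a test failure rather than as an empty dashboard panel.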
 
-## Standard Redis Operations
-
-All applications should support these standard Redis operations for consistent testing:
-
-| Category | Operation | Description | Use Case |
-|----------|-----------|-------------|----------|
-| **Core** | `SET` | Set string value | Basic key-value operations |
-| | `GET` | Get string value | Basic key-value operations |
-| | `DEL` | Delete key | Cleanup and key management |
-| | `INCR` | Increment integer value | Counters and atomic operations |
-| | `PING` | Connection test | Health checks |
-| **Lists** | `LPUSH` | Push to left of list | Queue operations |
-| | `RPUSH` | Push to right of list | Stack operations |
-| | `LPOP` | Pop from left of list | Queue processing |
-| | `RPOP` | Pop from right of list | Stack processing |
-| | `LRANGE` | Get range from list | List inspection |
-| | `LLEN` | Get list length | List size monitoring |
-| **Sets** | `SADD` | Add to set | Unique collections |
-| | `SREM` | Remove from set | Set management |
-| | `SCARD` | Get set cardinality | Set size monitoring |
-| | `SMEMBERS` | Get all set members | Set inspection |
-| **Hashes** | `HSET` | Set hash field | Object storage |
-| | `HGET` | Get hash field | Object field access |
-| | `HDEL` | Delete hash field | Object field management |
-| | `HGETALL` | Get all hash fields | Object inspection |
-| **Sorted Sets** | `ZADD` | Add to sorted set | Ranked collections |
-| | `ZREM` | Remove from sorted set | Ranked management |
-| | `ZCARD` | Get sorted set cardinality | Ranked size monitoring |
-| | `ZRANGE` | Get range from sorted set | Ranked queries |
-| **Pub/Sub** | `PUBLISH` | Publish message | Message broadcasting |
-| | `SUBSCRIBE` | Subscribe to channel | Message consumption |
-
 ## Label Standards
 
 ### App Name Convention
@@ -195,20 +159,20 @@ All metric queries should support these filters for consistent dashboard behavio
 
 ## Example Grafana Queries
 
-**Note on Time Ranges**: The `[5m]` in these queries represents a 5-minute time window for rate calculations. This provides:
-- **Smoothed metrics**: Reduces noise from short-term spikes
-- **Stable rates**: More reliable rate calculations over time
-- **Better visualization**: Smoother graphs in Grafana
+**Note on Time Ranges**: The `[10s]` in these queries represents a 10-second time window for rate calculations. This provides:
+- **Near real-time response**: Changes visible within 10 seconds
+- **Good balance**: Responsive enough for monitoring while reducing noise
+- **Fast failure detection**: Quickly shows when Redis goes down
+
 
-For real-time monitoring, you can use shorter windows like `[30s]` or `[1m]`, but expect more volatile graphs.
 
 ### Operations Rate by Status
 ```promql
-# 5-minute rate (recommended for stable visualization)
-sum(rate(redis_operations_total{app_name=~"$app_name", instance_id=~"$instance_id", operation=~"$operation", version=~"$version"}[5m])) by (operation, status)
+# 10-second rate (recommended for near real-time monitoring)
+sum(rate(redis_operations_total{app_name=~"$app_name", instance_id=~"$instance_id", operation=~"$operation", version=~"$version"}[10s])) by (operation, status)
 
-# 1-minute rate (more real-time, more volatile)
-sum(rate(redis_operations_total{app_name=~"$app_name", instance_id=~"$instance_id", operation=~"$operation", version=~"$version"}[1m])) by (operation, status)
+# 5-minute rate (smoother, less responsive)
+sum(rate(redis_operations_total{app_name=~"$app_name", instance_id=~"$instance_id", operation=~"$operation", version=~"$version"}[5m])) by (operation, status)
 ```
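The range selector in the queries above only produces a value when at least two samples fall inside the window, because `rate()` needs two points to compute a slope. The simplified sketch below illustrates that constraint. It is not Prometheus's implementation (it skips extrapolation to the window edges and counter-reset handling), and the 15-second scrape interval and the `simple_rate` name are assumptions, since the scrape interval is not stated in this change.

```python
def simple_rate(samples, now, window_s):
    """Per-second increase of a counter over the trailing window.

    `samples` is a list of (timestamp_seconds, counter_value) pairs in
    ascending time order. Returns None when fewer than two samples fall
    inside the window, which is the case where a rate query yields no data.
    """
    in_window = [(t, v) for t, v in samples if now - window_s <= t <= now]
    if len(in_window) < 2:
        return None
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter scraped every 15 seconds (assumed interval).
scrapes = [(0, 100), (15, 250), (30, 400), (45, 550), (60, 700)]

print(simple_rate(scrapes, now=60, window_s=10))  # window holds one sample
print(simple_rate(scrapes, now=60, window_s=60))  # window holds five samples
```

Under that assumed interval, the 10-second window returns nothing while the 60-second window returns a rate, so a `[10s]` selector presumably relies on a scrape interval of a few seconds at most.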
 
 ### Average Latency by Operation
@@ -229,11 +193,6 @@ histogram_quantile(0.95, sum(rate(redis_operation_duration_bucket{app_name=~"$ap
 histogram_quantile(0.95, sum(rate(redis_operation_duration_bucket{app_name=~"$app_name", instance_id=~"$instance_id", operation=~"$operation", version=~"$version"}[1m])) by (operation, le))
 ```
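For readers unfamiliar with how `histogram_quantile` turns cumulative bucket counts into a P95 value, here is a minimal sketch of the linear interpolation it performs inside the bucket that contains the target rank. The function name `estimate_quantile` and the bucket counts are made up for illustration; this is a simplified stand-in, not Prometheus code, and it works on plain counts rather than per-second bucket rates.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, with at least one finite bucket and a final
    (math.inf, total) bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank lands in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            # Interpolate linearly between the bucket's lower and upper bound.
            in_bucket = count - lower_count
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Hypothetical cumulative counts for a few bucket bounds, in milliseconds.
example_buckets = [(1, 50), (2, 80), (5, 95), (10, 100), (math.inf, 100)]
print(estimate_quantile(0.9, example_buckets))
```

Because the result is interpolated within a bucket, its precision depends on how finely the bucket bounds cover the latency range of interest.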
 
-### Active Connections by App
-```promql
-redis_active_connections{app_name=~"$app_name", instance_id=~"$instance_id", version=~"$version"}
-```
-
 ### Connection Success Rate
 ```promql
 # 5-minute success rate (recommended)
@@ -410,7 +369,6 @@ spec:
 
 3. **Incorrect Metric Types**
    - Use Counter for cumulative values (operations_total, connections_total)
-   - Use Gauge for current state (active_connections)
    - Use Histogram for distributions (duration metrics)
 
 4. **Performance Issues**
@@ -420,7 +378,7 @@ spec:
 
 ### Validation Checklist
 
-- [ ] All 5 required metrics are implemented
+- [ ] All 4 required metrics are implemented
 - [ ] All required labels are present and correctly formatted
 - [ ] App name follows naming convention
 - [ ] Instance ID is unique per application instance