feat(metrics): add scheduler attempt counter #1931
pkg/epp/metrics/metrics.go
Outdated
    // RecordSchedulingOutcome records metrics at the end of a scheduling attempt,
    // including latency, attempt status.
    func RecordSchedulingOutcome(duration time.Duration, err error) {
        RecordSchedulerE2ELatency(duration)
I don't think this metric should be nested in here. Please remove this call.
removed
pkg/epp/scheduling/scheduler.go
Outdated
      defer func() {
    -     metrics.RecordSchedulerE2ELatency(time.Since(scheduleStart))
    +     metrics.RecordSchedulingOutcome(time.Since(scheduleStart), err)
Please replace this line and the deleted line above with:
    - metrics.RecordSchedulingOutcome(time.Since(scheduleStart), err)
    + duration := time.Since(scheduleStart)
    + metrics.RecordSchedulerE2ELatency(duration)
    + metrics.RecordSchedulingOutcome(duration, err)
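For readers following the suggestion, here is a minimal, self-contained sketch of the pattern it asks for: measure the elapsed time once inside the deferred closure and pass the same value to both recorders. The names `recordLatency`, `recordOutcome`, `schedule`, and `last` are hypothetical stand-ins, not the project's code, and the sketch assumes the scheduling function uses a named `err` return so the deferred closure sees the final error.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// recorded captures the arguments passed to the hypothetical recorders below.
type recorded struct {
	latency         time.Duration
	outcomeDuration time.Duration
	outcomeErr      error
}

var last recorded

// recordLatency and recordOutcome are hypothetical stand-ins for
// metrics.RecordSchedulerE2ELatency and metrics.RecordSchedulingOutcome.
func recordLatency(d time.Duration) { last.latency = d }

func recordOutcome(d time.Duration, err error) {
	last.outcomeDuration = d
	last.outcomeErr = err
}

// schedule uses a named return value (an assumption of this sketch) so the
// deferred closure observes the error that is actually returned.
func schedule(fail bool) (err error) {
	start := time.Now()
	defer func() {
		duration := time.Since(start) // measured once, shared by both metrics
		recordLatency(duration)
		recordOutcome(duration, err)
	}()
	if fail {
		return errors.New("no endpoints available")
	}
	return nil
}

func main() {
	_ = schedule(true)
	fmt.Println(last.latency == last.outcomeDuration, last.outcomeErr != nil)
}
```

Calling `time.Since` twice would report two slightly different durations for the same attempt; computing it once keeps the latency and outcome metrics consistent with each other.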
f54475a to 6983812
Signed-off-by: CYJiang <googs1025@gmail.com>
6983812 to 1f791b7
What type of PR is this?
/kind feature
What this PR does / why we need it:
Introduce SchedulerAttemptsTotal counter with "success"/"failure" status labels. This improves observability of the scheduler for monitoring, alerting, and debugging in production.
Which issue(s) this PR fixes:
Fixes None
Does this PR introduce a user-facing change?: