
Conversation

@nabeelalam
Contributor

@nabeelalam nabeelalam commented Nov 18, 2025

Description:

This PR adds pagination and client-side rate limiting for DockerHub, Quay, and GHCR namespace discovery. This ensures we fetch every image under a namespace, including results that span multiple pages, while avoiding 429 Too Many Requests responses.

Updates:

  • Implemented pagination for DockerHub (using the next key), Quay (next_page), and GHCR (the Link header); a Link-header sketch follows below.

  • Added golang.org/x/time/rate to prevent 429 responses by limiting requests to one every 1.5 seconds.
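
For reference, a minimal sketch of following GHCR's Link header until no rel="next" page remains; the request setup and the helper name nextFromLinkHeader are illustrative rather than the PR's actual code, and the usual context, net/http, and strings imports are assumed:

// nextFromLinkHeader extracts the rel="next" URL from a Link header, or "" if there is none.
func nextFromLinkHeader(link string) string {
	for _, part := range strings.Split(link, ",") {
		if !strings.Contains(part, `rel="next"`) {
			continue
		}
		start, end := strings.Index(part, "<"), strings.Index(part, ">")
		if start >= 0 && end > start {
			return part[start+1 : end]
		}
	}
	return ""
}

// Fetch pages until the registry stops advertising a next page.
for url := startURL; url != ""; {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	// ... decode and collect this page's images ...
	resp.Body.Close()
	url = nextFromLinkHeader(resp.Header.Get("Link"))
}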

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint, requires golangci-lint)?

@nabeelalam nabeelalam self-assigned this Nov 18, 2025
@nabeelalam nabeelalam requested a review from a team November 18, 2025 13:41
@nabeelalam nabeelalam requested review from a team as code owners November 18, 2025 13:41
Contributor

@kashifkhan0771 kashifkhan0771 left a comment


My suggestion is to keep most of the existing code as-is and simply introduce a fixed page size (e.g., 30) for all registry APIs. Then we can wrap the current logic for all registries in a loop that continues until next_page is empty, and use golang.org/x/time/rate to generically control the number of requests per second.

Even though it's unlikely we'll hit rate limits (few namespaces will have hundreds of images), this gives us a consistent safeguard so we don't accidentally overload their APIs in any scenario.
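
A rough shape of that suggestion, with hypothetical names (listAllImages, fetchPage, and pageSize are placeholders rather than the PR's actual identifiers) and the usual context/time imports plus golang.org/x/time/rate assumed:

const pageSize = 30 // fixed page size for all registry APIs

// limiter allows roughly one registry request every 1.5 seconds.
var limiter = rate.NewLimiter(rate.Every(1500*time.Millisecond), 1)

func listAllImages(ctx context.Context, firstPage string) ([]string, error) {
	var images []string
	page := firstPage
	for page != "" {
		// Block until the limiter grants a token, so we never exceed the rate.
		if err := limiter.Wait(ctx); err != nil {
			return nil, err
		}
		items, nextPage, err := fetchPage(ctx, page, pageSize) // hypothetical per-registry fetch
		if err != nil {
			return nil, err
		}
		images = append(images, items...)
		page = nextPage // empty when the registry reports no further pages
	}
	return images, nil
}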

@amanfcp amanfcp requested a review from a team as a code owner November 24, 2025 13:13
Contributor

@camgunz camgunz left a comment


Thanks for picking this up! I think we should use the Retryable HTTP client from common: we should get the retry stuff for free, and we've been adding super useful metrics to it too.
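
For context, a sketch of what that could look like; whether the common package exposes a helper with this exact shape is an assumption, so this sketch builds directly on hashicorp/go-retryablehttp, which such wrappers are commonly based on:

import (
	"net/http"

	retryablehttp "github.com/hashicorp/go-retryablehttp"
)

// newRegistryClient returns an *http.Client that transparently retries
// transient failures with backoff, so callers keep the standard client API.
func newRegistryClient() *http.Client {
	rc := retryablehttp.NewClient()
	rc.RetryMax = 4 // assumption: a modest retry budget for registry calls
	return rc.StandardClient()
}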

Contributor

@kashifkhan0771 kashifkhan0771 left a comment


Good work! Overall this looks fine to me. Just make sure to test it properly locally with images from popular organizations.

// registryRateLimiter limits how quickly we make registry API calls across all registries.
// It allows one request every 1.5 seconds (burst of 1) as a simple safeguard
// against overloading upstream APIs.
var registryRateLimiter = rate.NewLimiter(rate.Every(1500*time.Millisecond), 1)
Contributor


// 1 event every 1.5s, burst of 2
var registryRateLimiter = rate.NewLimiter(rate.Limit(2.0/3.0), 2)
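
Whichever limiter settings are chosen, each registry call needs to wait on the limiter for the limit to take effect; a minimal usage sketch (client and req are whatever the surrounding code already uses):

// Block until the limiter grants a token, or bail out if ctx is done.
if err := registryRateLimiter.Wait(ctx); err != nil {
	return nil, err
}
resp, err := client.Do(req)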

return nil, err
}
query := baseURL.Query()
query.Set("page_size", "100") // fetch images in batches of 100 per page
Contributor


Let's make the page size a constant. I would suggest keeping it at maybe 30-40 per page, but I'll leave that up to you.

Contributor Author


I'm inclined to keep it at the maximum; while it's unlikely a namespace will have that many images per page, it shouldn't hurt just in case.
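
If the constant route is taken with the maximum kept, it could look like this (the constant name is illustrative):

// dockerhubPageSize is the number of images requested per page of results.
const dockerhubPageSize = "100"

query := baseURL.Query()
query.Set("page_size", dockerhubPageSize) // fetch images in batches of 100 per page
baseURL.RawQuery = query.Encode()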
