
Support --cluster-announce-hostname for DNS based clustering #1446

@darwin67

Feature request type

enhancement

Is your feature request related to a problem? Please describe

Hi folks,

Garnet cluster nodes currently have a problem when running on Kubernetes.
When a StatefulSet is used to run the cluster and CleanClusterConfig is not enabled, a pod restart or termination leaves a stale node in the cluster.

This happens because Kubernetes does not guarantee that a pod keeps its IP. When cluster data is preserved via PVs, the restarted node reuses its stored node information, which no longer matches its new IP, and the cluster ends up in an inconsistent state where the restarted node is not recognized.

I believe Redis' gossip protocol eventually resolves this once the IP is corrected, but that doesn't happen on Garnet, which is a bug in its own right.

Describe the solution you'd like

Provide the same additional cluster announcement options as Redis/Valkey so that a hostname can be used instead of the IP.
This would make operating on Kubernetes a lot easier, since StatefulSets guarantee hostname consistency but not IPs.

Reference: https://raw.githubusercontent.com/redis/redis/7.0/redis.conf

# Clusters can configure their announced hostname using this config. This is a common use case for 
# applications that need to use TLS Server Name Indication (SNI) or dealing with DNS based
# routing. By default this value is only shown as additional metadata in the CLUSTER SLOTS
# command, but can be changed using 'cluster-preferred-endpoint-type' config. This value is 
# communicated along the clusterbus to all nodes, setting it to an empty string will remove 
# the hostname and also propagate the removal.
#
# cluster-announce-hostname ""

# Clusters can advertise how clients should connect to them using either their IP address,
# a user defined hostname, or by declaring they have no endpoint. Which endpoint is
# shown as the preferred endpoint is set by using the cluster-preferred-endpoint-type
# config with values 'ip', 'hostname', or 'unknown-endpoint'. This value controls how
# the endpoint returned for MOVED/ASKING requests as well as the first field of CLUSTER SLOTS. 
# If the preferred endpoint type is set to hostname, but no announced hostname is set, a '?' 
# will be returned instead.
#
# When a cluster advertises itself as having an unknown endpoint, it's indicating that
# the server doesn't know how clients can reach the cluster. This can happen in certain 
# networking situations where there are multiple possible routes to the node, and the 
# server doesn't know which one the client took. In this case, the server is expecting
# the client to reach out on the same endpoint it used for making the last request, but use
# the port provided in the response.
#
# cluster-preferred-endpoint-type ip
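
To make the quoted behaviour concrete, here is a minimal Go sketch of how a server might pick the host part of a redirect from the two settings. The `node` type and the function names are hypothetical (they are not Garnet or Redis internals), and returning an empty host for 'unknown-endpoint' is my reading of the text above rather than something it states outright.

```go
package main

import (
	"fmt"
	"net"
)

// node holds the announce values discussed above. The type and its field
// names are hypothetical, chosen only for this illustration.
type node struct {
	IP       string
	Hostname string // value of cluster-announce-hostname, "" if unset
	Port     string
}

// preferredEndpoint picks the host part of a redirect according to
// cluster-preferred-endpoint-type, following the redis.conf text.
func preferredEndpoint(n node, endpointType string) string {
	switch endpointType {
	case "hostname":
		if n.Hostname == "" {
			return "?" // hostname preferred but none announced
		}
		return n.Hostname
	case "unknown-endpoint":
		// Assumption: an empty host tells the client to reuse the
		// endpoint of its last request with the returned port.
		return ""
	default: // "ip"
		return n.IP
	}
}

// movedReply formats a MOVED redirect for the given slot.
func movedReply(slot int, n node, endpointType string) string {
	host := preferredEndpoint(n, endpointType)
	return fmt.Sprintf("MOVED %d %s", slot, net.JoinHostPort(host, n.Port))
}

func main() {
	n := node{IP: "10.0.0.5", Hostname: "garnet-0.garnet.example.com", Port: "6379"}
	for _, t := range []string{"ip", "hostname", "unknown-endpoint"} {
		fmt.Println(movedReply(3999, n, t))
	}
}
```

With 'hostname' selected, the redirect target stays the same across pod restarts even when the pod IP changes.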

An additional benefit is that TLS verification becomes somewhat easier, since wildcards can be used in the SANs.
Right now the target in MOVED responses is <ip>:<port>, and it is practically impossible to make sure certs are dynamically generated for each pod to include its IP address.

We're currently using this with the rueidis Go client

&tls.Config{
  // Verify the certificate against this name instead of the dialed IP.
  ServerName: "garnet.example.com",
}

to force the TLS client to verify against the server name as a workaround.
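
To illustrate the wildcard SAN point, here is a small self-contained Go sketch using only the standard library. It generates a throwaway self-signed certificate with a single wildcard DNS SAN and checks which redirect targets would pass hostname verification. The pod names and the `*.garnet.example.com` pattern are made up for the example, and `wildcardCert` and `matches` are hypothetical helpers.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// wildcardCert creates a throwaway self-signed certificate whose only SAN
// is the given DNS pattern. It is for illustration only.
func wildcardCert(pattern string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "garnet"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     []string{pattern},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// matches reports whether the certificate's SANs cover the given host,
// which is the check a TLS client performs for the name it dialed.
func matches(cert *x509.Certificate, host string) bool {
	return cert.VerifyHostname(host) == nil
}

func main() {
	cert, err := wildcardCert("*.garnet.example.com")
	if err != nil {
		panic(err)
	}
	// Hypothetical redirect targets: two pod hostnames and one pod IP.
	for _, host := range []string{
		"garnet-0.garnet.example.com",
		"garnet-1.garnet.example.com",
		"10.0.0.5",
	} {
		fmt.Printf("%-30s covered=%v\n", host, matches(cert, host))
	}
}
```

One certificate covers every pod hostname, while an IP target only verifies if that exact IP is listed in the certificate, which is why hostname-based redirects would remove the need for the ServerName override.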

Describe alternatives you've considered

I've been setting CleanClusterConfig per the recommendation in #640.

But this requires quite a bit of coordination to make things work correctly.

  1. When a follower terminates, it needs to be removed from all cluster nodes via CLUSTER FORGET, and it needs to be reset via CLUSTER RESET HARD
  2. On boot, the new node needs to run CLUSTER MEET and then CLUSTER REPLICATE to join the cluster again
  3. The new follower then needs to catch up with the leader

As you can see, this is extremely time consuming and requires a lot of coordination from an operator to make sure no stale nodes are left in the cluster.

Supporting these two additional parameters would likely simplify cluster operations a lot.

Additional context

No response
