This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Commit dca1b96

Author: Stephen Gutekanst
self hosted models (#63899)
This PR is stacked on top of all the prior work @chrsmith has done for shuffling configuration data around; it implements the new "Self hosted models" functionality.

## Configuration

Configuring a Sourcegraph instance to use self-hosted models basically involves adding configuration like this to the site config (setting `modelConfiguration` opts you in to the new system, which is in early access):

```
// Setting this field means we are opting into the new Cody model configuration system.
"modelConfiguration": {
  // Disable use of Sourcegraph's servers for model discovery
  "sourcegraph": null,

  // Create two model providers
  "providerOverrides": [
    {
      // Our first model provider "mistral" will be a Huggingface TGI deployment which hosts our
      // mistral model for chat functionality.
      "id": "mistral",
      "displayName": "Mistral",
      "serverSideConfig": {
        "type": "huggingface-tgi",
        "endpoints": [{"url": "https://mistral.example.com/v1"}]
      },
    },
    {
      // Our second model provider "bigcode" will be a Huggingface TGI deployment which hosts our
      // bigcode/starcoder model for code completion functionality.
      "id": "bigcode",
      "displayName": "Bigcode",
      "serverSideConfig": {
        "type": "huggingface-tgi",
        "endpoints": [{"url": "http://starcoder.example.com/v1"}]
      }
    }
  ],

  // Make these two models available to Cody users
  "modelOverridesRecommendedSettings": [
    "mistral::v1::mixtral-8x7b-instruct",
    "bigcode::v1::starcoder2-7b"
  ],

  // Configure which models Cody will use by default
  "defaultModels": {
    "chat": "mistral::v1::mixtral-8x7b-instruct",
    "fastChat": "mistral::v1::mixtral-8x7b-instruct",
    "codeCompletion": "bigcode::v1::starcoder2-7b"
  }
}
```

More advanced configurations are possible; the above is our blessed configuration for today.

## Hosting models

Another major component of this work is starting to build up recommendations around how to self-host models: which ones to use, how to configure them, and so on. For now, we've been testing with these two on a machine with dual A100s:

* Huggingface TGI (a Docker container for model inference which provides an OpenAI-compatible API, and is widely popular)
* Two models:
  * Starcoder2 for code completion; specifically `bigcode/starcoder2-15b` with `eetq` 8-bit quantization.
  * Mixtral 8x7b instruct for chat; specifically `casperhansen/mixtral-instruct-awq`, which uses `awq` 4-bit quantization.

This is our 'starter' configuration. Other models - specifically other Starcoder 2 and Mixtral instruct variants - certainly work too, and higher-parameter versions may of course provide better results. Documentation for how to deploy Huggingface TGI, suggested configuration, and debugging tips is coming soon.

## Advanced configuration

As part of this effort, I have added an extensive set of configuration knobs to the client-side model configuration (see `type ClientSideModelConfigOpenAICompatible` in this PR). Some of these options are needed for things to work at a basic level, while others (e.g. prompt customization) are not needed for basic functionality but are very important for customers interested in self-hosting their own models.
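To make that concrete, a model override could carry client-side knobs roughly like the sketch below. This is illustrative only: the nesting under `modelOverrides` / `clientSideConfig` / `openaicompatible` and the camelCase key spellings are assumptions inferred from the Go conversion code in this PR, and the numeric values are placeholders. Only the Starcoder2 stop-sequence and end-of-text values come from the recommended settings in the diff further down.

```
"modelOverrides": [
  {
    "modelRef": "bigcode::v1::starcoder2-15b",
    "clientSideConfig": {
      "openaicompatible": {
        // Stop generation at these sequences (values taken from the Starcoder2 recommended settings below).
        "stopSequences": ["<|endoftext|>", "<file_sep>"],
        "endOfText": "<|endoftext|>",
        // Illustrative values; tune per deployment.
        "contextSizeHintTotalCharacters": 16000,
        "chatTemperature": 0.2,
        "chatMaxTokens": 4096
      }
    }
  }
]
```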
Today, Cody clients have a number of different _autocomplete provider implementations_ which tie the model-specific logic needed for autocomplete to a particular provider. For example, if you use a GPT model through Azure OpenAI, the autocomplete provider is entirely different from the one you'd get if you used a GPT model through OpenAI directly. This can lead to some subtle issues for us, so it is worth exploring ways to have a _generalized autocomplete provider_. Since self-hosted models force us to address this problem anyway, these configuration knobs fed from the server to the client are a pathway to doing that: initially just for self-hosted models, but possibly generalized to other providers in the future.

## Debugging facilities

Working with customers on OpenAI-compatible APIs in the past, we've learned that debugging can be quite a pain: if you can't see what requests the Sourcegraph backend is making and what it is getting back, troubleshooting is painful. This PR implements quite extensive logging, plus a `debugConnections` flag which can be turned on to log the actual request payloads and responses. This is critical when a customer is trying to add support for a new model, their own custom OpenAI API service, etc.

## Robustness

Working with customers in the past, we also learned that various parts of our backend `openai` provider were not particularly robust. For example, [if more than one message was present it was a fatal error](https://github.com/sourcegraph/sourcegraph/blob/main/internal/completions/client/openai/openai.go#L305), and if the SSE stream yielded `{"error"}` payloads, they went ignored. Similarly, the SSE event stream parser we use is heavily tailored to [the exact response structure](https://github.com/sourcegraph/sourcegraph/blob/main/internal/completions/client/openai/decoder.go#L15-L19) which OpenAI's official API returns, and is therefore quite brittle when connecting to a different SSE stream.

For this work, I _started by forking_ our `internal/completions/client/openai` provider and made a number of major improvements to make it more robust and handle errors better. I have also replaced the custom SSE event stream parser - which was not spec compliant and was brittle - with a proper SSE event stream parser that recently popped up in the Go community: https://github.com/tmaxmax/go-sse. My intention is that after more extensive testing, this new `internal/completions/client/openaicompatible` provider will be more robust, more correct, and all-around better than `internal/completions/client/openai` (and possibly the Azure one), so that we can supersede those entirely with the new `openaicompatible` provider.
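To illustrate the kind of error handling this is about - this is a minimal, self-contained sketch, not the actual `openaicompatible` implementation - a consumer of an OpenAI-compatible stream can treat inline `{"error": ...}` payloads and the `[DONE]` sentinel explicitly, and tolerate any number of `choices`, rather than ignoring errors or treating extra messages as fatal. The `chatChunk` type below is a hypothetical stand-in modeled on the public OpenAI streaming response shape:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// chatChunk models the subset of an OpenAI-compatible streaming payload we care about here.
type chatChunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
	Error *struct {
		Message string `json:"message"`
	} `json:"error"`
}

// handlePayload processes one SSE "data:" payload and returns any streamed text,
// whether the stream is finished, and any error the server reported inline.
func handlePayload(data string) (text string, done bool, err error) {
	if strings.TrimSpace(data) == "[DONE]" {
		return "", true, nil
	}
	var chunk chatChunk
	if err := json.Unmarshal([]byte(data), &chunk); err != nil {
		return "", false, fmt.Errorf("decoding stream payload: %w", err)
	}
	// Surface inline {"error": ...} payloads instead of silently dropping them.
	if chunk.Error != nil {
		return "", false, fmt.Errorf("server reported error: %s", chunk.Error.Message)
	}
	// Tolerate zero or many choices rather than treating either as fatal.
	for _, c := range chunk.Choices {
		text += c.Delta.Content
	}
	return text, false, nil
}

func main() {
	for _, payload := range []string{
		`{"choices":[{"delta":{"content":"Hello"}}]}`,
		`{"error":{"message":"model overloaded"}}`,
		`[DONE]`,
	} {
		text, done, err := handlePayload(payload)
		fmt.Printf("text=%q done=%v err=%v\n", text, done, err)
	}
}
```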
## Client implementation

Much of the work in this PR is just "let the site admin configure things, and broadcast that config to the client through the new model config system." Actually getting the clients to respect the new configuration is a task I am tackling in follow-up `sourcegraph/cody` PRs.

## Test plan

1. This change currently lacks any unit/regression tests; that is a major noteworthy point, and I will follow up with those in a future PR.
   * However, these changes are **incredibly** isolated, clearly only affecting customers who opt in to the new self-hosted models configuration.
   * Most of the heavy lifting (SSE streaming, shuffling data around) is done in other, well-tested codebases.
2. Manual testing has played a big role here, specifically:
   * Running a dev instance with the new configuration, actually connected to Huggingface TGI deployed on a remote server.
   * Using the new `debugConnections` mechanism (which customers would use) to directly confirm requests are going to the right places, with the right data and payloads.
   * Confirming with a new client (changes not yet landed) that autocomplete and chat functionality work.

Can we use more testing? Hell yeah, and I'm going to add it soon. Does it work quite well, with little room for error? Also yes.

## Changelog

Cody Enterprise: added a new configuration for self-hosting models. Reach out to support if you would like to use this feature, as it is in early access.

---------

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
1 parent b4e03f4 commit dca1b96

File tree

18 files changed: +1070 additions, -76 deletions


cmd/frontend/internal/modelconfig/siteconfig.go

Lines changed: 73 additions & 19 deletions
```diff
@@ -173,13 +173,18 @@ func convertServerSideProviderConfig(cfg *schema.ServerSideProviderConfig) *type
 				Endpoint: v.Endpoint,
 			},
 		}
+	} else if v := cfg.HuggingfaceTgi; v != nil {
+		return &types.ServerSideProviderConfig{
+			OpenAICompatible: &types.OpenAICompatibleProviderConfig{
+				Endpoints: convertOpenAICompatibleEndpoints(v.Endpoints),
+				EnableVerboseLogs: v.EnableVerboseLogs,
+			},
+		}
 	} else if v := cfg.Openaicompatible; v != nil {
-		// TODO(slimsag): self-hosted-models: map this to OpenAICompatibleProviderConfig in the future
 		return &types.ServerSideProviderConfig{
-			GenericProvider: &types.GenericProviderConfig{
-				ServiceName: types.GenericServiceProviderOpenAI,
-				AccessToken: v.AccessToken,
-				Endpoint: v.Endpoint,
+			OpenAICompatible: &types.OpenAICompatibleProviderConfig{
+				Endpoints: convertOpenAICompatibleEndpoints(v.Endpoints),
+				EnableVerboseLogs: v.EnableVerboseLogs,
 			},
 		}
 	} else if v := cfg.Sourcegraph; v != nil {
@@ -194,13 +199,57 @@ func convertServerSideProviderConfig(cfg *schema.ServerSideProviderConfig) *type
 	}
 }
 
+func convertOpenAICompatibleEndpoints(configEndpoints []*schema.OpenAICompatibleEndpoint) []types.OpenAICompatibleEndpoint {
+	var endpoints []types.OpenAICompatibleEndpoint
+	for _, e := range configEndpoints {
+		endpoints = append(endpoints, types.OpenAICompatibleEndpoint{
+			URL: e.Url,
+			AccessToken: e.AccessToken,
+		})
+	}
+	return endpoints
+}
+
 func convertClientSideModelConfig(v *schema.ClientSideModelConfig) *types.ClientSideModelConfig {
 	if v == nil {
 		return nil
 	}
-	return &types.ClientSideModelConfig{
-		// We currently do not have any known client-side model configuration.
+	cfg := &types.ClientSideModelConfig{}
+	if o := v.Openaicompatible; o != nil {
+		cfg.OpenAICompatible = &types.ClientSideModelConfigOpenAICompatible{
+			StopSequences: o.StopSequences,
+			EndOfText: o.EndOfText,
+			ContextSizeHintTotalCharacters: intPtrToUintPtr(o.ContextSizeHintTotalCharacters),
+			ContextSizeHintPrefixCharacters: intPtrToUintPtr(o.ContextSizeHintPrefixCharacters),
+			ContextSizeHintSuffixCharacters: intPtrToUintPtr(o.ContextSizeHintSuffixCharacters),
+			ChatPreInstruction: o.ChatPreInstruction,
+			EditPostInstruction: o.EditPostInstruction,
+			AutocompleteSinglelineTimeout: uint(o.AutocompleteSinglelineTimeout),
+			AutocompleteMultilineTimeout: uint(o.AutocompleteMultilineTimeout),
+			ChatTopK: float32(o.ChatTopK),
+			ChatTopP: float32(o.ChatTopP),
+			ChatTemperature: float32(o.ChatTemperature),
+			ChatMaxTokens: uint(o.ChatMaxTokens),
+			AutoCompleteTopK: float32(o.AutoCompleteTopK),
+			AutoCompleteTopP: float32(o.AutoCompleteTopP),
+			AutoCompleteTemperature: float32(o.AutoCompleteTemperature),
+			AutoCompleteSinglelineMaxTokens: uint(o.AutoCompleteSinglelineMaxTokens),
+			AutoCompleteMultilineMaxTokens: uint(o.AutoCompleteMultilineMaxTokens),
+			EditTopK: float32(o.EditTopK),
+			EditTopP: float32(o.EditTopP),
+			EditTemperature: float32(o.EditTemperature),
+			EditMaxTokens: uint(o.EditMaxTokens),
+		}
+	}
+	return cfg
+}
+
+func intPtrToUintPtr(v *int) *uint {
+	if v == nil {
+		return nil
 	}
+	ptr := uint(*v)
+	return &ptr
 }
 
 func convertServerSideModelConfig(cfg *schema.ServerSideModelConfig) *types.ServerSideModelConfig {
@@ -213,6 +262,12 @@ func convertServerSideModelConfig(cfg *schema.ServerSideModelConfig) *types.Serv
 				ARN: v.Arn,
 			},
 		}
+	} else if v := cfg.Openaicompatible; v != nil {
+		return &types.ServerSideModelConfig{
+			OpenAICompatible: &types.ServerSideModelConfigOpenAICompatible{
+				APIModel: v.ApiModel,
+			},
+		}
 	} else {
 		panic(fmt.Sprintf("illegal state: %+v", v))
 	}
@@ -262,19 +317,14 @@ func convertModelCapabilities(capabilities []string) []types.ModelCapability {
 //
 // It would specify these equivalent options for them under `modelOverrides`:
 var recommendedSettings = map[types.ModelRef]types.ModelOverride{
-	"bigcode::v1::starcoder2-3b": recommendedSettingsStarcoder2("bigcode::v1::starcoder2-3b", "Starcoder2 3B", "starcoder2-3b"),
 	"bigcode::v1::starcoder2-7b": recommendedSettingsStarcoder2("bigcode::v1::starcoder2-7b", "Starcoder2 7B", "starcoder2-7b"),
 	"bigcode::v1::starcoder2-15b": recommendedSettingsStarcoder2("bigcode::v1::starcoder2-15b", "Starcoder2 15B", "starcoder2-15b"),
-	"mistral::v1::mistral-7b": recommendedSettingsMistral("mistral::v1::mistral-7b", "Mistral 7B", "mistral-7b"),
 	"mistral::v1::mistral-7b-instruct": recommendedSettingsMistral("mistral::v1::mistral-7b-instruct", "Mistral 7B Instruct", "mistral-7b-instruct"),
-	"mistral::v1::mixtral-8x7b": recommendedSettingsMistral("mistral::v1::mixtral-8x7b", "Mixtral 8x7B", "mixtral-8x7b"),
-	"mistral::v1::mixtral-8x22b": recommendedSettingsMistral("mistral::v1::mixtral-8x22b", "Mixtral 8x22B", "mixtral-8x22b"),
 	"mistral::v1::mixtral-8x7b-instruct": recommendedSettingsMistral("mistral::v1::mixtral-8x7b-instruct", "Mixtral 8x7B Instruct", "mixtral-8x7b-instruct"),
 	"mistral::v1::mixtral-8x22b-instruct": recommendedSettingsMistral("mistral::v1::mixtral-8x22b", "Mixtral 8x22B", "mixtral-8x22b-instruct"),
 }
 
 func recommendedSettingsStarcoder2(modelRef, displayName, modelName string) types.ModelOverride {
-	// TODO(slimsag): self-hosted-models: tune these further based on testing
 	return types.ModelOverride{
 		ModelRef: types.ModelRef(modelRef),
 		DisplayName: displayName,
@@ -285,15 +335,18 @@ func recommendedSettingsStarcoder2(modelRef, displayName, modelName string) type
 		Tier: types.ModelTierEnterprise,
 		ContextWindow: types.ContextWindow{
 			MaxInputTokens: 8192,
-			MaxOutputTokens: 4000,
+			MaxOutputTokens: 4096,
+		},
+		ClientSideConfig: &types.ClientSideModelConfig{
+			OpenAICompatible: &types.ClientSideModelConfigOpenAICompatible{
+				StopSequences: []string{"<|endoftext|>", "<file_sep>"},
+				EndOfText: "<|endoftext|>",
+			},
 		},
-		ClientSideConfig: nil,
-		ServerSideConfig: nil,
 	}
 }
 
 func recommendedSettingsMistral(modelRef, displayName, modelName string) types.ModelOverride {
-	// TODO(slimsag): self-hosted-models: tune these further based on testing
 	return types.ModelOverride{
 		ModelRef: types.ModelRef(modelRef),
 		DisplayName: displayName,
@@ -304,9 +357,10 @@ func recommendedSettingsMistral(modelRef, displayName, modelName string) types.M
 		Tier: types.ModelTierEnterprise,
 		ContextWindow: types.ContextWindow{
 			MaxInputTokens: 8192,
-			MaxOutputTokens: 4000,
+			MaxOutputTokens: 4096,
+		},
+		ClientSideConfig: &types.ClientSideModelConfig{
+			OpenAICompatible: &types.ClientSideModelConfigOpenAICompatible{},
 		},
-		ClientSideConfig: nil,
-		ServerSideConfig: nil,
 	}
 }
```

cmd/frontend/internal/modelconfig/siteconfig_completions.go

Lines changed: 2 additions & 2 deletions
```diff
@@ -160,8 +160,8 @@ func getProviderConfiguration(siteConfig *conftypes.CompletionsConfig) *types.Se
 			Endpoint: siteConfig.Endpoint,
 		}
 
-		// For all the other types of providers you can define in the site configuration, we
-		// just use a generic config. Rather than creating one for Anthropic, Fireworks, Google, etc.
+		// For all the other types of providers you can define in the legacy "completions" site configuration,
+		// we just use a generic config. Rather than creating one for Anthropic, Fireworks, Google, etc.
 		// We'll add those when needed, when we expose the newer style configuration in the site-config.
 	default:
 		serverSideConfig.GenericProvider = &types.GenericProviderConfig{
```

deps.bzl

Lines changed: 7 additions & 0 deletions
```diff
@@ -6237,6 +6237,13 @@ def go_dependencies():
         sum = "h1:ng9scYS7az0Bk4OZLvrNXNSAO2Pxr1XXRAPyjhIx+Fk=",
         version = "v0.6.1",
     )
+    go_repository(
+        name = "com_github_tmaxmax_go_sse",
+        build_file_proto_mode = "disable_global",
+        importpath = "github.com/tmaxmax/go-sse",
+        sum = "h1:pPpTgyyi1r7vG2o6icebnpGEh3ebcnBXqDWkb7aTofs=",
+        version = "v0.8.0",
+    )
     go_repository(
         name = "com_github_tmc_dot",
         build_file_proto_mode = "disable_global",
```

go.mod

Lines changed: 1 addition & 0 deletions
```diff
@@ -318,6 +318,7 @@ require (
 	github.com/sourcegraph/sourcegraph/lib v0.0.0-20240524140455-2589fef13ea8
 	github.com/sourcegraph/sourcegraph/lib/managedservicesplatform v0.0.0-00010101000000-000000000000
 	github.com/sourcegraph/sourcegraph/monitoring v0.0.0-00010101000000-000000000000
+	github.com/tmaxmax/go-sse v0.8.0
 	github.com/vektah/gqlparser/v2 v2.4.5
 	github.com/vvakame/gcplogurl v0.2.0
 	go.opentelemetry.io/collector/config/confighttp v0.103.0
```

go.sum

Lines changed: 2 additions & 0 deletions
```diff
@@ -2410,6 +2410,8 @@ github.com/tklauser/go-sysconf v0.3.12 h1:0QaGUFOdQaIVdPgfITYzaTegZvdCjmYO52cSFA
 github.com/tklauser/go-sysconf v0.3.12/go.mod h1:Ho14jnntGE1fpdOqQEEaiKRpvIavV0hSfmBq8nJbHYI=
 github.com/tklauser/numcpus v0.6.1 h1:ng9scYS7az0Bk4OZLvrNXNSAO2Pxr1XXRAPyjhIx+Fk=
 github.com/tklauser/numcpus v0.6.1/go.mod h1:1XfjsgE2zo8GVw7POkMbHENHzVg3GzmoZ9fESEdAacY=
+github.com/tmaxmax/go-sse v0.8.0 h1:pPpTgyyi1r7vG2o6icebnpGEh3ebcnBXqDWkb7aTofs=
+github.com/tmaxmax/go-sse v0.8.0/go.mod h1:HLoxqxdH+7oSUItjtnpxjzJedfr/+Rrm/dNWBcTxJFM=
 github.com/tmc/grpc-websocket-proxy v0.0.0-20190109142713-0ad062ec5ee5/go.mod h1:ncp9v5uamzpCO7NfCPTXjqaC+bZgJeR0sMTm6dMHP7U=
 github.com/tomnomnom/linkheader v0.0.0-20180905144013-02ca5825eb80 h1:nrZ3ySNYwJbSpD6ce9duiP+QkD3JuLCcWkdaehUS/3Y=
 github.com/tomnomnom/linkheader v0.0.0-20180905144013-02ca5825eb80/go.mod h1:iFyPdL66DjUD96XmzVL3ZntbzcflLnznH0fr99w5VqE=
```

internal/completions/client/BUILD.bazel

Lines changed: 1 addition & 0 deletions
```diff
@@ -17,6 +17,7 @@ go_library(
         "//internal/completions/client/fireworks",
         "//internal/completions/client/google",
         "//internal/completions/client/openai",
+        "//internal/completions/client/openaicompatible",
        "//internal/completions/tokenusage",
        "//internal/completions/types",
        "//internal/httpcli",
```

internal/completions/client/client.go

Lines changed: 6 additions & 0 deletions
```diff
@@ -10,6 +10,7 @@ import (
 	"github.com/sourcegraph/sourcegraph/internal/completions/client/fireworks"
 	"github.com/sourcegraph/sourcegraph/internal/completions/client/google"
 	"github.com/sourcegraph/sourcegraph/internal/completions/client/openai"
+	"github.com/sourcegraph/sourcegraph/internal/completions/client/openaicompatible"
 	"github.com/sourcegraph/sourcegraph/internal/completions/tokenusage"
 	"github.com/sourcegraph/sourcegraph/internal/completions/types"
 	"github.com/sourcegraph/sourcegraph/internal/httpcli"
@@ -64,6 +65,11 @@ func getAPIProvider(modelConfigInfo types.ModelConfigInfo) (types.CompletionsCli
 		return client, errors.Wrap(err, "getting api provider")
 	}
 
+	// OpenAI Compatible
+	if openAICompatibleCfg := ssConfig.OpenAICompatible; openAICompatibleCfg != nil {
+		return openaicompatible.NewClient(httpcli.UncachedExternalClient, *tokenManager), nil
+	}
+
 	// The "GenericProvider" is an escape hatch for a set of API Providers not needing any additional configuration.
 	if genProviderCfg := ssConfig.GenericProvider; genProviderCfg != nil {
 		token := genProviderCfg.AccessToken
```
internal/completions/client/openaicompatible/BUILD.bazel

Lines changed: 20 additions & 0 deletions
```diff
@@ -0,0 +1,20 @@
+load("@io_bazel_rules_go//go:def.bzl", "go_library")
+
+go_library(
+    name = "openaicompatible",
+    srcs = [
+        "openaicompatible.go",
+        "types.go",
+    ],
+    importpath = "github.com/sourcegraph/sourcegraph/internal/completions/client/openaicompatible",
+    visibility = ["//:__subpackages__"],
+    deps = [
+        "//internal/completions/tokenizer",
+        "//internal/completions/tokenusage",
+        "//internal/completions/types",
+        "//internal/modelconfig/types",
+        "//lib/errors",
+        "@com_github_sourcegraph_log//:log",
+        "@com_github_tmaxmax_go_sse//:go-sse",
+    ],
+)
```

0 commit comments
