# Cerebras Planning and Optimization (CePO)

## Results

### Comparison of CePO with default settings and base model

| Method | Math-L5 | MMLU-Pro (Math) | GPQA | CRUX |
| -------------------------- | ------- | --------------- | ---- | ---- |
| Llama 3.3 70B | 51.0 | 78.6 | 49.1 | 72.6 |
| Llama 3.1 405B | 49.8 | 79.2 | 50.7 | 73.0 |
| CePO (using Llama 3.3 70B) | 69.6 | 84.8 | 55.5 | 80.1 |

### Ablation studies

| bestofn_n | planning_n | planning_m | bestofn_rating_type | Math-L5 | MMLU-Pro (Math) | GPQA | CRUX | Comments |
| --------- | ---------- | ---------- | ------------------- | ------- | --------------- | ----- | ----- | -------------- |
| 3 | 3 | 6 | absolute | 69.6 | 84.8 | 55.5 | 80.1 | Default config |
| 3 | 3 | 6 | pairwise | 67.7 | 83.5 | 55.6 | 79.8 | |
| 3 | 2 | 5 | absolute | 67.1 | 85.1 | 55.1 | 79.0 | |
| 3 | 5 | 8 | absolute | 69.4 | 84.3 | 55.6 | 81.1 | |
| 5 | 3 | 6 | absolute | 68.7 | 85.4 | 54.8 | 79.9 | |
| 7 | 3 | 6 | absolute | 69.6 | 82.8 | 54.7 | 78.4 | |
| 9 | 3 | 6 | absolute | 68.9 | 83.4 | 55.7 | 80.6 | |

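The parameter names in the ablation table suggest a plan-generate-and-select pipeline. The sketch below is a rough reading inferred only from those names, not the actual CePO implementation: `llm` and `rate` are hypothetical callables standing in for the model and the rating step.

```python
# A rough sketch of how the ablation parameters above might interact,
# inferred only from their names; `llm` and `rate` are hypothetical
# callables, and this is NOT the actual CePO implementation.

def cepo(question, llm, rate, bestofn_n=3, planning_n=3, planning_m=6):
    """Return the best of `bestofn_n` plan-then-execute candidates."""
    candidates = []
    for _ in range(bestofn_n):
        executions = []
        # Draw up to planning_m plans, keeping the first planning_n
        # that generate successfully.
        for _ in range(planning_m):
            if len(executions) == planning_n:
                break
            plan = llm(f"Write a step-by-step plan for: {question}")
            if plan is not None:  # a failed generation is discarded
                executions.append(llm(f"Execute this plan: {plan}"))
        # Fuse the executed plans into a single candidate answer.
        candidates.append(
            llm("Combine into one answer: " + " | ".join(executions))
        )
    # bestofn_rating_type="absolute": score each candidate on its own
    # (vs. "pairwise", which would compare candidates head-to-head).
    return max(candidates, key=rate)
```

Under this reading, `planning_m` bounds the sampling budget per candidate, `planning_n` the number of plans actually used, and `bestofn_n` the number of final candidates rated.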
# optillm

optillm is an OpenAI API-compatible optimizing inference proxy that implements several state-of-the-art techniques to improve the accuracy and performance of LLMs. The current focus is on techniques that improve reasoning over coding, logical, and mathematical queries. By performing additional compute at inference time, these techniques can beat frontier models across diverse tasks.
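Because the proxy speaks the OpenAI chat-completions protocol, any standard client can talk to it by pointing at the proxy's base URL. The sketch below builds such a request with only the standard library; the local port and the `cepo-` model-name prefix (optillm's convention for selecting a technique) are assumptions, and the request is constructed but not sent, since it needs a running proxy.

```python
# Sketch: addressing an optillm proxy through its OpenAI-compatible
# chat-completions endpoint. The port and the "cepo-" model prefix are
# assumptions; sending the request requires a running proxy.
import json
import urllib.request

OPTILLM_URL = "http://localhost:8000/v1/chat/completions"  # assumed default

def build_request(question: str) -> urllib.request.Request:
    """Build (but do not send) a chat request routed through CePO."""
    payload = {
        # Prefixing the model name selects the optimization technique.
        "model": "cepo-llama-3.3-70b",
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        OPTILLM_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer your_secret_api_key",
        },
    )

req = build_request("What is 17 * 24?")
print(req.full_url)  # → http://localhost:8000/v1/chat/completions
```

The same payload works unchanged with the official OpenAI client by setting its `base_url` to the proxy address.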
### optillm with Patchwork (July 2024)

Since optillm is a drop-in replacement for the OpenAI API, you can easily integrate it with existing tools and frameworks using the OpenAI client. We used optillm with [patchwork](https://github.com/patched-codes/patchwork), an open-source framework that automates development gruntwork such as PR reviews, bug fixing, and security patching using workflows
|