Commit 05ff108

Add cepo results
1 parent 1b3aa8a commit 05ff108

1 file changed (+22, -0 lines changed)

README.md

Lines changed: 22 additions & 0 deletions
@@ -362,6 +362,28 @@ Authorization: Bearer your_secret_api_key

 ![Results showing Mixture of Agents approach using gpt-4o-mini on Arena Hard Auto Benchmark](https://raw.githubusercontent.com/codelion/optillm/main/moa-results.png)

+## CePO Results
+
+### Comparison of CePO with default settings and base model
+
+| Method                     | Math-L5 | MMLU-Pro (Math) | GPQA | CRUX |
+| -------------------------- | ------- | --------------- | ---- | ---- |
+| Llama 3.3 70B              | 51.0    | 78.6            | 49.1 | 72.6 |
+| Llama 3.1 405B             | 49.8    | 79.2            | 50.7 | 73.0 |
+| CePO (using Llama 3.3 70B) | 69.6    | 84.8            | 55.5 | 80.1 |
+
+### Ablation studies
+
+| bestofn_n | planning_n | planning_m | bestofn_rating_type | Math-L5 | MMLU-Pro (Math) | GPQA  | CRUX  | Comments       |
+| --------- | ---------- | ---------- | ------------------- | ------- | --------------- | ----- | ----- | -------------- |
+| 3         | 3          | 6          | absolute            | 69.6    | 84.8            | 55.5  | 80.1  | Default config |
+| 3         | 3          | 6          | pairwise            | 67.7    | 83.5            | 55.6  | 79.8  |                |
+| 3         | 2          | 5          | absolute            | 67.1    | 85.1            | 55.1  | 79.0  |                |
+| 3         | 5          | 8          | absolute            | 69.4    | 84.3            | 55.6  | 81.1  |                |
+| 5         | 3          | 6          | absolute            | 68.7    | 85.4            | 54.8  | 79.9  |                |
+| 7         | 3          | 6          | absolute            | 69.6    | 82.8            | 54.7  | 78.4  |                |
+| 9         | 3          | 6          | absolute            | 68.9    | 83.4            | 55.7  | 80.6  |                |
+
 ### optillm with Patchwork (July 2024)

 Since optillm is a drop-in replacement for OpenAI API you can easily integrate it with existing tools and frameworks using the OpenAI client. We used optillm with [patchwork](https://github.com/patched-codes/patchwork) which is an open-source framework that automates development gruntwork like PR reviews, bug fixing, security patching using workflows
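
As a reading aid for the "CePO Results" table added in this commit, the per-benchmark gain of CePO over each baseline can be computed directly from the reported scores. The sketch below is not part of the commit or of optillm; the names `BENCHMARKS`, `RESULTS`, and `gains` are hypothetical and exist only for illustration. The numbers are copied from the table as-is.

```python
# Scores copied from the "Comparison of CePO with default settings and base model"
# table in this commit. Column order follows the table header.
BENCHMARKS = ["Math-L5", "MMLU-Pro (Math)", "GPQA", "CRUX"]

RESULTS = {
    "Llama 3.3 70B": [51.0, 78.6, 49.1, 72.6],
    "Llama 3.1 405B": [49.8, 79.2, 50.7, 73.0],
    "CePO (using Llama 3.3 70B)": [69.6, 84.8, 55.5, 80.1],
}


def gains(method: str, baseline: str) -> dict:
    """Return the per-benchmark difference (method minus baseline), in points.

    Hypothetical helper for illustration; rounds to one decimal place to match
    the precision of the reported scores.
    """
    return {
        name: round(m - b, 1)
        for name, m, b in zip(BENCHMARKS, RESULTS[method], RESULTS[baseline])
    }


if __name__ == "__main__":
    cepo = "CePO (using Llama 3.3 70B)"
    for baseline in ("Llama 3.3 70B", "Llama 3.1 405B"):
        print(f"{cepo} vs {baseline}:")
        for name, delta in gains(cepo, baseline).items():
            print(f"  {name}: {delta:+.1f}")
```

On the reported numbers, CePO is 18.6 points above its own base model (Llama 3.3 70B) on Math-L5 and 19.8 points above Llama 3.1 405B on the same benchmark; the gains on the other three benchmarks are smaller, between 4.8 and 7.5 points.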