@nnethercote
Contributor

This is very rough draft code in which I experimented with making `t_dim` fully const. There are 19 different `t_dim` values, so this resulted in 19x duplication of a couple of functions. Overall it gave maybe a 1% perf win.
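
For readers unfamiliar with the trade-off, here is a minimal sketch of the general technique, not the code in this PR. It assumes a plain `usize` stands in for `t_dim`, and the names `sum_block_dyn`, `sum_block_const`, and `sum_block` are hypothetical. Moving a runtime argument into a const generic parameter makes the compiler emit one copy of the function per value, which is where the duplication comes from.

```rust
// Hypothetical sketch: names and types are illustrative, not taken from the PR.

/// Runtime version: `t_dim` is an ordinary argument, so a single copy of
/// the function is compiled and the bound is only known at run time.
fn sum_block_dyn(data: &[u32], t_dim: usize) -> u32 {
    data.iter().take(t_dim).sum()
}

/// Const version: `T_DIM` is a const generic parameter, so the compiler
/// emits a separate copy of this function for each value it is used with.
fn sum_block_const<const T_DIM: usize>(data: &[u32]) -> u32 {
    data.iter().take(T_DIM).sum()
}

/// Dispatch from a runtime value to the monomorphized copies.
/// Covering 19 distinct values would need 19 arms; only three are shown,
/// and anything else falls back to the runtime version.
fn sum_block(data: &[u32], t_dim: usize) -> u32 {
    match t_dim {
        1 => sum_block_const::<1>(data),
        2 => sum_block_const::<2>(data),
        4 => sum_block_const::<4>(data),
        _ => sum_block_dyn(data, t_dim),
    }
}
```

Calling `sum_block(&data, 4)` takes the const path, while `sum_block(&data, 3)` takes the fallback; both return the same result as `sum_block_dyn`. Each extra arm adds another compiled copy, so the code-size cost grows with the number of values while the speedup may stay small.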

nnethercote marked this pull request as draft July 17, 2024 22:16
@rinon
Collaborator

rinon commented Jul 17, 2024

This is actually a bit slower than #1325 on my 7700X: 5.549s for this PR vs. 5.536s for #1325.
