While implementing CPU Grouped-Query Attention, I encountered a limitation with broadcast_matmul and stride-0 dimensions. I'm taking a different approach (fused SIMD kernels) for my use case, but wanted to log this for others who may benefit from the fix.
Currently `broadcast_as` creates a stride-0 view, and `matmul` rejects stride-0 dimensions as "non-contiguous". So I needed the workaround `.broadcast_as(...).contiguous()`, which physically expands the memory (see the repro sketch after the code block below).
```rust
// GQA: Q has 16 heads, K/V have 8 heads (2 groups)
let q = ...; // [1, 8, 2, 2, 128]
let k = ...; // [1, 8, 1, 128, 2] ← size-1 dim should broadcast
let scores = q.broadcast_matmul(&k)?; // Error: "non-contiguous lhs"
```

I expected dim 2 of the rhs to broadcast (1 → 2) during the matmul, but instead I got the error above once the internal broadcast created a stride-0 view.
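For anyone hitting the same thing, here is a minimal, self-contained sketch of the failure and the `.contiguous()` workaround. It assumes candle-core's public API (`Tensor::randn`, `broadcast_as`, `contiguous`, `matmul`, `broadcast_matmul`) and the shapes from the example above; the exact error text may vary.

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // GQA shapes: 8 KV heads, 2 query groups per head.
    let q = Tensor::randn(0f32, 1f32, (1, 8, 2, 2, 128), &dev)?;
    let k = Tensor::randn(0f32, 1f32, (1, 8, 1, 128, 2), &dev)?;

    // Fails today: the internal broadcast_as produces a stride-0 view
    // that the CPU matmul kernel rejects as non-contiguous.
    // let scores = q.broadcast_matmul(&k)?;

    // Workaround: materialize the broadcast first. This works but
    // physically copies the K data across the group dimension.
    let k_expanded = k.broadcast_as((1, 8, 2, 128, 2))?.contiguous()?;
    let scores = q.matmul(&k_expanded)?;
    assert_eq!(scores.dims(), &[1, 8, 2, 2, 2]);
    Ok(())
}
```

The copy is the whole cost here: for GQA the broadcast dimension is the number of query groups, so the workaround duplicates every K/V tensor per group instead of letting the kernel read the same memory with stride 0.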