broadcast_matmul to handle stride-0 broadcast dimensions #3253

@DrJesseGlass

Description

While implementing CPU Grouped-Query Attention, I encountered a limitation with broadcast_matmul and stride-0 dimensions. I'm taking a different approach (fused SIMD kernels) for my use case, but wanted to log this for others who may benefit from the fix.

Currently, `broadcast_as` creates a stride-0 view, and `matmul` rejects stride-0 inputs as "non-contiguous". So I needed the workaround `.broadcast_as(...).contiguous()`, which physically expands memory by materializing the broadcast copy.
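To illustrate what the fix would mean at the kernel level, here is a minimal sketch in plain Rust (not candle's API): a batched matmul that walks the batch dimension through an explicit stride. Passing a batch stride of 0 makes every batch read the same underlying buffer, which is exactly what a stride-0 broadcast view provides, with no physical expansion.

```rust
// Sketch only: a naive batched matmul whose batch dimension is addressed
// via an explicit stride. A stride of 0 reuses one physical matrix for
// every batch -- the semantics a stride-0 broadcast view asks for.
fn batched_matmul(
    lhs: &[f32], lhs_batch_stride: usize, // 0 => lhs is broadcast over batches
    rhs: &[f32], rhs_batch_stride: usize, // 0 => rhs is broadcast over batches
    batches: usize, m: usize, k: usize, n: usize,
) -> Vec<f32> {
    let mut out = vec![0.0f32; batches * m * n];
    for b in 0..batches {
        let l = &lhs[b * lhs_batch_stride..][..m * k];
        let r = &rhs[b * rhs_batch_stride..][..k * n];
        let o = &mut out[b * m * n..][..m * n];
        for i in 0..m {
            for j in 0..n {
                let mut acc = 0.0;
                for p in 0..k {
                    acc += l[i * k + p] * r[p * n + j];
                }
                o[i * n + j] = acc;
            }
        }
    }
    out
}

fn main() {
    // Two lhs batches of shape 1x2; a single physical rhs of shape 2x1
    // shared by both batches via stride 0.
    let lhs = [1.0, 2.0, 3.0, 4.0];
    let rhs = [10.0, 100.0];
    let out = batched_matmul(&lhs, 2, &rhs, 0, 2, 1, 2, 1);
    println!("{:?}", out); // [210.0, 430.0]
}
```

The point is that supporting stride-0 batch dimensions in the matmul loop is cheap; only the contiguity check currently stands in the way.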

```rust
// GQA: Q has 16 heads, K/V have 8 heads (2 query heads per KV group)
let q = ...;  // [1, 8, 2, 2, 128]
let k = ...;  // [1, 8, 1, 128, 2] ← size-1 dim should broadcast

let scores = q.broadcast_matmul(&k)?;  // Error: "non-contiguous lhs"
```

I expected dim 2 of rhs to broadcast (1 → 2) during the matmul; instead, the internal broadcast creates a stride-0 view and `matmul` returns an error.
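For reference, the expected result can be derived purely from the shapes. Below is a small sketch (assuming numpy-style right-aligned broadcasting over the batch dims, which is what I'd expect `broadcast_matmul` to follow) that computes the output shape without touching any data; `broadcast_matmul_shape` is a hypothetical helper name, not a candle function.

```rust
// Hypothetical helper: the output shape broadcast_matmul should produce,
// assuming numpy-style broadcasting over all dims except the trailing two.
fn broadcast_matmul_shape(lhs: &[usize], rhs: &[usize]) -> Option<Vec<usize>> {
    let (m, k1) = (lhs[lhs.len() - 2], lhs[lhs.len() - 1]);
    let (k2, n) = (rhs[rhs.len() - 2], rhs[rhs.len() - 1]);
    if k1 != k2 {
        return None; // inner dimensions must match
    }
    let lb = &lhs[..lhs.len() - 2]; // batch dims of lhs
    let rb = &rhs[..rhs.len() - 2]; // batch dims of rhs
    let rank = lb.len().max(rb.len());
    let mut out = Vec::with_capacity(rank + 2);
    for i in 0..rank {
        // Right-align the batch dims; missing leading dims count as 1.
        let a = if i < rank - lb.len() { 1 } else { lb[i - (rank - lb.len())] };
        let b = if i < rank - rb.len() { 1 } else { rb[i - (rank - rb.len())] };
        match (a, b) {
            (a, b) if a == b => out.push(a),
            (1, b) => out.push(b), // size-1 dim broadcasts
            (a, 1) => out.push(a),
            _ => return None, // incompatible batch dims
        }
    }
    out.push(m);
    out.push(n);
    Some(out)
}

fn main() {
    // The GQA shapes from the repro above.
    let s = broadcast_matmul_shape(&[1, 8, 2, 2, 128], &[1, 8, 1, 128, 2]);
    println!("{:?}", s); // Some([1, 8, 2, 2, 2])
}
```

So the call is shape-valid; the failure comes only from the stride-0 view that the internal broadcast produces.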
