Conversation

@ivarflakstad (Member) commented Sep 20, 2025

Makes the tensor backend generic, which will allow us to support any number of backends. With a few tweaks to the repo, a backend can even be defined outside of the main repo, as opposed to the existing enum approach.

This PR introduces the traits BackendStorage, BackendDevice, and QuantizedBackend (the latter for quantization support).
A candle tensor now has the following definition:

#[derive(Clone)]
pub struct Tensor<B>(Arc<Tensor_<B>>)
where
    B: BackendStorage;

Previously, the backend storage was provided by this enum:

pub enum Storage {
    Cpu(CpuStorage),
    Cuda(CudaStorage),
    Metal(MetalStorage),
}

Now we instead implement BackendStorage for each of the three variants of the enum, as well as for any new backends we would like to support in the future.
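
To sketch what defining a backend outside the main repo could look like (the trait below is a simplified, hypothetical stand-in for illustration only; the real BackendStorage has more required methods):

// Simplified stand-in for the real trait, for illustration only.
pub trait BackendStorage: Sized {
    type Device;
    fn device(&self) -> &Self::Device;
}

pub struct MyAcceleratorDevice;

pub struct MyAcceleratorStorage {
    device: MyAcceleratorDevice,
    // raw buffer handle, dtype, shape metadata, etc. would live here
}

impl BackendStorage for MyAcceleratorStorage {
    type Device = MyAcceleratorDevice;

    fn device(&self) -> &Self::Device {
        &self.device
    }
}

With such an impl in place, Tensor<MyAcceleratorStorage> behaves like any built-in backend.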

The original Storage is kept around; it also implements BackendStorage and can be used just like before. All you have to do is specify your tensor as Tensor<Storage>, and all code that depends on the inner enum etc. will work as before. If you want to try transitioning to the new scheme, pick the backend of your choice, e.g. Tensor<CudaStorage>.
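
For example (the constructor and method names here are illustrative and may not match the final API exactly):

// Existing code keeps compiling against the enum-backed storage:
let t: Tensor<Storage> = Tensor::zeros((2, 2), DType::F32, &device)?;

// Opting into a concrete backend at the type level:
let t: Tensor<CudaStorage> = Tensor::zeros((2, 2), DType::F32, &cuda_device)?;

// And code can be written once, generically over any backend:
fn describe<B: BackendStorage>(t: &Tensor<B>) -> String {
    format!("{:?}", t.shape())
}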

The original Storage is kept for now because:

  1. It makes transitioning easier for projects that depend on candle.
  2. There is no easy or elegant way to use generics with pyo3 (ref issue); see the sketch after this list.
    In my experience, many people who use candle do so partly because they want to avoid Python, so I'm not sure how much traction candle-pyo3 has. If it is not valuable to the community, it would probably be better for the project as a whole to deprecate it. Deprecating the old Storage would be a logical next step as we push users toward the new approach to backends.
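
For context, the pyo3 limitation looks roughly like this: #[pyclass] rejects generic type parameters, so a binding layer has to commit to a concrete storage type. A minimal sketch (PyTensor is a hypothetical wrapper, not an existing candle-pyo3 type):

use pyo3::prelude::*;

// This fails to compile: #[pyclass] cannot be applied to a generic struct.
// #[pyclass]
// struct PyTensor<B: BackendStorage>(Tensor<B>);

// The usual workaround is a monomorphized wrapper, which is exactly what
// keeping the enum-backed Storage makes straightforward:
#[pyclass]
struct PyTensor(Tensor<Storage>);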

@greenrazer (Contributor) commented

Some things I found in examples that use Metal:

  • based
    • Metal conv1d BF16 not implemented
  • beit
    • Metal strided affine I64 not implemented
  • Deepseek
    • thread '<unnamed>' panicked at /Users/kb/Documents/webclones/candle/candle-transformers/src/models/deepseek2.rs:1071:37: called Result::unwrap() on an Err value: Metal error Could not lock kernel map: Command buffer map
      note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
  • Depth Anything has some bigger issues, but I'll work on that

@ivarflakstad (Member, Author) commented

candle-transformers/src/models/deepseek2.rs:1071:37: called Result::unwrap() on an Err value: Metal error Could not lock kernel map: Command buffer map
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Oh whoops, this was not supposed to be included in this PR. I was testing using rayon inside model code to launch several pieces simultaneously. It cut the initial loading time in half, so we should explore it, but not in this PR :)
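
(For the curious, the experiment was along these lines; vb and the two loader functions are made-up names:)

// Hypothetical sketch: build two independent model pieces on separate threads.
let (attn, mlp) = rayon::join(
    || load_attention_weights(&vb),
    || load_mlp_weights(&vb),
);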

Just for completeness, the fix is simply to ensure the current thread actually has a command buffer:

use std::thread;

pub fn command_buffer(&mut self) -> Result<(bool, CommandBuffer), MetalKernelError> {
    let mut command_buffers = self.command_buffers.lock()?;
    let thread_id = thread::current().id();
    // Lazily create a command buffer the first time this thread asks for one.
    // (Written with contains_key/insert rather than a match on get_mut, since
    // that pattern trips over the borrow checker.)
    if !command_buffers.contains_key(&thread_id) {
        let command_buffer = create_command_buffer(&self.command_queue)?;
        command_buffers.insert(thread_id, command_buffer);
    }
    // The entry is guaranteed to exist now, so this unwrap cannot fail.
    let command_buffer = command_buffers.get_mut(&thread_id).unwrap();
    ...
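
Keying the map by thread ID means each worker thread (e.g. the rayon threads from the loading experiment above) lazily gets its own command buffer rather than contending over a single shared one.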

ivarflakstad and others added 5 commits October 2, 2025 13:26
* got depth anything v2 example working

* fixed tensor::try_from -> tensor::new

* Depth anything v2 example: init device generically. Use anyhow result

---------

Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
@ivarflakstad changed the base branch from main to v2 on October 9, 2025 at 19:41
@ivarflakstad changed the title from [WIP] Generic Tensor<B> to Generic Tensor<B> on October 9, 2025
@ivarflakstad marked this pull request as ready for review on October 9, 2025 at 19:57
@ivarflakstad merged commit df1a203 into v2 on October 9, 2025
18 of 21 checks passed
@ivarflakstad deleted the generic-tensor branch on October 9, 2025 at 20:12