Quantizing with BitsAndBytesConfig?

#15
by KurianGKevin - opened
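import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization; compute dtype is bfloat16
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)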

I get this error when running inference:

/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.

CUDA error: device-side assert triggered
Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
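
As the message suggests, CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the stack trace should point at the call that actually failed. A minimal sketch of setting it from Python, assuming it runs before anything initializes CUDA (safest is before importing torch). The helper name `enable_sync_cuda_errors` is hypothetical and not part of PyTorch:

```python
import os


def enable_sync_cuda_errors(env=os.environ):
    """Request synchronous CUDA kernel launches for easier debugging.

    Hypothetical helper: it only sets the environment variable named in
    the error message. It has no effect if CUDA was already initialized.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


# Call this at the very top of the script, before `import torch`.
enable_sync_cuda_errors()
```

The same effect can be had by exporting the variable in the shell before launching the script; doing it in code just keeps the debugging switch next to the inference script.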

Is this quantization config compatible with the model?

Swiss AI Initiative org

From this, I understand that you're trying to use bitsandbytes quantization through Hugging Face. However, your error report looks more like an issue with CUDA support. Are you able to run the same configuration with other models on your system? Please see the official documentation for further tips.
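
One thing that may help narrow it down: the assertion text says the probability tensor contained `inf`, `nan`, or a negative element. A minimal sketch of that same validity condition in plain Python, assuming the probabilities have been copied to the CPU and converted to a list of floats. The function name `invalid_probability_indices` is hypothetical and not part of PyTorch or bitsandbytes:

```python
import math


def invalid_probability_indices(probs):
    """Return the positions whose value is nan, inf, or negative.

    Mirrors the condition named in the assertion message; it does not
    reproduce how PyTorch performs the check internally.
    """
    return [
        i
        for i, p in enumerate(probs)
        if math.isnan(p) or math.isinf(p) or p < 0
    ]


# Example with made-up values: positions 1, 2 and 3 are invalid.
bad = invalid_probability_indices([0.2, float("nan"), -0.1, float("inf"), 0.5])
```

If the list comes back non-empty for the quantized model but empty for an unquantized one on the same machine, that would point at the numerics of the quantized forward pass rather than at the CUDA installation.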
