SandLogicTechnologies committed
Commit 35746a2 · verified · 1 Parent(s): 395cbef

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ mmproj-F16.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3.5-2B_F16.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3.5-2B_Q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3.5-2B_Q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3.5-2B_F16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9494e489554a5efe5d0ec28113753ac82ef19da7901447ced7d6fc59c5591a35
+ size 3775709056
Qwen3.5-2B_Q4_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:516a5ce4d26131c7e8278b57fa3ee877ca85c98a8a2bbe8ff0fd52aa8cc4322b
+ size 1270808448
Qwen3.5-2B_Q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b64287fb968709493e963aa025ff04d0775ba82061c5f5e53c48ba5914a7e16
+ size 1424768896
README.md ADDED
@@ -0,0 +1,143 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ base_model: Qwen/Qwen3.5-2B
+ tags:
+ - image-text-to-text
+ - vision-language
+ - multimodal
+ - reasoning
+ - long-context
+ - multilingual
+ - lightweight
+ ---
+
+ ## Qwen3.5-2B
+
+ Qwen3.5-2B is a compact vision-language model from the Qwen3.5 series developed by Alibaba Cloud. It is designed to handle multimodal inputs, combining images and text prompts to generate informative textual responses.
+
+ With approximately 2 billion parameters, the model balances performance and efficiency, enabling multimodal reasoning and visual understanding while remaining suitable for deployment on modest hardware. It can analyze images, diagrams, screenshots, and documents, and produce contextual explanations or answers based on the provided prompt.
+
+ The Qwen3.5 small-model series focuses on efficient models optimized for research, experimentation, and practical deployment scenarios where larger models would be unnecessary or computationally expensive.
+
+ ---
+
+ ## Model Overview
+
+ - **Model Name**: Qwen3.5-2B
+ - **Base Model**: Qwen/Qwen3.5-2B
+ - **Architecture**: Multimodal Transformer (Vision Encoder + Language Model)
+ - **Parameter Count**: ~2 Billion
+ - **Context Window**: Up to ~256K tokens (implementation dependent)
+ - **Modalities**: Image, Text
+ - **Primary Languages**: English and Chinese, with broader multilingual capability
+ - **Developer**: Qwen (Alibaba Cloud)
+ - **License**: Apache 2.0
+
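+ The GGUF files in this repo can be fetched individually with `huggingface-cli`. A minimal sketch, assuming the repository id is `SandLogicTechnologies/Qwen3.5-2B` (hypothetical; substitute the actual repo id):
+
+ ```
+ # Repo id below is an assumption; replace it with the real repository path.
+ huggingface-cli download SandLogicTechnologies/Qwen3.5-2B \
+   Qwen3.5-2B_Q4_k_m.gguf mmproj-F16.gguf --local-dir .
+ ```
+
+ ---
+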
+ ## Quantization Details
+
+ ### F16
+
+ - Full-precision GGUF export (~3.78 GB)
+ - Highest fidelity to the pretrained weights
+ - Recommended for GPU inference and evaluation workloads
+
+ ### Q4_K_M
+
+ - Approx. ~65% size reduction compared to F16
+ - Very low memory footprint (~1.18 GiB)
+ - Designed for efficient inference on consumer hardware
+ - Compatible with CPU inference and low-VRAM GPUs
+
+ ### Q5_K_M
+
+ - Approx. ~60% size reduction compared to F16, with higher fidelity than Q4_K_M (~1.33 GiB)
+ - Slightly larger than Q4_K_M
+ - A good middle ground between size and output quality
+
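+ For reference, variants like these are typically produced from the F16 export with llama.cpp's `llama-quantize` tool. A minimal sketch (assumes a local llama.cpp build; this repo's exact conversion pipeline is not documented here):
+
+ ```
+ # Illustrative: regenerate the K-quant variants from the F16 GGUF.
+ ./llama-quantize Qwen3.5-2B_F16.gguf Qwen3.5-2B_Q4_k_m.gguf Q4_K_M
+ ./llama-quantize Qwen3.5-2B_F16.gguf Qwen3.5-2B_Q5_k_m.gguf Q5_K_M
+ ```
+
+ ---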
+
+ ## Training Overview
+
+ ### Pretraining
+
+ The model is pretrained on large-scale multimodal datasets containing paired image–text data together with extensive textual corpora. This training enables the model to learn strong associations between visual features and natural language representations.
+
+ Training objectives include:
+
+ - Visual–text alignment
+ - Multimodal representation learning
+ - Language modeling and reasoning
+ - Cross-modal understanding
+
+ ### Optimization
+
+ Additional optimization stages improve the model's ability to perform multimodal tasks such as:
+
+ - Visual question answering
+ - Image caption generation
+ - Scene and object recognition
+ - Chart and document interpretation
+
+ ---
+
+ ## Core Capabilities
+
+ - **Multimodal understanding**
+   Processes both image and text inputs to produce meaningful responses.
+
+ - **Visual question answering**
+   Interprets visual content and answers questions about objects, scenes, or diagrams.
+
+ - **Image captioning**
+   Generates descriptive captions explaining the contents of images.
+
+ - **Image-grounded reasoning**
+   Performs reasoning tasks using information extracted from visual inputs.
+
+ - **Multilingual interaction**
+   Supports multiple languages, with strong English and Chinese performance.
+
+ - **Long-context processing**
+   Capable of handling extended inputs and longer multimodal conversations (see the sketch after this list).
+
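+ The context window is implementation dependent, and with llama.cpp you opt into a longer context explicitly. A minimal sketch (the context size and port below are illustrative, not tuned recommendations):
+
+ ```
+ # Serve the model with an extended context window; adjust -c to fit RAM/VRAM.
+ ./llama-server -m Qwen3.5-2B_Q4_k_m.gguf -c 32768 --port 8080
+ ```
+
+ ---
+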
+ ## Example Usage
+
+ ### llama.cpp
+
+ ```
+ ./llama-cli \
+   -m SandLogicTechnologies/Qwen3.5-2B_Q4_k_m.gguf \
+   -p "What is Knowledge Distillation?"
+ ```
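+
+ Since the repo also ships a vision projector (`mmproj-F16.gguf`), image inputs can be tried with llama.cpp's multimodal CLI. A hedged sketch (the exact flag set depends on your llama.cpp version; the image path is illustrative):
+
+ ```
+ # Multimodal run: pair the language model with the mmproj vision projector.
+ ./llama-mtmd-cli \
+   -m SandLogicTechnologies/Qwen3.5-2B_Q4_k_m.gguf \
+   --mmproj SandLogicTechnologies/mmproj-F16.gguf \
+   --image ./example.png \
+   -p "Describe this image."
+ ```
+
+ ---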
+
+ ## Recommended Use Cases
+
+ - Multimodal conversational assistants
+ - Visual question answering systems
+ - Document and screenshot analysis
+ - Chart and diagram interpretation
+ - Image captioning and visual description
+ - Educational tools using visual materials
+ - Research involving multimodal reasoning
+ - Rapid prototyping of multimodal AI applications
+
+ ---
+
+ ## Acknowledgments
+
+ These quantized models are based on the original work of the **Qwen** development team.
+
+ Special thanks to:
+
+ - The [Qwen](https://huggingface.co/Qwen) team for developing and releasing the [Qwen3.5-2B](https://huggingface.co/Qwen/Qwen3.5-2B) model.
+ - **Georgi Gerganov** and the `llama.cpp` community for enabling efficient inference through the GGUF format.
+
+ ---
+
+ ## Contact
+
+ For inquiries or support, please contact us at support@sandlogic.com or visit our [website](https://www.sandlogic.com/).
mmproj-F16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7035e9cb8d7c6a9681d07eef9a364783e86ea4cd73faab2eabb4f43a101830c7
+ size 668227264