SandLogicTechnologies committed
Commit 35746a2 · verified · 1 Parent(s): 395cbef

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ mmproj-F16.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3.5-2B_F16.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3.5-2B_Q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3.5-2B_Q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3.5-2B_F16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9494e489554a5efe5d0ec28113753ac82ef19da7901447ced7d6fc59c5591a35
+ size 3775709056
Qwen3.5-2B_Q4_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:516a5ce4d26131c7e8278b57fa3ee877ca85c98a8a2bbe8ff0fd52aa8cc4322b
+ size 1270808448
Qwen3.5-2B_Q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b64287fb968709493e963aa025ff04d0775ba82061c5f5e53c48ba5914a7e16
+ size 1424768896
README.md ADDED
@@ -0,0 +1,143 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ base_model: Qwen/Qwen3.5-2B
+ tags:
+ - image-text-to-text
+ - vision-language
+ - multimodal
+ - reasoning
+ - long-context
+ - multilingual
+ - lightweight
+ ---
+
+ ## Qwen3.5-2B
+
+ Qwen3.5-2B is a compact vision-language model from the Qwen3.5 series developed by Alibaba Cloud. It is designed to handle multimodal inputs, combining images and text prompts to generate informative textual responses.
+
+ With approximately 2 billion parameters, the model balances performance and efficiency, enabling multimodal reasoning and visual understanding while remaining suitable for deployment on modest hardware. It can analyze images, diagrams, screenshots, and documents, and produce contextual explanations or answers based on the provided prompt.
+
+ The Qwen3.5 small-model series focuses on efficient models optimized for research, experimentation, and practical deployment scenarios where larger models would be unnecessary or computationally expensive.
+
+ ---
+
+ ## Model Overview
+
+ - **Model Name**: Qwen3.5-2B
+ - **Base Model**: Qwen/Qwen3.5-2B
+ - **Architecture**: Multimodal Transformer (Vision Encoder + Language Model)
+ - **Parameter Count**: ~2 Billion
+ - **Context Window**: Up to ~256K tokens (implementation dependent)
+ - **Modalities**: Image, Text
+ - **Primary Languages**: English and Chinese, with broader multilingual capability
+ - **Developer**: Qwen (Alibaba Cloud)
+ - **License**: Apache 2.0
+
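+ The GGUF files in this repo can be fetched individually with `huggingface-cli`. A minimal sketch, assuming the repository id is `SandLogicTechnologies/Qwen3.5-2B` (hypothetical; substitute the actual repo id):
+
+ ```
+ # Repo id below is an assumption; replace it with the real repository path.
+ huggingface-cli download SandLogicTechnologies/Qwen3.5-2B \
+   Qwen3.5-2B_Q4_k_m.gguf mmproj-F16.gguf --local-dir .
+ ```
+
+ ---
+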
+ ## Quantization Details
+
+ ### F16
+
+ - Full-precision GGUF export (~3.78 GB)
+ - Highest fidelity to the pretrained weights
+ - Recommended for GPU inference and evaluation workloads
+
+ ### Q4_K_M
+
+ - Approx. ~65% size reduction compared to F16
+ - Very low memory footprint (~1.18 GiB)
+ - Designed for efficient inference on consumer hardware
+ - Compatible with CPU inference and low-VRAM GPUs
+
+ ### Q5_K_M
+
+ - Approx. ~60% size reduction compared to F16, with higher fidelity than Q4_K_M (~1.33 GiB)
+ - Slightly larger than Q4_K_M
+ - A good middle ground between size and output quality
+
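+ For reference, variants like these are typically produced from the F16 export with llama.cpp's `llama-quantize` tool. A minimal sketch (assumes a local llama.cpp build; this repo's exact conversion pipeline is not documented here):
+
+ ```
+ # Illustrative: regenerate the K-quant variants from the F16 GGUF.
+ ./llama-quantize Qwen3.5-2B_F16.gguf Qwen3.5-2B_Q4_k_m.gguf Q4_K_M
+ ./llama-quantize Qwen3.5-2B_F16.gguf Qwen3.5-2B_Q5_k_m.gguf Q5_K_M
+ ```
+
+ ---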
+
+ ## Training Overview
+
+ ### Pretraining
+
+ The model is pretrained on large-scale multimodal datasets containing paired image–text data together with extensive textual corpora. This training enables the model to learn strong associations between visual features and natural language representations.
+
+ Training objectives include:
+
+ - Visual–text alignment
+ - Multimodal representation learning
+ - Language modeling and reasoning
+ - Cross-modal understanding
+
+ ### Optimization
+
+ Additional optimization stages improve the model's ability to perform multimodal tasks such as:
+
+ - Visual question answering
+ - Image caption generation
+ - Scene and object recognition
+ - Chart and document interpretation
+
+ ---
+
+ ## Core Capabilities
+
+ - **Multimodal understanding**
+   Processes both image and text inputs to produce meaningful responses.
+
+ - **Visual question answering**
+   Interprets visual content and answers questions about objects, scenes, or diagrams.
+
+ - **Image captioning**
+   Generates descriptive captions explaining the contents of images.
+
+ - **Image-grounded reasoning**
+   Performs reasoning tasks using information extracted from visual inputs.
+
+ - **Multilingual interaction**
+   Supports multiple languages, with strong English and Chinese performance.
+
+ - **Long-context processing**
+   Capable of handling extended inputs and longer multimodal conversations (see the sketch after this list).
+
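+ The context window is implementation dependent, and with llama.cpp you opt into a longer context explicitly. A minimal sketch (the context size and port below are illustrative, not tuned recommendations):
+
+ ```
+ # Serve the model with an extended context window; adjust -c to fit RAM/VRAM.
+ ./llama-server -m Qwen3.5-2B_Q4_k_m.gguf -c 32768 --port 8080
+ ```
+
+ ---
+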
+ ## Example Usage
+
+ ### llama.cpp
+
+ ```
+ ./llama-cli \
+   -m SandLogicTechnologies/Qwen3.5-2B_Q4_k_m.gguf \
+   -p "What is Knowledge Distillation?"
+ ```
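+
+ Since the repo also ships a vision projector (`mmproj-F16.gguf`), image inputs can be tried with llama.cpp's multimodal CLI. A hedged sketch (the exact flag set depends on your llama.cpp version; the image path is illustrative):
+
+ ```
+ # Multimodal run: pair the language model with the mmproj vision projector.
+ ./llama-mtmd-cli \
+   -m SandLogicTechnologies/Qwen3.5-2B_Q4_k_m.gguf \
+   --mmproj SandLogicTechnologies/mmproj-F16.gguf \
+   --image ./example.png \
+   -p "Describe this image."
+ ```
+
+ ---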
+
+ ## Recommended Use Cases
+
+ - Multimodal conversational assistants
+ - Visual question answering systems
+ - Document and screenshot analysis
+ - Chart and diagram interpretation
+ - Image captioning and visual description
+ - Educational tools using visual materials
+ - Research involving multimodal reasoning
+ - Rapid prototyping of multimodal AI applications
+
+ ---
+
+ ## Acknowledgments
+
+ These quantized models are based on the original work of the **Qwen** development team.
+
+ Special thanks to:
+
+ - The [Qwen](https://huggingface.co/Qwen) team for developing and releasing the [Qwen3.5-2B](https://huggingface.co/Qwen/Qwen3.5-2B) model.
+ - **Georgi Gerganov** and the `llama.cpp` community for enabling efficient inference through the GGUF format.
+
+ ---
+
+ ## Contact
+
+ For inquiries or support, please contact us at support@sandlogic.com or visit our [website](https://www.sandlogic.com/).
mmproj-F16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7035e9cb8d7c6a9681d07eef9a364783e86ea4cd73faab2eabb4f43a101830c7
+ size 668227264