Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models Paper • 2602.04649 • Published 15 days ago • 12
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11, 2025 • 57
Fine-tuning LLMs to 1.58bit: extreme quantization made easy Article • Published Sep 18, 2024 • 275
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8, 2025 • 110
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks Paper • 2504.05118 • Published Apr 7, 2025 • 26
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated Dec 31, 2025 • 356