Ken Tsui

kenhktsui

https://kenhktsui.github.io/

AI & ML interests

ML engineer and researcher (VLMs, LLM benchmarks). Opinions are my own.

Recent Activity

upvoted a paper 6 days ago

A Very Big Video Reasoning Suite

liked a model about 1 month ago

moonshotai/Kimi-K2.5

liked a dataset 2 months ago

VITRA-VLA/VITRA-1M

kenhktsui's collections (7)

Self Correction Bench

Benchmarking LLM capability of external and internal error correction

kenhktsui/scli5

Viewer • Updated Jul 6, 2025 • 286 • 24
kenhktsui/gsm8k_sc

Viewer • Updated Jul 6, 2025 • 1.31k • 33
kenhktsui/prm800k_sc

Viewer • Updated Jul 6, 2025 • 448 • 28
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Paper • 2507.02778 • Published Jul 3, 2025 • 9

LongTalk

A Very Long Chain-of-Thought Dataset for Reasoning Model Post-Training

kenhktsui/longtalk-cot-v0.1

Viewer • Updated Dec 30, 2024 • 61.2k • 70 • 13
kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged-gguf

8B • Updated Dec 30, 2024 • 32 • 1
kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged

Text Generation • 8B • Updated Dec 30, 2024 • 4
kenhktsui/llama3.1-8b-instruct-thinking-sft-merged-gguf

8B • Updated Dec 30, 2024 • 20 • 1

CoT

kenhktsui/longtalk-cot-v0.1

Viewer • Updated Dec 30, 2024 • 61.2k • 70 • 13
open-thoughts/OpenThoughts-114k

Viewer • Updated Aug 31, 2025 • 228k • 88.1k • 811
ServiceNow-AI/R1-Distill-SFT

Viewer • Updated Feb 8, 2025 • 1.85M • 1.07k • 316
Tiiny/QWQ-LONGCOT-500K

Viewer • Updated Dec 26, 2024 • 286k • 222 • 124

VLM Data

HuggingFaceM4/the_cauldron

Viewer • Updated May 6, 2024 • 1.88M • 38.7k • 519
lmms-lab/LLaVA-OneVision-Data

Viewer • Updated May 24, 2025 • 3.94M • 14k • 231
HuggingFaceM4/Docmatix

Viewer • Updated Aug 26, 2024 • 2.55M • 10.9k • 299
zwq2018/embodied_reasoner

Preview • Updated Apr 21, 2025 • 442 • 21

FastText Model for Pretraining Data Curation

kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2

Text Classification • Updated Jun 26, 2025 • 276 • 28
kenhktsui/fineweb-edu-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 5.87k • 4
kenhktsui/code-natural-language-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 725 • 5
kenhktsui/math-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 14 • 2

textbook-quality-classifier

kenhktsui/fineweb-edu-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 5.87k • 4
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2

Text Classification • Updated Jun 26, 2025 • 276 • 28
kenhktsui/llm-data-textbook-quality-classifier-v1

Text Classification • 0.3B • Updated May 25, 2024 • 7 • 9
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v1

Text Classification • Updated May 25, 2024 • 5 • 4

nano-phi

Small Language Model Trained with Textbook Quality Data - How Far Can It Go?

kenhktsui/nano-phi-115M-v0.1

Text Generation • 0.1B • Updated Apr 6, 2024 • 99 • 4
kenhktsui/nano-phi-115M-control-v0.1

Text Generation • 0.1B • Updated Feb 4, 2024 • 2 • 1
kenhktsui/nano-phi-192M-v0.1

Text Generation • 0.2B • Updated May 8, 2024 • 4 • 1