Phil

phil111

AI & ML interests

None yet

Organizations

None yet

New activity in ServiceNow-AI/Apriel-1.5-15b-Thinker 5 months ago

Doesn't stop thinking.

#3 opened 5 months ago by

New activity in zai-org/GLM-4.5 5 months ago

Impressive Broad Knowledge

#12 opened 7 months ago by

New activity in nvidia/NVIDIA-Nemotron-Nano-9B-v2 6 months ago

This just trades general performance for domain specific gains.

#3 opened 6 months ago by

New activity in ByteDance-Seed/Seed-OSS-36B-Base 6 months ago

Please stop blindly trusting and reporting Alibaba's scores.

#1 opened 6 months ago by

New activity in google/gemma-3-270m 6 months ago

Weird responses

#10 opened 7 months ago by

New activity in google/gemma-3-270m-it 7 months ago

Gemma A3B

#3 opened 7 months ago by

New activity in openai/gpt-oss-120b 7 months ago

gpt-oss is actually good. even on less common benchmark

#109 opened 7 months ago by groupfairnessllm

New activity in openai/gpt-oss-20b 7 months ago

model quality issues

#92 opened 7 months ago by

New activity in Qwen/Qwen3-4B-Instruct-2507 7 months ago

Terrible instruction following

#3 opened 7 months ago by

4b model with an 84.2 MMLU-Redux score?

#2 opened 7 months ago by

New activity in openai/gpt-oss-20b 7 months ago

This model is unbelievably ignorant.

#14 opened 7 months ago by

New activity in openai/gpt-oss-120b 7 months ago

Knowledge limitations

#25 opened 7 months ago by

New activity in Qwen/Qwen3-30B-A3B-Instruct-2507 7 months ago

An Improvement, But Q3 30b Still Has Very Little General Knowledge

#2 opened 7 months ago by

Test Scores Can Be Misleading

#8 opened 7 months ago by

New activity in Qwen/Qwen3-235B-A22B-Instruct-2507 7 months ago

More Knowledge, But Hard To Extract

#29 opened 7 months ago by

New activity in baidu/ERNIE-4.5-300B-A47B-PT 7 months ago

The SimpleQA score of the model is WAY off.

#2 opened 8 months ago by

New activity in Qwen/Qwen3-30B-A3B 7 months ago

Qwen3 is great, but could be better.

#18 opened 10 months ago by

New activity in Qwen/Qwen3-235B-A22B-Instruct-2507 7 months ago

SimpleQA jumped from 12.2 to 54.3?

#4 opened 7 months ago by

New activity in LGAI-EXAONE/EXAONE-4.0-32B 8 months ago

SimpleQA score?

#6 opened 8 months ago by

New activity in baidu/ERNIE-4.5-21B-A3B-PT 8 months ago

That SimpleQA score looks too good to be true.

#1 opened 8 months ago by