Phil
phil111
AI & ML interests
None yet
Organizations
None yet
Doesn't stop thinking.
Reactions: 1
Comments: 9
#3 opened 5 months ago by phil111
Impressive Broad Knowledge
Reactions: 5
Comments: 8
#12 opened 7 months ago by phil111
This just trades general performance for domain specific gains.
Reactions: 16
Comments: 11
#3 opened 6 months ago by phil111
Please stop blindly trusting and reporting Alibaba's scores.
Reactions: 8
Comments: 2
#1 opened 6 months ago by phil111
Weird responses
Comments: 12
#10 opened 7 months ago by vparth7
Gemma A3B
Reactions: 6
Comments: 13
#3 opened 7 months ago by Maria99934
gpt-oss is actually good. even on less common benchmark
Reactions: 7
Comments: 2
#109 opened 7 months ago by groupfairnessllm
model quality issues
Comments: 5
#92 opened 7 months ago by TheBigBlockPC
Terrible instruction following
Reactions: 1
Comments: 4
#3 opened 7 months ago by denisalpino
4b model with an 84.2 MMLU-Redux score?
Reactions: 3
Comments: 1
#2 opened 7 months ago by phil111
This model is unbelievably ignorant.
Reactions: 42
Comments: 15
#14 opened 7 months ago by phil111
Knowledge limitations
Reactions: 2
Comments: 5
#25 opened 7 months ago by hexess
An Improvement, But Q3 30b Still Has Very Little General Knowledge
Reactions: 3
Comments: 11
#2 opened 7 months ago by phil111
Test Scores Can Be Misleading
Reactions: 1
Comments: 8
#8 opened 7 months ago by phil111
More Knowledge, But Hard To Extract
Reactions: 1
#29 opened 7 months ago by phil111
The SimpleQA score of the model is WAY off.
Reactions: 4
Comments: 3
#2 opened 8 months ago by phil111
Qwen3 is great, but could be better.
Reactions: 9
Comments: 25
#18 opened 10 months ago by phil111
SimpleQA jumped from 12.2 to 54.3?
Reactions: 23
Comments: 25
#4 opened 7 months ago by phil111
SimpleQA score?
#6 opened 8 months ago by phil111
That SimpleQA score looks too good to be true.
Reactions: 12
Comments: 19
#1 opened 8 months ago by phil111