jplhughes2/classify_alignment_faking_human_labels • 106 • 18 • 1
jplhughes2/docs_only_val_5k_filtered • 5k • 4
jplhughes2/docs_only_30k_filtered • 30k • 6
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-0k-docs-8k-benign-2k-refusals • 10k • 4
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-0k-docs-20k-benign-10k-refusals • 29.4k • 5
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-5k-docs-8k-benign-2k-refusals • 15k • 4
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-5k-docs-4k-benign-1k-refusals • 10k • 4
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-10k-docs-8k-benign-2k-refusals • 20k • 2
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-30k-docs-0k-benign-0k-refusals • 30k • 6
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-20k-docs-0k-benign-0k-refusals • 20k • 6
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-10k-docs-0k-benign-0k-refusals • 10k • 4
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-5k-docs-0k-benign-0k-refusals • 5k • 4 • 1
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-90k • 90k • 2
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-90k-benign-50k-refusals • 149k • 4
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-30k • 30k • 1
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-30k-benign-20k • 50k • 2
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-30k-benign-20k-refusals • 59.4k • 2
jplhughes2/alignment-faking-synthetic-chat-dataset-recall-90k-benign-20k-refusals • 119k • 3