Cached layer activations for steering vector experiments
Abdullah
amirali1985
AI & ML interests
Mechanistic interpretability, high dimensional geometry, persona role playing.
Recent Activity
updated a model about 5 hours ago
thoughtworks/arithmetic-sorl
updated a dataset about 5 hours ago
thoughtworks/arithmetic-sorl-data
published a dataset about 5 hours ago
thoughtworks/arithmetic-sorl-data
steering_with_curvature_metrics
amirali1985/convsersations_corrigible_more_llama3.2-1B-it_large_with_curvature
Viewer • Updated • 8.27k • 11
amirali1985/convsersations_power_seeking_llama3.2-1B-it_large_with_curvature
Viewer • Updated • 8.27k • 11
amirali1985/convsersations_self_awareness_general_llama3.2-1B-it_large_with_curvature
Viewer • Updated • 10k • 12
amirali1985/convsersations_sadness_llama3.2-1B-it_large_with_curvature
Viewer • Updated • 9.78k • 7
activations_steering
Cached layer activations for steering vector experiments
models 15
amirali1985/interpreting_reward_models
Updated
amirali1985/gpt-neo-125m_hh_reward
Text Generation • 0.1B • Updated • 14
amirali1985/gpt-neo-125m_utility_reward
Reinforcement Learning • Updated • 1
amirali1985/pythia-70m_sentiment_reward
Reinforcement Learning • Updated • 1
amirali1985/pythia-160m_sentiment_reward
Reinforcement Learning • Updated • 7
amirali1985/gpt-neo-125m_sentiment_reward
Reinforcement Learning • Updated • 1
amirali1985/pythia-160m_utility_reward
Reinforcement Learning • Updated • 4
amirali1985/pythia-70m_utility_reward
Reinforcement Learning • 70.4M • Updated • 5
amirali1985/gpt-j-6b-sharded-bf16_sentiment_reward
Reinforcement Learning • Updated
amirali1985/pythia-410m_utility_reward
Reinforcement Learning • Updated • 2
datasets 25
amirali1985/convsersations_sadness_llama3.1-8B-it_large
Viewer • Updated • 9.78k • 35
amirali1985/convsersations_excitement_llama3.1-8B-it_large
Viewer • Updated • 8k • 38
amirali1985/convsersations_rude_llama3.1-8B-it_large
Viewer • Updated • 15.9k • 42
amirali1985/convsersations_humor_llama3.1-8B-it_large
Viewer • Updated • 9.63k • 47
amirali1985/convsersations_corrigible_more_llama3.1-8B-it_large
Viewer • Updated • 8.27k • 40
amirali1985/convsersations_power_seeking_llama3.1-8B-it_large
Viewer • Updated • 8.27k • 55
amirali1985/convsersations_wealth_seeking_llama3.1-8B-it_large
Viewer • Updated • 11.4k • 41
amirali1985/convsersations_self_awareness_general_llama3.1-8B-it_large
Viewer • Updated • 13.4k • 54
amirali1985/llama3.2-1B-it_power_seeking_layer10
Viewer • Updated • 8.27k • 27
amirali1985/synthetic-shapes-3x6x7
Viewer • Updated • 13.2k • 70 • 1