AI & ML interests
DeepRL, RL finetuning
skandermoalla/qrpo-paper-llama-sft-leetcode-sandbox-temp1-ref50-offpolicy10random-sandbox
• Updated • 27k • 11
skandermoalla/qrpo-paper-llama-sft-leetcode-sandbox-temp1-ref50-offline-sandbox
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 62.1k • 21
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 62.1k • 16
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offline-armorm
• Updated • 62.1k • 26
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 91.9k • 13
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 91.9k • 110
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offline-armorm
• Updated • 91.9k • 21
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 62.5k • 62
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 62.5k • 30
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offline-armorm
• Updated • 62.5k • 19
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 99k • 75
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 99k • 63
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offline-armorm
• Updated • 99.1k • 13
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 62k • 25
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 62k • 21
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offline-armorm
• Updated • 62k • 30
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 100k • 55
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 100k • 9
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offline-armorm
• Updated • 100k • 19
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 61.6k • 23
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 61.6k • 82
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offline-armorm
• Updated • 61.6k • 27
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 93.8k • 10
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 93.8k • 93
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offline-armorm
• Updated • 96.6k • 19
skandermoalla/qrpo-paper-llama-nosft-leetcode-sandbox-temp1-ref50-offpolicy10random-sandbox
• Updated • 26.6k • 9
skandermoalla/qrpo-paper-llama-nosft-leetcode-sandbox-temp1-ref50-offline-sandbox