Commit History

Add HuggingFace Hub checkpoint persistence - upload and download checkpoints between jobs

3b46388

renpas22 committed on Dec 18, 2025

Restore SFT to 10000 steps - will resume from checkpoint at 7000

e9e0301

renpas22 committed on Dec 18, 2025

Add checkpoint resumption - automatically resume from latest checkpoint

5419afd

renpas22 committed on Dec 18, 2025

Reduce training steps to fit within job timeout (SFT: 5k, PRM: 2k, RL: 3k)

5cfd8d6

renpas22 committed on Dec 18, 2025

Fix ReasoningStep attribute - use description not content

f941008

renpas22 committed on Dec 18, 2025

Fix SPECIAL_TOKENS usage - import at module level and use string literals

464ac9b

renpas22 committed on Dec 18, 2025

Fix ReasoningChain dataclass - add image field and defaults, fix collate function

3024a91

renpas22 committed on Dec 18, 2025

Fix None image handling in collate function

d29f3e7

renpas22 committed on Dec 18, 2025

Convert learning_rate to float explicitly

9e7779a

renpas22 committed on Dec 18, 2025

Remove dead code with direct config access

ccd696b

renpas22 committed on Dec 18, 2025

Add getattr defaults for all config parameters

cd76323

renpas22 committed on Dec 18, 2025

Fix FineVision dataset loading with subset parameter

e47ae2c

renpas22 committed on Dec 18, 2025

Implement full SFT, PRM, and RL training with dataset loading

84a183c

renpas22 committed on Dec 18, 2025

Make train_prm and train_rl placeholders - dataset loading needs HF integration

7bff7cb

renpas22 committed on Dec 18, 2025

Force fresh repository download with cache clearing

714d05d

renpas22 committed on Dec 18, 2025

Add **kwargs to train_prm and train_rl to accept config parameters

917e40e

renpas22 committed on Dec 18, 2025

Force download latest revision from HF repo

f8fc68a

renpas22 committed on Dec 18, 2025

Fix train_prm and train_rl signatures to accept max_steps and learning_rate

41bcc92

renpas22 committed on Dec 18, 2025

Add placeholder train_sft method

85ab8c2

renpas22 committed on Dec 18, 2025

Fix quantized model device handling in inference_scaling

a745b26

renpas22 committed on Dec 18, 2025

Add type conversion for RLConfig parameters

f15c2d7

renpas22 committed on Dec 18, 2025

Add type conversion and debug logging for config values

c74a578

renpas22 committed on Dec 18, 2025

Quote mixed_precision value

37e8f2f

renpas22 committed on Dec 17, 2025

Add inference config parameters

2d1ba1a

renpas22 committed on Dec 17, 2025

Skip .to(device) for quantized models with device_map

fa9e543

renpas22 committed on Dec 17, 2025

Add missing RL/PPO config parameters

5af9eca

renpas22 committed on Dec 17, 2025

Fix gradient checkpointing for VLM models

0326431

renpas22 committed on Dec 17, 2025

Fix tokenize() calls to use actual tokenizer

8268436

renpas22 committed on Dec 17, 2025

Fix working directory for HF Jobs environment

bb7ed44

renpas22 committed on Dec 17, 2025

Fix len() calls to use actual_tokenizer instead of processor

4605c1b

renpas22 committed on Dec 12, 2025

Fix tokenizer access for Processor objects

b8bd3e8

renpas22 committed on Dec 12, 2025

Add VLM support to trainer with auto-detection

e20135f

renpas22 committed on Dec 12, 2025

Fix model name to use valid Qwen2-VL model

61dbc34

renpas22 committed on Dec 12, 2025

Enable 8-bit quantization and reduce batch size for memory

d4b8544

renpas22 committed on Dec 12, 2025

Add required top-level config keys for trainer

9770aa1

renpas22 committed on Dec 12, 2025

Fix config path resolution for trainer initialization

ec4cb07

renpas22 committed on Dec 12, 2025

Fix function call arguments and trainer initialization

879210b

renpas22 committed on Dec 12, 2025

Fix config path resolution for HF Jobs

82de57b

renpas22 committed on Dec 12, 2025

Add utils directory

da76488

renpas22 committed on Dec 12, 2025

Auto-download repository in HF Jobs environment

487225b

renpas22 committed on Dec 12, 2025

Fix imports to work with sys.path

83e0535

renpas22 committed on Dec 12, 2025

Fix Python path for imports

27b06f0

renpas22 committed on Dec 12, 2025

Add training scripts and configs

2b8876a

renpas22 committed on Dec 12, 2025

initial commit

cd336ff
verified

Mulebot committed on Dec 12, 2025
