license: mit
base_model: westlake-repl/SaProt_35M_AF2
Model Description
This model is fine-tuned from SaProt_35M_AF2 to predict mutation effects in Bacillus subtilis α-amylase. It takes an amino acid sequence as input and performs protein-level regression, predicting a quantitative enzyme activity value for each variant so that the functional impact of mutations can be compared.
Task type: protein-level regression
Model input type: Amino acid sequence
Dataset
The dataset is sourced from van der Flier et al. (2024), available at https://www.sciencedirect.com/science/article/pii/S2001037024002940. It contains a total of 3706 rows, with each sample carrying 1 to 8 mutation sites. The full dataset is randomly split into training, validation, and test sets following an 8:1:1 ratio. The target label is absorbance, ranging from -0.001 to 0.211. A higher absorbance value indicates greater starch degradation, corresponding to stronger detergent activity of the amylase enzyme.
Performance (on test set)
Spearman correlation: 0.76
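The random 8:1:1 split and the Spearman metric reported above can be illustrated with a minimal pure-Python sketch. The helper names (`split_811`, `average_ranks`, `spearman`), the seed, and the floor-based rounding of split sizes are assumptions made for illustration; the card does not state how the original split was seeded or rounded, so the exact subset sizes may differ.

```python
import random


def split_811(n_rows, seed=0):
    """Shuffle row indices and cut them into 80% / 10% / 10% (hypothetical helper)."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_train = int(0.8 * n_rows)
    n_valid = int(0.1 * n_rows)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# 3706 is the row count stated in the Dataset section.
train_idx, valid_idx, test_idx = split_811(3706)
```

In practice the metric would be computed as `spearman(predicted_absorbance, measured_absorbance)` over the test rows only. Because it depends only on rank order, it rewards a model that ranks variants correctly even when the predicted absorbance values are offset or rescaled.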
LoRA config
r: 8
lora_dropout: 0.1
lora_alpha: 16
target_modules: ["query", "intermediate.dense", "value", "output.dense", "key"]
modules_to_save: ["classifier"]
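As a rough sketch, the values above map onto a Hugging Face `peft` `LoraConfig` as shown below. That `peft` was the library used is an assumption, since the card does not name it; `LORA_KWARGS`, `lora_scaling`, and `build_lora_config` are hypothetical names introduced for illustration.

```python
# LoRA hyperparameters copied from the list above.
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.1,
    "target_modules": ["query", "intermediate.dense", "value", "output.dense", "key"],
    "modules_to_save": ["classifier"],
}


def lora_scaling(r, lora_alpha):
    """Standard LoRA scaling factor applied to the low-rank update (alpha / r)."""
    return lora_alpha / r


def build_lora_config():
    """Build a peft LoraConfig from the values above (assumes peft is installed)."""
    from peft import LoraConfig  # imported lazily so the sketch loads without peft

    return LoraConfig(**LORA_KWARGS)


scaling = lora_scaling(LORA_KWARGS["r"], LORA_KWARGS["lora_alpha"])
```

With r = 8 and lora_alpha = 16, the low-rank update is scaled by 2.0. The `modules_to_save` entry keeps the `classifier` head fully trainable and stored alongside the adapter weights, which suits a regression head that is newly initialized rather than pretrained.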
Training config
optimizer:
  class: AdamW
  betas: (0.9, 0.98)
  weight_decay: 0.01
learning rate: 0.0005
epoch: 30
batch size: 64
precision: 16-mixed
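The settings above could be wired up roughly as follows. This is a sketch under stated assumptions: that training uses PyTorch's `torch.optim.AdamW`, and that `16-mixed` is passed as the `precision` argument of a Lightning `Trainer`. The card confirms neither, and it does not mention a learning-rate schedule, so none is shown. `build_optimizer`, `build_trainer`, and `optimizer_steps` are hypothetical helpers.

```python
import math

# Values copied from the training config above.
OPTIMIZER_KWARGS = {"lr": 5e-4, "betas": (0.9, 0.98), "weight_decay": 0.01}
TRAINER_KWARGS = {"max_epochs": 30, "precision": "16-mixed"}
BATCH_SIZE = 64


def build_optimizer(model):
    """AdamW over trainable parameters only (assumes PyTorch is installed)."""
    import torch  # imported lazily so the sketch loads without torch

    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, **OPTIMIZER_KWARGS)


def build_trainer():
    """Lightning Trainer with mixed 16-bit precision (assumes lightning >= 2.0)."""
    import lightning.pytorch as pl

    return pl.Trainer(**TRAINER_KWARGS)


def optimizer_steps(n_train, batch_size=BATCH_SIZE, epochs=TRAINER_KWARGS["max_epochs"]):
    """Total optimizer steps, assuming one device, no gradient accumulation,
    and that the last partial batch of each epoch is kept."""
    return math.ceil(n_train / batch_size) * epochs


# About 80% of the 3706 rows (roughly 2964) are used for training.
total_steps = optimizer_steps(2964)
```

Under these assumptions each epoch has 47 batches, so the full run takes about 1410 optimizer steps.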