MoshiVis v0.1
Collection
MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs
Please refer to the main model card.
This model page contains the Moshika (female voice) model weights for the Rust backend of the MoshiVis repo, in Q8 format. The same model weights are available for other backends and quantization formats in the associated model collection.
Quantization: 8-bit
Base model: google/paligemma2-3b-pt-448