Update README code snippet

#1
by tomaarsen HF Staff - opened

Hello!

Preface

Congratulations! I'm really looking forward to seeing more evaluations on e.g. (M)MTEB as well. A very impressive jump in performance.

Pull Request overview

  • Use model.encode_query() and model.encode_document() in the README snippet.
  • Default to the "document" prompt name in config_sentence_transformers.json.

Details

Generally, I recommend model.encode_query() and model.encode_document() for users who want to perform retrieval, as these are just model.encode() with the query/document prompts applied automatically. The second change means that if someone calls model.encode() without a prompt or prompt_name, it defaults to the document prompt (i.e. "<|im_start|>system\ndocument<|im_end|>\n<|im_start|>user\n"). This should give much better performance than not using any prompt at all.
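As a rough sketch of the retrieval pattern I mean (not the exact README snippet): the helper names load_model and rank_documents are just illustrative, the model ID is left to the caller, and this assumes a Sentence Transformers version that provides encode_query() and encode_document().

```python
def load_model(model_id):
    """Load a Sentence Transformers model; model_id is supplied by the caller."""
    # Imported lazily so rank_documents() can be exercised without the library installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id)


def rank_documents(model, query, documents):
    """Return (document, score) pairs sorted from most to least similar to the query."""
    # encode_query() encodes with the model's configured query prompt.
    query_embedding = model.encode_query(query)
    # encode_document() encodes with the model's configured document prompt.
    document_embeddings = model.encode_document(documents)
    # similarity() yields a 1 x len(documents) score matrix for a single query.
    scores = model.similarity(query_embedding, document_embeddings)[0]
    ranked = zip(documents, (float(score) for score in scores))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Calling rank_documents(load_model(model_id), query, documents) with this model's ID would then return the documents ordered by similarity to the query, with both prompts applied for you.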

You're totally free to update the README snippet/texts to your liking. I do prefer including an "expected similarity", though, so that end users who run the model locally in various ways can be confident that their setup gives the expected results.
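For what it's worth, here is a minimal sketch of how an end user could compare their local scores against the expected similarities from a README. The helper name similarities_match and the 1e-2 tolerance are my own assumptions, chosen only for illustration.

```python
import math


def similarities_match(actual, expected, abs_tol=1e-2):
    """Return True if every score is within abs_tol of its expected value."""
    # A different number of scores means the outputs cannot be compared pairwise.
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(float(a), float(e), abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )
```

A small absolute tolerance leaves room for minor numerical differences between devices, precisions, and backends while still catching a missing or wrong prompt, which usually shifts the scores far more than that.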

- Tom Aarsen
tomaarsen changed pull request status to open
ZeroEntropy org

LGTM! Thank you, Tom! Will let @npip99 merge.

npip99 changed pull request status to merged
