Model Card for Tower-7b-MT-SFT

Model Sources

  • Paper: TODO

  • Link: TODO

  • Repository: TODO

Model Details

Model Description

Tower-7b-MT-SFT is a language model obtained by fine-tuning TowerBase 7b on the machine translation data of the TowerBlocks dataset.

  • Model type: A 7B parameter translation model built on top of TowerBase.
  • Language(s) (NLP): English, Portuguese, Spanish, French, German, Dutch, Italian, Korean, Russian, Chinese
  • License: CC-BY-NC-4.0 and the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

Intended uses & limitations

The model was initially fine-tuned on English-centric parallel data from TowerBlocks.

The model can perform translations between supported languages, with limited translation capabilities between non-English languages.

The model is also used to synthesize preference data for subsequent x2x optimization.

Here's how you can run the model with Hugging Face Transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "double7/Tower-7b-MT-SFT"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, device_map="auto", torch_dtype="auto"
)

src_lang = "German"
trg_lang = "Chinese"
src_text = "Filmkarriere Collinges Filmdebüt in Die kleinen Füchse von 1941 brachte ihr eine Nominierung für den Academy Award als beste Nebendarstellerin ein."

prompt = f"Translate the following text from {src_lang} into {trg_lang}:\n{src_lang}: {src_text}\n{trg_lang}:"

# We use the tokenizer’s chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {"role": "user", "content": prompt},
]

input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, do_sample=False, max_new_tokens=256)
output_text = tokenizer.batch_decode(outputs, skip_special_tokens=False)[0]
print(output_text)
# <s><|im_start|> user
# Translate the following text from German into Chinese:
# German: Filmkarriere Collinges Filmdebüt in Die kleinen Füchse von 1941 brachte ihr eine Nominierung für den Academy Award als beste Nebendarstellerin ein.
# Chinese:<|im_end|> 
# <|im_start|> assistant
# 电影生涯 科林格的电影处子作《小狐狸》于 1941 年上映,她因此获得了奥斯卡最佳女配角提名。<|im_end|>
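Note that `generate` returns the prompt tokens followed by the completion, so decoding the full output (as above) reproduces the prompt. To keep only the translation, slice off the prompt before decoding. A minimal sketch of that pattern, using dummy token-id lists in place of the real tensors so the slicing logic stands alone:

```python
# Sketch: extract only the newly generated tokens before decoding.
# `prompt_ids` stands in for inputs["input_ids"][0] and
# `full_output` for outputs[0] from the snippet above.
prompt_ids = [1, 32001, 4321, 32002]
full_output = prompt_ids + [5678, 91011, 2]

# generate() returns prompt + completion, so drop the prompt prefix
new_tokens = full_output[len(prompt_ids):]
print(new_tokens)  # → [5678, 91011, 2]
```

With real tensors, the same idea is `outputs[0][inputs["input_ids"].shape[1]:]`, which you then pass to `tokenizer.decode`.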

Translation Instructions

Following TowerInstruct, we use diverse translation instructions during training. You can describe translation requests in natural language, for example:

prompt1 = f"Translate the following text from {src_lang} into {trg_lang}:\n{src_lang}: {src_text}\n{trg_lang}:"

prompt2 = f"Please provide a translation from {src_lang} to {trg_lang} for the following text:\n{src_text}\nTarget:"

prompt3 = f"Translate this {src_lang} text into {trg_lang}:\nSource: {src_text}\nTranslation:"

We use prompt1 for evaluation.
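Since these instructions are plain f-strings, one convenient pattern is to keep them in a single template table and fill them with `str.format`. This helper is purely illustrative (not part of the released code); the template names are our own:

```python
# Hypothetical helper collecting the translation instruction templates above.
TEMPLATES = {
    "default": "Translate the following text from {src_lang} into {trg_lang}:\n{src_lang}: {src_text}\n{trg_lang}:",
    "provide": "Please provide a translation from {src_lang} to {trg_lang} for the following text:\n{src_text}\nTarget:",
    "source": "Translate this {src_lang} text into {trg_lang}:\nSource: {src_text}\nTranslation:",
}

def build_prompt(src_lang, trg_lang, src_text, template="default"):
    """Fill one of the instruction templates with a translation request."""
    return TEMPLATES[template].format(
        src_lang=src_lang, trg_lang=trg_lang, src_text=src_text
    )

print(build_prompt("German", "Chinese", "Hallo Welt"))
# → Translate the following text from German into Chinese:
#   German: Hallo Welt
#   Chinese:
```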

Out-of-Scope Use

The model is not guaranteed to perform well for languages other than the 10 languages it supports.

Bias, Risks, and Limitations

Tower-7b-MT-SFT has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).

Prompt Format

Tower-7b-MT-SFT was trained using the ChatML prompt template without any system prompt. An example follows below:

<|im_start|>user
{USER PROMPT}<|im_end|>
<|im_start|>assistant
{MODEL RESPONSE}<|im_end|>
<|im_start|>user
[...]
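`tokenizer.apply_chat_template` produces this layout automatically, but the format is simple enough to write by hand. A minimal sketch (our own helper, assuming the ChatML tokens shown above; exact whitespace around the special tokens may differ from the tokenizer's rendering):

```python
# Sketch of the ChatML layout above (no system prompt), mirroring
# what tokenizer.apply_chat_template produces for this model.
def to_chatml(messages, add_generation_prompt=True):
    text = ""
    for m in messages:
        text += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"  # cue the model to respond
    return text

print(to_chatml([{"role": "user", "content": "Hello"}]))
```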

Training Details

Training Data

We use the machine translation task data (about 150k) from TowerBlocks.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 7e-06
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
  • max_seq_length: 2048
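As a sanity check, the schedule lengths implied by these hyperparameters can be derived from the ~150k MT examples mentioned under Training Data. A sketch of that arithmetic (the example count is approximate, so the derived step counts are too):

```python
# Derive schedule lengths from the hyperparameters above.
# Assumes ~150k training examples, as stated under "Training Data".
num_examples = 150_000
total_train_batch_size = 128
num_epochs = 1
warmup_ratio = 0.1

steps_per_epoch = -(-num_examples // total_train_batch_size)  # ceiling division
total_steps = steps_per_epoch * num_epochs
warmup_steps = int(total_steps * warmup_ratio)

print(total_steps)   # → 1172
print(warmup_steps)  # → 117
```

So the cosine schedule runs for roughly 1.2k optimizer steps, with the first ~117 spent on warmup.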

Citation

TODO
