# COMBO UD 2.17 Models
This is a Hindi-language model based on COMBO-NLP, an open-source natural language preprocessing system. It performs sentence segmentation, tokenization, lemmatization, part-of-speech (UPOS/XPOS) tagging, morphological feature prediction, and dependency parsing.
The Hindi model uses FacebookAI/xlm-roberta-base as its base encoder and is trained on UD_Hindi-HDTB (UD v2.17).
Evaluation was performed on the UD_Hindi-HDTB test split using the standard CoNLL 2018 shared task evaluation script. Each table below reports two rows: F1 over the full text and accuracy over aligned words.
| Metric | Tokens | Sentences | Words | UPOS | XPOS | UFeats | AllTags | Lemmas |
|---|---|---|---|---|---|---|---|---|
| Full-text (F1) | 99.83 | 99.20 | 99.83 | 97.99 | 97.67 | 94.05 | 92.50 | 98.81 |
| Aligned accuracy | 0.00 | 0.00 | 0.00 | 98.16 | 97.83 | 94.21 | 92.66 | 98.97 |
| Metric | UAS | LAS | CLAS | MLAS | BLEX |
|---|---|---|---|---|---|
| Full-text (F1) | 95.30 | 92.31 | 89.24 | 79.38 | 87.99 |
| Aligned accuracy | 95.46 | 92.47 | 89.41 | 79.54 | 88.17 |
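UAS and LAS in the table above are attachment scores: UAS counts words whose predicted head matches the gold head, while LAS additionally requires the dependency label to match. As a minimal illustration (a toy sketch with hypothetical token tuples, not the official `conll18_ud_eval.py` script, which also handles word alignment):

```python
# Toy illustration of UAS/LAS on pre-aligned words.
# Each word is a (head_index, deprel) pair; the sentences below are
# hypothetical examples, not taken from UD_Hindi-HDTB.

def attachment_scores(gold, pred):
    """Return (UAS, LAS) as percentages over aligned word pairs."""
    assert len(gold) == len(pred)
    total = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return 100 * uas_hits / total, 100 * las_hits / total

gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "punct")]  # one label wrong

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=100.00 LAS=75.00
```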
Install the library from PyPI (assuming you have a virtual environment created):

```shell
pip install combo-nlp
```
Install the Lambo segmenter (only needed when passing raw text strings to COMBO):

```shell
pip install --index-url https://pypi.clarin-pl.eu/ lambo
```
```python
from combo import COMBO

# Load a pre-trained model with its corresponding Lambo segmenter
nlp = COMBO("Hindi")

# Parse raw text (handles sentence splitting + tokenization)
result = nlp("तेज़ भूरी लोमड़ी आलसी कुत्ते के ऊपर कूदती है।")

# Inspect results
for sentence in result:
    for token in sentence:
        print(f"{token.form:<15} {token.lemma:<15} {token.upos:<8} head={token.head} {token.deprel}")
```
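If downstream tools expect CoNLL-U input, the per-token fields printed above map directly onto CoNLL-U columns. A hedged sketch using plain tuples (the tuples stand in for the `form`, `lemma`, `upos`, `head`, and `deprel` attributes shown above; whether COMBO ships its own serializer is not covered here):

```python
# Render tokens as 10-column CoNLL-U lines. The token tuples are
# hypothetical stand-ins for (form, lemma, upos, head, deprel);
# XPOS, FEATS, DEPS, and MISC are left as "_".

def to_conllu(tokens):
    lines = []
    for i, (form, lemma, upos, head, deprel) in enumerate(tokens, start=1):
        cols = [str(i), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n\n"  # a sentence ends with a blank line

sent = [("कुत्ता", "कुत्ता", "NOUN", 2, "nsubj"), ("कूदता", "कूद", "VERB", 0, "root")]
print(to_conllu(sent))
```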
Refer to the COMBO-NLP documentation for full installation and usage instructions.