Tokenizer Choice For LLM Training: Negligible or Crucial? Paper • 2310.08754 • Published Oct 12, 2023
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings Paper • 2202.06671 • Published Feb 14, 2022
Specialized Document Embeddings for Aspect-based Similarity of Research Papers Paper • 2203.14541 • Published Mar 28, 2022
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning Paper • 2301.09626 • Published Jan 23, 2023
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 19, 2025
Learn Your Tokens: Word-Pooled Tokenization for Language Modeling Paper • 2310.11628 • Published Oct 17, 2023
Multi-Lingual Malaysian Embedding: Leveraging Large Language Models for Semantic Representations Paper • 2402.03053 • Published Feb 5, 2024
Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding Paper • 2401.13565 • Published Jan 24, 2024
Multitask Learning and Multistage Fusion for Dimensional Audiovisual Emotion Recognition Paper • 2002.11312 • Published Feb 26, 2020
SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification Paper • 2301.11309 • Published Jan 26, 2023
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs Paper • 2305.11860 • Published May 19, 2023
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts Paper • 2202.01279 • Published Feb 2, 2022