[
{
"additions": 8,
"author": "harshaljanjani",
"author_association": "CONTRIBUTOR",
"body_excerpt": "### What does this PR do? The following failing tests were identified and fixed in this PR (grouped them together since they share related root causes OR the code changes were extremely minimal and didn't warrant separate PRs): \u2192 **Pi0**:\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 0,
"conversation_url": "https://github.com/huggingface/transformers/pull/45004",
"created_at": "2026-03-25T19:58:57Z",
"deletions": 4,
"draft": true,
"files_url": "https://github.com/huggingface/transformers/pull/45004/files",
"html_url": "https://github.com/huggingface/transformers/pull/45004",
"labels": [],
"merged": false,
"number": 45004,
"review_comments_count": 0,
"state": "open",
"title": "fix(testing): Fix Parakeet, Evolla, Pi0, and Phi-3 test failures on main CI",
"updated_at": "2026-03-25T19:58:57Z"
},
{
"additions": 1,
"author": "hmellor",
"author_association": "MEMBER",
"body_excerpt": "`None` is a valid value that can be used to disable chunked attention in `DynamicCache` and Flex Attention. hf.co/morgendave/EAGLE-Llama-4-Scout-17B-16E-Instruct is an example of a checkpoint which does this.",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/45002",
"created_at": "2026-03-25T17:40:14Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/45002/files",
"html_url": "https://github.com/huggingface/transformers/pull/45002",
"labels": [],
"merged": false,
"number": 45002,
"review_comments_count": 0,
"state": "open",
"title": "Fix type hint for `attention_chunk_size` in `Llama4TextConfig`",
"updated_at": "2026-03-25T17:52:53Z"
},
{
"additions": 7,
"author": "Sai-Suraj-27",
"author_association": "CONTRIBUTOR",
"body_excerpt": "# What does this PR do? For [torch>=2.10.0](https://docs.pytorch.org/docs/2.10/generated/torch.nn.functional.grouped_mm.html#torch-nn-functional-grouped-mm), the minimum CUDA compute capability requirement for `torch.nn.functional.grouped_\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/45001",
"created_at": "2026-03-25T17:00:28Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/45001/files",
"html_url": "https://github.com/huggingface/transformers/pull/45001",
"labels": [],
"merged": false,
"number": 45001,
"review_comments_count": 0,
"state": "open",
"title": "Add cuda compatibility check for using `grouped_mm`",
"updated_at": "2026-03-25T19:31:09Z"
},
{
"additions": 21,
"author": "zucchini-nlp",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? As per title, after https://github.com/huggingface/transformers/pull/44976 users will be seeing a `missing_weights - lm_head not found` error even though the model doesn't use an lm head On the way also deleted unne\u2026",
"changed_files": 8,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 5,
"conversation_url": "https://github.com/huggingface/transformers/pull/45000",
"created_at": "2026-03-25T16:28:55Z",
"deletions": 109,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/45000/files",
"html_url": "https://github.com/huggingface/transformers/pull/45000",
"labels": [],
"merged": false,
"number": 45000,
"review_comments_count": 0,
"state": "open",
"title": "Embedding VLMs don't need a head",
"updated_at": "2026-03-25T18:53:51Z"
},
{
"additions": 1002,
"author": "itazap",
"author_association": "MEMBER",
"body_excerpt": "## Summary - Auto-generated modular integration for `sarvam` - `modular_sarvam.py` written by Claude Opus 4.6 guided by `modular_model_detector.py` - `modeling_sarvam.py` regenerated from modular via `modular_model_converter.py` ## Test pl\u2026",
"changed_files": 4,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44999",
"created_at": "2026-03-25T16:21:37Z",
"deletions": 0,
"draft": true,
"files_url": "https://github.com/huggingface/transformers/pull/44999/files",
"html_url": "https://github.com/huggingface/transformers/pull/44999",
"labels": [],
"merged": false,
"number": 44999,
"review_comments_count": 0,
"state": "open",
"title": "Add sarvam model",
"updated_at": "2026-03-25T16:31:36Z"
},
{
"additions": 1179,
"author": "itazap",
"author_association": "MEMBER",
"body_excerpt": "## Summary - Auto-generated modular integration for `sarvam` - `modular_sarvam.py` written by Claude Opus 4.6 guided by `modular_model_detector.py` - `modeling_sarvam.py` regenerated from modular via `modular_model_converter.py` ## Test pl\u2026",
"changed_files": 4,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44997",
"created_at": "2026-03-25T14:23:13Z",
"deletions": 0,
"draft": true,
"files_url": "https://github.com/huggingface/transformers/pull/44997/files",
"html_url": "https://github.com/huggingface/transformers/pull/44997",
"labels": [],
"merged": false,
"number": 44997,
"review_comments_count": 0,
"state": "closed",
"title": "Add sarvam model",
"updated_at": "2026-03-25T14:35:45Z"
},
{
"additions": 255,
"author": "3outeille",
"author_association": "MEMBER",
"body_excerpt": "- TODO: - fix failing tests due to API change - make sure our `fsdp2` is not triggered if `accelerate` is on - Introduce `DistributedConfig` - `DistributedConfig(tp_size=2, fsdp_size=2) # plans default to \"auto\"` replaces passing separate\u2026",
"changed_files": 7,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44996",
"created_at": "2026-03-25T14:20:25Z",
"deletions": 216,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44996/files",
"html_url": "https://github.com/huggingface/transformers/pull/44996",
"labels": [],
"merged": false,
"number": 44996,
"review_comments_count": 0,
"state": "open",
"title": " from_pretrained distributed refactor (FSDP2 + TP)",
"updated_at": "2026-03-25T17:22:04Z"
},
{
"additions": 3639,
"author": "itazap",
"author_association": "MEMBER",
"body_excerpt": null,
"changed_files": 8,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 0,
"conversation_url": "https://github.com/huggingface/transformers/pull/44994",
"created_at": "2026-03-25T14:02:50Z",
"deletions": 242,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44994/files",
"html_url": "https://github.com/huggingface/transformers/pull/44994",
"labels": [],
"merged": false,
"number": 44994,
"review_comments_count": 0,
"state": "closed",
"title": "Add sarvam model",
"updated_at": "2026-03-25T14:04:38Z"
},
{
"additions": 426,
"author": "tarekziade",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? `make check-repo` can be quite slow, one of the biggest bottleneck is the docstring checker that look at all functions/methods. Checkers in general also don't have any cache to prevent re-running on files that have\u2026",
"changed_files": 5,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44992",
"created_at": "2026-03-25T11:40:46Z",
"deletions": 62,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44992/files",
"html_url": "https://github.com/huggingface/transformers/pull/44992",
"labels": [],
"merged": false,
"number": 44992,
"review_comments_count": 0,
"state": "open",
"title": "refactoring: speedup static checks",
"updated_at": "2026-03-25T16:52:04Z"
},
{
"additions": 8,
"author": "ArthurZucker",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? - BC for check model inputs",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44990",
"created_at": "2026-03-25T10:26:20Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44990/files",
"html_url": "https://github.com/huggingface/transformers/pull/44990",
"labels": [],
"merged": true,
"number": 44990,
"review_comments_count": 0,
"state": "closed",
"title": "More small vllm fixes",
"updated_at": "2026-03-25T13:05:44Z"
},
{
"additions": 1,
"author": "3outeille",
"author_association": "MEMBER",
"body_excerpt": "- Steps breakdown: - FSDP + TP: - https://github.com/huggingface/transformers/pull/44083 - [Request](https://github.com/huggingface/transformers/pull/44083#pullrequestreview-3975401342) to use our loading method https://github.com/huggingf\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44989",
"created_at": "2026-03-25T09:10:02Z",
"deletions": 0,
"draft": true,
"files_url": "https://github.com/huggingface/transformers/pull/44989/files",
"html_url": "https://github.com/huggingface/transformers/pull/44989",
"labels": [],
"merged": false,
"number": 44989,
"review_comments_count": 0,
"state": "open",
"title": "\ud83d\udea8 Distributed training API",
"updated_at": "2026-03-25T16:16:45Z"
},
{
"additions": 584,
"author": "tarekziade",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? Adds Rule 14 ``` if _tied_weights_keys is present and non-empty in modeling -> Config MUST contain the tie_word_embeddings field ```",
"changed_files": 9,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 6,
"conversation_url": "https://github.com/huggingface/transformers/pull/44988",
"created_at": "2026-03-25T07:08:20Z",
"deletions": 2,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44988/files",
"html_url": "https://github.com/huggingface/transformers/pull/44988",
"labels": [],
"merged": false,
"number": 44988,
"review_comments_count": 0,
"state": "open",
"title": "typing: add rule 14 - checks for tie_word_embeddings presence",
"updated_at": "2026-03-25T13:00:07Z"
},
{
"additions": 0,
"author": "Krishnachaitanyakc",
"author_association": "CONTRIBUTOR",
"body_excerpt": "## Summary Fixes #44855 On Python 3.13, placing a `# Copied from` comment between `@torch.jit.script` and the function definition causes an `IndentationError`. This happens because `torch.jit.script` calls `inspect.getsource()` followed by\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44986",
"created_at": "2026-03-25T03:18:31Z",
"deletions": 6,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44986/files",
"html_url": "https://github.com/huggingface/transformers/pull/44986",
"labels": [],
"merged": true,
"number": 44986,
"review_comments_count": 0,
"state": "closed",
"title": "fix: remove Copied from comments between @torch.jit.script and def for Python 3.13 compat",
"updated_at": "2026-03-25T13:39:54Z"
},
{
"additions": 2,
"author": "Krishnachaitanyakc",
"author_association": "CONTRIBUTOR",
"body_excerpt": "## Summary Fixes #44913 When creating a `GPTNeoXConfig` (or `GPTNeoXJapaneseConfig`) with a non-default `rotary_pct`, the value is lost after a `save_pretrained` / `from_pretrained` round-trip. This happens because `convert_rope_params_to_\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44985",
"created_at": "2026-03-25T02:15:04Z",
"deletions": 2,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44985/files",
"html_url": "https://github.com/huggingface/transformers/pull/44985",
"labels": [],
"merged": false,
"number": 44985,
"review_comments_count": 1,
"state": "open",
"title": "fix: preserve rotary_pct across save/load cycle in GPTNeoX configs",
"updated_at": "2026-03-25T13:46:44Z"
},
{
"additions": 2,
"author": "Butanium",
"author_association": "CONTRIBUTOR",
"body_excerpt": "## What does this PR do? `maybe_autocast` calls `torch.is_autocast_enabled(device_type)` which raises a `RuntimeError` when `device_type` is `\"meta\"`: ``` RuntimeError: unknown device type for autocast in get_autocast_dispatch_key_from_dev\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44984",
"created_at": "2026-03-25T01:39:23Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44984/files",
"html_url": "https://github.com/huggingface/transformers/pull/44984",
"labels": [],
"merged": true,
"number": 44984,
"review_comments_count": 0,
"state": "closed",
"title": "Fix `maybe_autocast` crashing on meta device tensors",
"updated_at": "2026-03-25T17:45:03Z"
},
{
"additions": 134,
"author": "Hyungkeun-Park-Nota",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "## What does this PR do? Implements `Mxfp4Dequantize.reverse_op` so that `save_pretrained()` works for GPT-OSS models loaded with `Mxfp4Config(dequantize=True)`. Currently, `Mxfp4Deserialize` has a `reverse_op` (`Mxfp4ReverseDeserialize`),\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44983",
"created_at": "2026-03-25T01:19:59Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44983/files",
"html_url": "https://github.com/huggingface/transformers/pull/44983",
"labels": [],
"merged": false,
"number": 44983,
"review_comments_count": 0,
"state": "open",
"title": "fix: implement Mxfp4Dequantize.reverse_op for save_pretrained support",
"updated_at": "2026-03-25T14:09:22Z"
},
{
"additions": 108,
"author": "AkshajKashyap",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "Fixes #43039 ## What does this PR do? When `prediction_loss_only=True` during evaluation and `use_liger_kernel=True`, `Trainer.prediction_step` now passes `skip_logits=True` to the model forward if the forward signature supports it and lab\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 0,
"conversation_url": "https://github.com/huggingface/transformers/pull/44981",
"created_at": "2026-03-25T00:38:02Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44981/files",
"html_url": "https://github.com/huggingface/transformers/pull/44981",
"labels": [],
"merged": false,
"number": 44981,
"review_comments_count": 0,
"state": "open",
"title": "Trainer: set skip_logits for loss-only eval when liger enabled",
"updated_at": "2026-03-25T02:39:15Z"
},
{
"additions": 6,
"author": "kallewoof",
"author_association": "CONTRIBUTOR",
"body_excerpt": "Pre-patch unnecessarily breaks merging a LoRA adapter with a model using CUDA_VISIBLE_DEVICES= e.g. when VRAM is insufficient. It also breaks non-cuda machine operations (such as merging). # What does this PR do? This PR un-breaks `CUDA_VI\u2026",
"changed_files": 6,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44980",
"created_at": "2026-03-24T23:50:07Z",
"deletions": 6,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44980/files",
"html_url": "https://github.com/huggingface/transformers/pull/44980",
"labels": [],
"merged": false,
"number": 44980,
"review_comments_count": 0,
"state": "open",
"title": "bug-fix: do not assume torch.cuda is available when setting up norm values, even if flash linear attention is available",
"updated_at": "2026-03-25T12:33:44Z"
},
{
"additions": 218,
"author": "michaelbenayoun",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? Introduces `src/transformers/module_fusion.py`, a utility for fusing adjacent submodules in a model into a single FusedModule that executes them as a chain in one forward pass. The key components are: - `RegistryCol\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44979",
"created_at": "2026-03-24T22:33:31Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44979/files",
"html_url": "https://github.com/huggingface/transformers/pull/44979",
"labels": [],
"merged": false,
"number": 44979,
"review_comments_count": 0,
"state": "open",
"title": "Module Fusion API",
"updated_at": "2026-03-25T19:55:52Z"
},
{
"additions": 4,
"author": "cjkindel",
"author_association": "NONE",
"body_excerpt": "# What does this PR do? `_can_set_attn_implementation` and `_can_set_experts_implementation` both do a direct subscript lookup into `sys.modules`: ```python class_module = sys.modules[cls.__module__] ``` If the module is not registered und\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44978",
"created_at": "2026-03-24T21:01:11Z",
"deletions": 4,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44978/files",
"html_url": "https://github.com/huggingface/transformers/pull/44978",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44978,
"review_comments_count": 0,
"state": "closed",
"title": "fix: handle absent sys.modules entry in modeling_utils",
"updated_at": "2026-03-25T18:28:53Z"
},
{
"additions": 2,
"author": "hmellor",
"author_association": "MEMBER",
"body_excerpt": "- Adds a type hint to `ModernVBertForMaskedLM.__init__` - Removes `tie_word_embeddings` from `Qwen2VLTextConfig` (and therefore also `Qwen2_5_VLTextConfig`) because it's not valid for these models - Remove hack from `ColQwen2Config` (and t\u2026",
"changed_files": 6,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44976",
"created_at": "2026-03-24T19:26:33Z",
"deletions": 10,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44976/files",
"html_url": "https://github.com/huggingface/transformers/pull/44976",
"labels": [],
"merged": true,
"number": 44976,
"review_comments_count": 3,
"state": "closed",
"title": "Fix tie_word_embedding issues with `Qwen2VL`",
"updated_at": "2026-03-24T20:55:15Z"
},
{
"additions": 6971,
"author": "philippguevorguian",
"author_association": "NONE",
"body_excerpt": null,
"changed_files": 20,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44975",
"created_at": "2026-03-24T17:12:31Z",
"deletions": 2,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44975/files",
"html_url": "https://github.com/huggingface/transformers/pull/44975",
"labels": [],
"merged": false,
"number": 44975,
"review_comments_count": 0,
"state": "closed",
"title": "fix: rebase main; clean config reads, ImageProcessor backend, misc cleanup",
"updated_at": "2026-03-24T17:13:42Z"
},
{
"additions": 799,
"author": "3outeille",
"author_association": "MEMBER",
"body_excerpt": null,
"changed_files": 6,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44974",
"created_at": "2026-03-24T16:13:25Z",
"deletions": 82,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44974/files",
"html_url": "https://github.com/huggingface/transformers/pull/44974",
"labels": [],
"merged": false,
"number": 44974,
"review_comments_count": 0,
"state": "open",
"title": "Refactor core_model_loading to support FSDP shard-on-read loading",
"updated_at": "2026-03-25T17:00:57Z"
},
{
"additions": 22,
"author": "andylizf",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "## What does this PR do? Adds `.item()` to `max_seqlen = (cu_seqlens[1:] - cu_seqlens[:-1]).max()` in all vision attention modules that pass this value to `flash_attn_varlen_func`. ### Context On **released versions** (e.g. 4.52.4), using\u2026",
"changed_files": 19,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44973",
"created_at": "2026-03-24T15:42:32Z",
"deletions": 22,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44973/files",
"html_url": "https://github.com/huggingface/transformers/pull/44973",
"labels": [],
"merged": false,
"number": 44973,
"review_comments_count": 0,
"state": "open",
"title": "Fix max_seqlen type in vision attention for torch.compile + FA2",
"updated_at": "2026-03-25T14:12:50Z"
},
{
"additions": 17,
"author": "Abdennacer-Badaoui",
"author_association": "MEMBER",
"body_excerpt": "As per title. Updating Gemma3/Gemma3n expectations.",
"changed_files": 3,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44972",
"created_at": "2026-03-24T15:11:50Z",
"deletions": 12,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44972/files",
"html_url": "https://github.com/huggingface/transformers/pull/44972",
"labels": [],
"merged": true,
"number": 44972,
"review_comments_count": 10,
"state": "closed",
"title": "[AMD CI] Gemma3/Gemma3n Expectations",
"updated_at": "2026-03-24T16:30:03Z"
},
{
"additions": 0,
"author": "ArthurZucker",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? Removed the tokenizer_class attr was never there to begin with, and kwargs are now supported. This was failing some test on vllm ci. Fixes https://buildkite.com/vllm/ci/builds/57601/steps/canvas?sid=019d1aec-aa5a-41\u2026",
"changed_files": 4,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44971",
"created_at": "2026-03-24T14:59:36Z",
"deletions": 11,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44971/files",
"html_url": "https://github.com/huggingface/transformers/pull/44971",
"labels": [],
"merged": true,
"number": 44971,
"review_comments_count": 1,
"state": "closed",
"title": "[ `vllm x v5`] nit",
"updated_at": "2026-03-24T17:40:05Z"
},
{
"additions": 20,
"author": "IlyasMoutawwakil",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? save locally --> local locally) ```\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44730",
"created_at": "2026-03-15T20:44:32Z",
"deletions": 4,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44730/files",
"html_url": "https://github.com/huggingface/transformers/pull/44730",
"labels": [],
"merged": true,
"number": 44730,
"review_comments_count": 6,
"state": "closed",
"title": "Fix `mlcd` auto config/model/mapping issues",
"updated_at": "2026-03-16T12:12:30Z"
},
{
"additions": 214,
"author": "xenova",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? This PR introduces a helper utility function, `int_div_ceil`, which performs `math.ceil(a / b)` for non-negative integer operands. This is necessary as the current approach is both error-prone and imprecise (especia\u2026",
"changed_files": 58,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44729",
"created_at": "2026-03-15T20:29:38Z",
"deletions": 225,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44729/files",
"html_url": "https://github.com/huggingface/transformers/pull/44729",
"labels": [],
"merged": false,
"number": 44729,
"review_comments_count": 0,
"state": "open",
"title": "Avoid floating point math for ceil operations",
"updated_at": "2026-03-15T20:49:34Z"
},
{
"additions": 88,
"author": "ajmeese7",
"author_association": "NONE",
"body_excerpt": "# What does this PR do? Fixes a GPU memory leak in `Bnb4bitQuantize.convert()` where float16 source tensors are never freed during 4-bit quantized model loading via `from_pretrained`, causing OOM on models whose float16 size exceeds GPU VR\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 4,
"conversation_url": "https://github.com/huggingface/transformers/pull/44728",
"created_at": "2026-03-15T19:56:44Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44728/files",
"html_url": "https://github.com/huggingface/transformers/pull/44728",
"labels": [],
"merged": false,
"number": 44728,
"review_comments_count": 0,
"state": "closed",
"title": "Fix float16 memory leak during 4-bit quantized model loading",
"updated_at": "2026-03-16T20:53:54Z"
},
{
"additions": 202,
"author": "LincolnBurrows2017",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "Fixed issue where kwargs like force_download, proxies, token were not being passed to cached_file function.",
"changed_files": 11,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44727",
"created_at": "2026-03-15T19:41:24Z",
"deletions": 33,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44727/files",
"html_url": "https://github.com/huggingface/transformers/pull/44727",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44727,
"review_comments_count": 0,
"state": "closed",
"title": "fix: AutoProcessor.from_pretrained not passing kwargs to cached_file",
"updated_at": "2026-03-18T13:15:46Z"
},
{
"additions": 198,
"author": "LincolnBurrows2017",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "Replaced bare except clause with except Exception in _safe_convert_tensor function to follow Python best practices (PEP 8).",
"changed_files": 10,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44725",
"created_at": "2026-03-15T17:41:18Z",
"deletions": 29,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44725/files",
"html_url": "https://github.com/huggingface/transformers/pull/44725",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44725,
"review_comments_count": 0,
"state": "closed",
"title": "fix: replace bare except with Exception in Fuyu image processing",
"updated_at": "2026-03-18T13:16:22Z"
},
{
"additions": 6,
"author": "ydshieh",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? TO be explained.",
"changed_files": 5,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44724",
"created_at": "2026-03-15T17:14:12Z",
"deletions": 5,
"draft": true,
"files_url": "https://github.com/huggingface/transformers/pull/44724/files",
"html_url": "https://github.com/huggingface/transformers/pull/44724",
"labels": [],
"merged": false,
"number": 44724,
"review_comments_count": 1,
"state": "open",
"title": "Fix some missing / incorrect entries in auto files",
"updated_at": "2026-03-16T09:59:56Z"
},
{
"additions": 12,
"author": "aashirpersonal",
"author_association": "NONE",
"body_excerpt": "## Summary This PR fixes #44716 by exposing and forwarding `interpolate_pos_encoding` through the Pixio embedding/model call chain so the option is actually usable from `PixioModel.forward()`. ### Changes - Added `interpolate_pos_encoding:\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44723",
"created_at": "2026-03-15T16:52:03Z",
"deletions": 6,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44723/files",
"html_url": "https://github.com/huggingface/transformers/pull/44723",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44723,
"review_comments_count": 0,
"state": "closed",
"title": "Fix: propagate interpolate_pos_encoding through PixioEmbeddings and PixioModel",
"updated_at": "2026-03-18T15:05:52Z"
},
{
"additions": 38,
"author": "chandan11248",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "## What does this PR do? Migrates the GPT-J model to use the new `@capture_outputs` and `@can_return_tuple` decorators for standardized output collection, as described in #43979. ### Changes - Added `_can_record_outputs` to `GPTJPreTrained\u2026",
"changed_files": 2,
"cluster_id": "cluster-43979-11",
"cluster_ids": [
"cluster-43979-11"
],
"cluster_role": "member",
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44722",
"created_at": "2026-03-15T15:33:25Z",
"deletions": 110,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44722/files",
"html_url": "https://github.com/huggingface/transformers/pull/44722",
"labels": [],
"merged": false,
"number": 44722,
"review_comments_count": 0,
"state": "open",
"title": "Refactor gptj output tracing to use standardized decorators",
"updated_at": "2026-03-19T18:12:59Z"
},
{
"additions": 4,
"author": "rsmed31",
"author_association": "NONE",
"body_excerpt": "## Summary Fixes #44716 `PixioPatchEmbeddings.forward` already accepted `interpolate_pos_encoding` but it was silently dropped \u2014 never passed from `PixioEmbeddings.forward` or `PixioModel.forward`, making the parameter effectively unusable\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44718",
"created_at": "2026-03-14T23:57:14Z",
"deletions": 3,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44718/files",
"html_url": "https://github.com/huggingface/transformers/pull/44718",
"labels": [],
"merged": false,
"number": 44718,
"review_comments_count": 0,
"state": "closed",
"title": "Fix: propagate interpolate_pos_encoding through PixioEmbeddings and PixioModel",
"updated_at": "2026-03-15T17:58:58Z"
},
{
"additions": 15,
"author": "ydshieh",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? As discussed internally, some component model classes didn't specify the correct config classes. This PR fixes them (those I could found - because the tiny model creation script fails due to those mistakes).",
"changed_files": 7,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 4,
"conversation_url": "https://github.com/huggingface/transformers/pull/44715",
"created_at": "2026-03-14T21:11:52Z",
"deletions": 2,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44715/files",
"html_url": "https://github.com/huggingface/transformers/pull/44715",
"labels": [],
"merged": true,
"number": 44715,
"review_comments_count": 0,
"state": "closed",
"title": "Fix missing / incorrect `config` class in some model class definitions",
"updated_at": "2026-03-15T11:19:51Z"
},
{
"additions": 181,
"author": "LincolnBurrows2017",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "## Summary Fixes issue #44625: Qwen3.5 num_labels not propagating from core config to text_config. When calling `AutoConfig.from_pretrained(\"Qwen3.5\", num_labels=1)`, the main config gets `num_labels=1` but `text_config` still has default\u2026",
"changed_files": 8,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44714",
"created_at": "2026-03-14T20:42:46Z",
"deletions": 26,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44714/files",
"html_url": "https://github.com/huggingface/transformers/pull/44714",
"labels": [],
"merged": false,
"number": 44714,
"review_comments_count": 0,
"state": "closed",
"title": "fix: propagate num_labels to text_config for Qwen models",
"updated_at": "2026-03-18T12:56:27Z"
},
{
"additions": 15,
"author": "kulkarni-rohan",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "Applies the output tracing refactor to ColQwen2ForRetrieval as part of the broader effort tracked in issue #43979 to modernize output handling across all models in the library. Changes in both modular_colqwen2.py and modeling_colqwen2.py:\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44713",
"created_at": "2026-03-14T20:20:14Z",
"deletions": 28,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44713/files",
"html_url": "https://github.com/huggingface/transformers/pull/44713",
"labels": [],
"merged": false,
"number": 44713,
"review_comments_count": 0,
"state": "open",
"title": "[ColQwen2] Refactor output tracing (issue #43979)",
"updated_at": "2026-03-14T20:21:24Z"
},
{
"additions": 2,
"author": "ydshieh",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? torch 2.11 is going to be released soon, but we still use 2.9. Let's update it to 2.10 so we at least get a run with torch 2.10 before updating to torch 2.11 later.",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44712",
"created_at": "2026-03-14T20:18:01Z",
"deletions": 2,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44712/files",
"html_url": "https://github.com/huggingface/transformers/pull/44712",
"labels": [],
"merged": true,
"number": 44712,
"review_comments_count": 0,
"state": "closed",
"title": "Update Nvidia CI docker file to use torch 2.10",
"updated_at": "2026-03-14T20:29:04Z"
},
{
"additions": 339,
"author": "anuq",
"author_association": "NONE",
"body_excerpt": "## What does this PR do? Fixes #35141. When `tie_word_embeddings=False`, calling `resize_token_embeddings()` creates a new `nn.Linear` for the LM head via `_get_resized_lm_head()`. The new module's weight and bias tensors do **not** carry\u2026",
"changed_files": 4,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44711",
"created_at": "2026-03-14T19:21:21Z",
"deletions": 205,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44711/files",
"html_url": "https://github.com/huggingface/transformers/pull/44711",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44711,
"review_comments_count": 0,
"state": "closed",
"title": "fix: mark new lm_head params as `_is_hf_initialized` after `resize_token_embeddings`",
"updated_at": "2026-03-20T13:36:58Z"
},
{
"additions": 12,
"author": "he-yufeng",
"author_association": "CONTRIBUTOR",
"body_excerpt": "## What does this PR do? Fixes `AutoProcessor.from_pretrained` silently dropping hub kwargs like `force_download`, `cache_dir`, `token`, `revision`, etc. ### The bug The existing code on line ~300 filters kwargs using `inspect.signature(ca\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 4,
"conversation_url": "https://github.com/huggingface/transformers/pull/44710",
"created_at": "2026-03-14T18:33:53Z",
"deletions": 2,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44710/files",
"html_url": "https://github.com/huggingface/transformers/pull/44710",
"labels": [],
"merged": true,
"number": 44710,
"review_comments_count": 0,
"state": "closed",
"title": "Fix AutoProcessor.from_pretrained silently dropping hub kwargs",
"updated_at": "2026-03-25T18:13:14Z"
},
{
"additions": 6778,
"author": "LucasMa2025",
"author_association": "FIRST_TIMER",
"body_excerpt": "# \ud83c\udf9b\ufe0f Add Configurable Generation Scheduler and State Machine for `generate()` ## Summary This PR introduces a **fully optional, zero-intrusion** Generation Scheduler (`GenerationScheduler`) and explicit state machine (`GenerationStateMachi\u2026",
"changed_files": 15,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 0,
"conversation_url": "https://github.com/huggingface/transformers/pull/44708",
"created_at": "2026-03-14T17:13:34Z",
"deletions": 7,
"draft": true,
"files_url": "https://github.com/huggingface/transformers/pull/44708/files",
"html_url": "https://github.com/huggingface/transformers/pull/44708",
"labels": [],
"merged": false,
"number": 44708,
"review_comments_count": 0,
"state": "closed",
"title": "Add Configurable Generation Scheduler and State Machine for `generate()`",
"updated_at": "2026-03-14T19:19:11Z"
},
{
"additions": 3,
"author": "saivedant169",
"author_association": "NONE",
"body_excerpt": "Fixes part of #32937 ## What does this PR do? Adds `position_ids` as an explicit parameter to `MptForCausalLM.forward()` and `MptModel.forward()`, bringing MPT in line with other CausalLM models. Same rationale as the Bloom PR (#44706) \u2014 M\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44707",
"created_at": "2026-03-14T17:12:16Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44707/files",
"html_url": "https://github.com/huggingface/transformers/pull/44707",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44707,
"review_comments_count": 0,
"state": "closed",
"title": "Add position_ids to MptForCausalLM forward pass",
"updated_at": "2026-03-18T13:39:36Z"
},
{
"additions": 3,
"author": "saivedant169",
"author_association": "NONE",
"body_excerpt": "Fixes part of #32937 ## What does this PR do? Adds `position_ids` as an explicit parameter to `BloomForCausalLM.forward()` and `BloomModel.forward()`, bringing Bloom in line with other CausalLM models like Llama, Falcon, Gemma, and Mistral\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44706",
"created_at": "2026-03-14T17:09:11Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44706/files",
"html_url": "https://github.com/huggingface/transformers/pull/44706",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44706,
"review_comments_count": 0,
"state": "closed",
"title": "Add position_ids to BloomForCausalLM forward pass",
"updated_at": "2026-03-18T13:39:51Z"
},
{
"additions": 14,
"author": "saivedant169",
"author_association": "NONE",
"body_excerpt": "Fixes part of #32937 ## What does this PR do? RoFormer introduced rotary position embeddings, but its `ForCausalLM` forward method doesn't accept `position_ids` \u2014 which means callers can't specify custom positions for packed sequences or f\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44705",
"created_at": "2026-03-14T16:48:06Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44705/files",
"html_url": "https://github.com/huggingface/transformers/pull/44705",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44705,
"review_comments_count": 0,
"state": "closed",
"title": "Add position_ids to RoFormerForCausalLM forward pass",
"updated_at": "2026-03-18T13:40:05Z"
},
{
"additions": 26,
"author": "vasqu",
"author_association": "MEMBER",
"body_excerpt": "As per title, it seems that the `cute` subfolder can even be distributed if you only install FA2, which implies something is wrong. Now we check under the (normalized) distribution names",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44703",
"created_at": "2026-03-14T14:46:02Z",
"deletions": 10,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44703/files",
"html_url": "https://github.com/huggingface/transformers/pull/44703",
"labels": [],
"merged": true,
"number": 44703,
"review_comments_count": 1,
"state": "closed",
"title": "[`FA`] Fix fa detection",
"updated_at": "2026-03-14T17:19:07Z"
},
{
"additions": 148,
"author": "LincolnBurrows2017",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "## What does this PR fix? The `rms_norm_eps` parameter in `MistralConfig` was incorrectly typed as `int | None` but defaults to `1e-6` which is a float. This parameter is passed to `MistralRMSNorm` which expects `eps: float`. ### Bug Detai\u2026",
"changed_files": 8,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44702",
"created_at": "2026-03-14T14:41:15Z",
"deletions": 25,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44702/files",
"html_url": "https://github.com/huggingface/transformers/pull/44702",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44702,
"review_comments_count": 0,
"state": "closed",
"title": "fix: Correct rms_norm_eps type hint from int to float in MistralConfig",
"updated_at": "2026-03-18T13:00:12Z"
},
{
"additions": 219,
"author": "hmellor",
"author_association": "MEMBER",
"body_excerpt": "These models have `base_model_pp_plan`s but currently do not work because the base model's forward pass depends on all the `layers` being `Qwen2VLDecoderLayer`. i.e. if one of the layers is removed/replaced with `Identity`, `decoder_layer.\u2026",
"changed_files": 52,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44699",
"created_at": "2026-03-14T11:44:24Z",
"deletions": 148,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44699/files",
"html_url": "https://github.com/huggingface/transformers/pull/44699",
"labels": [],
"merged": true,
"number": 44699,
"review_comments_count": 0,
"state": "closed",
"title": "Fix several based models' pipeline parallel support",
"updated_at": "2026-03-20T13:53:27Z"
},
{
"additions": 1,
"author": "hmellor",
"author_association": "MEMBER",
"body_excerpt": "The typo in the `elif` chain meant that `image` and `video` modality encoders could not be set using this method. This PR fixes the typo so that they can.",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44698",
"created_at": "2026-03-14T11:18:54Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44698/files",
"html_url": "https://github.com/huggingface/transformers/pull/44698",
"labels": [],
"merged": true,
"number": 44698,
"review_comments_count": 0,
"state": "closed",
"title": "Fix `set_encoder`",
"updated_at": "2026-03-14T13:42:00Z"
},
{
"additions": 75,
"author": "LincolnBurrows2017",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "## Description The `torch_float` function in `src/transformers/utils/generic.py` was incorrectly returning `int(x)` in two places where it should return `float(x)`: 1. When torch is not available (fallback case) 2. When not in a tracing co\u2026",
"changed_files": 4,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44697",
"created_at": "2026-03-14T10:44:12Z",
"deletions": 25,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44697/files",
"html_url": "https://github.com/huggingface/transformers/pull/44697",
"labels": [],
"merged": false,
"number": 44697,
"review_comments_count": 1,
"state": "open",
"title": "fix: torch_float should return float, not int",
"updated_at": "2026-03-17T19:29:02Z"
},
{
"additions": 19,
"author": "hmellor",
"author_association": "MEMBER",
"body_excerpt": "In configs, `base_model_pp_plan` and `base_model_tp_plan` default to `None` In models, `_pp_plan` and `_tp_plan` _look like_ they default to `None` based on the class variables, but will actually always be a dict because of `post_init`. Th\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44696",
"created_at": "2026-03-14T09:41:07Z",
"deletions": 13,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44696/files",
"html_url": "https://github.com/huggingface/transformers/pull/44696",
"labels": [],
"merged": true,
"number": 44696,
"review_comments_count": 5,
"state": "closed",
"title": "Fix `supports_{tp/pp}_plan`",
"updated_at": "2026-03-18T12:33:58Z"
},
{
"additions": 4,
"author": "harshaljanjani",
"author_association": "CONTRIBUTOR",
"body_excerpt": "### What does this PR do? The following failing tests were identified and fixed in this PR: \u2192 **Kyutai Speech-To-Text**: [The PR [processors] Unbloating simple processors](https://github.com/huggingface/transformers/pull/40377), [refactore\u2026",
"changed_files": 3,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44695",
"created_at": "2026-03-14T09:05:35Z",
"deletions": 9,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44695/files",
"html_url": "https://github.com/huggingface/transformers/pull/44695",
"labels": [],
"merged": false,
"number": 44695,
"review_comments_count": 3,
"state": "open",
"title": "fix(testing): Fix Kyutai Speech-To-Text, LLaVA-OneVision, and LongCatFlash test failures on main CI ",
"updated_at": "2026-03-23T11:51:26Z"
},
{
"additions": 143,
"author": "LincolnBurrows2017",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "## Summary Fixes issue #44625: Qwen3.5 num_labels not propagated from core config to text config. When loading `AutoConfig.from_pretrained(\"Qwen3.5\", num_labels=1)`, the outer config gets `num_labels=1` but the inner `text_config` still ha\u2026",
"changed_files": 7,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44693",
"created_at": "2026-03-14T05:43:00Z",
"deletions": 30,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44693/files",
"html_url": "https://github.com/huggingface/transformers/pull/44693",
"labels": [],
"merged": false,
"number": 44693,
"review_comments_count": 0,
"state": "closed",
"title": "fix: Propagate num_labels to text_config in Qwen3.5",
"updated_at": "2026-03-18T12:56:25Z"
},
{
"additions": 18,
"author": "gambletan",
"author_association": "NONE",
"body_excerpt": "## Summary Fixes #44514. `Qwen2_5_VLProcessor.apply_chat_template` crashes with `ValueError` when called with batched input and `padding=False` (the default). The root cause is `np.array(text_inputs[\"input_ids\"])` which fails when sequence\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44692",
"created_at": "2026-03-14T04:14:38Z",
"deletions": 10,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44692/files",
"html_url": "https://github.com/huggingface/transformers/pull/44692",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44692,
"review_comments_count": 0,
"state": "closed",
"title": "fix: handle ragged input_ids in Qwen2_5_VLProcessor.apply_chat_template",
"updated_at": "2026-03-18T12:44:18Z"
},
{
"additions": 23,
"author": "gambletan",
"author_association": "NONE",
"body_excerpt": "## Summary - Fixes `num_labels` (and `id2label`/`label2id`) not being propagated from the outer `Qwen3_5Config` to its inner `text_config` when passed via `AutoConfig.from_pretrained(..., num_labels=1)`. - When `text_config` is `None` or a\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44691",
"created_at": "2026-03-14T04:10:54Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44691/files",
"html_url": "https://github.com/huggingface/transformers/pull/44691",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44691,
"review_comments_count": 0,
"state": "closed",
"title": "Fix Qwen3.5 num_labels not propagated to text_config",
"updated_at": "2026-03-18T12:57:19Z"
},
{
"additions": 6,
"author": "gambletan",
"author_association": "NONE",
"body_excerpt": "## Summary Fixes #44360 The `GlmMoeDsaIndexer` is missing a ReLU activation on the per-head dot-product scores before the weighted sum across heads. The reference DeepSeek V3.2 implementation applies ReLU inside the `fp8_index` kernel: ```\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44690",
"created_at": "2026-03-14T03:44:37Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44690/files",
"html_url": "https://github.com/huggingface/transformers/pull/44690",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44690,
"review_comments_count": 0,
"state": "closed",
"title": "Fix missing ReLU in GLM-MOE-DSA indexer scoring",
"updated_at": "2026-03-18T12:40:23Z"
},
{
"additions": 141,
"author": "LincolnBurrows2017",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "## Summary Fixes issue #44625: Qwen3.5 num_labels not propagating to text_config. When calling `AutoConfig.from_pretrained(\"Qwen3.5\", num_labels=1)`, the main config gets `num_labels=1` but text_config still has default `num_labels=2`. Thi\u2026",
"changed_files": 6,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44688",
"created_at": "2026-03-14T00:40:50Z",
"deletions": 23,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44688/files",
"html_url": "https://github.com/huggingface/transformers/pull/44688",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44688,
"review_comments_count": 0,
"state": "closed",
"title": "fix: Propagate num_labels to text_config in Qwen models",
"updated_at": "2026-03-18T12:56:41Z"
},
{
"additions": 8,
"author": "vxa8502",
"author_association": "NONE",
"body_excerpt": "Fixes partial #32937 Adds explicit `position_ids` threading through GPT-Neo's attention layers to enable flash attention's packed sequence optimization. ## Context GPT-Neo uses learned absolute position embeddings (`wpe`) applied at the mo\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44687",
"created_at": "2026-03-13T23:28:55Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44687/files",
"html_url": "https://github.com/huggingface/transformers/pull/44687",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44687,
"review_comments_count": 0,
"state": "closed",
"title": "Add explicit position_ids to GPT-Neo attention layers",
"updated_at": "2026-03-18T13:06:49Z"
},
{
"additions": 615,
"author": "tejasae-afk",
"author_association": "NONE",
"body_excerpt": "During an automated code review of src/transformers/models/marian/convert_marian_to_pytorch.py, the following issue was identified. Use safe_load in convert marian to pytorch. yaml.load on untrusted input can construct arbitrary Python obj\u2026",
"changed_files": 80,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44686",
"created_at": "2026-03-13T21:22:07Z",
"deletions": 259,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44686/files",
"html_url": "https://github.com/huggingface/transformers/pull/44686",
"labels": [],
"merged": false,
"number": 44686,
"review_comments_count": 0,
"state": "closed",
"title": "Use safe_load in convert marian to pytorch",
"updated_at": "2026-03-14T03:54:31Z"
},
{
"additions": 10,
"author": "ydshieh",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? For tiny model creation script - new added model test files still miss this argument ...",
"changed_files": 3,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44685",
"created_at": "2026-03-13T20:53:41Z",
"deletions": 3,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44685/files",
"html_url": "https://github.com/huggingface/transformers/pull/44685",
"labels": [],
"merged": true,
"number": 44685,
"review_comments_count": 0,
"state": "closed",
"title": "Fix more model tester missing `parent` issue",
"updated_at": "2026-03-13T21:03:46Z"
},
{
"additions": 41,
"author": "ntenenz",
"author_association": "CONTRIBUTOR",
"body_excerpt": "\u2026 # What does this PR do? In torch versions >= 2.9.0, it requests the lse from flex_attention using `AuxRequest` instead of the deprecated `return_lse`, which triggers a warning and can break tracing. Fixes #44683 ## Before submitting - [\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44684",
"created_at": "2026-03-13T20:16:35Z",
"deletions": 5,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44684/files",
"html_url": "https://github.com/huggingface/transformers/pull/44684",
"labels": [],
"merged": true,
"number": 44684,
"review_comments_count": 8,
"state": "closed",
"title": "update flex attention to use `return_aux` instead of `return_lse` when torch version >= 2.9",
"updated_at": "2026-03-18T11:44:18Z"
},
{
"additions": 301,
"author": "SunMarc",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? Llama cpp integration in transformers serve. Minor changes to add llama.cpp integration Mostly changes on serve to fix latency for streaming and non streaming",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44682",
"created_at": "2026-03-13T18:52:41Z",
"deletions": 73,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44682/files",
"html_url": "https://github.com/huggingface/transformers/pull/44682",
"labels": [],
"merged": false,
"number": 44682,
"review_comments_count": 0,
"state": "open",
"title": "transformers serve + llamacpp",
"updated_at": "2026-03-14T07:05:29Z"
},
{
"additions": 47,
"author": "dacorvo",
"author_association": "MEMBER",
"body_excerpt": "Fixes #44679 ## Summary - Custom attention kernels registered via `load_and_register_attn_kernel` currently get hardcoded `flash_attention_2` mask dispatch, which produces 2D or `None` masks - Kernels that need SDPA-style 4D boolean masks\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44680",
"created_at": "2026-03-13T17:55:54Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44680/files",
"html_url": "https://github.com/huggingface/transformers/pull/44680",
"labels": [],
"merged": false,
"number": 44680,
"review_comments_count": 12,
"state": "open",
"title": "Allow kernel modules to declare their preferred mask function",
"updated_at": "2026-03-19T11:27:09Z"
},
{
"additions": 9,
"author": "JokeYoonic",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "Problem: - On macOS ARM64 + Python 3.13 + transformers 5.x, GPT-2 model's lm_head forward pass produces NaN/Inf values during inference - Root cause: lm_head.weight is tied to transformer.wte.weight, and the shared memory reference causes\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44676",
"created_at": "2026-03-13T16:28:01Z",
"deletions": 2,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44676/files",
"html_url": "https://github.com/huggingface/transformers/pull/44676",
"labels": [],
"merged": false,
"number": 44676,
"review_comments_count": 0,
"state": "open",
"title": "fix(gpt2): Resolve NaN/Inf issue in lm_head on Python 3.13 with tied weights",
"updated_at": "2026-03-18T17:16:49Z"
},
{
"additions": 32,
"author": "stevhliu",
"author_association": "MEMBER",
"body_excerpt": "properly formats the `ContinuousBatchingConfig` below:",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44675",
"created_at": "2026-03-13T16:10:28Z",
"deletions": 14,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44675/files",
"html_url": "https://github.com/huggingface/transformers/pull/44675",
"labels": [],
"merged": true,
"number": 44675,
"review_comments_count": 0,
"state": "closed",
"title": "[docs] cb config",
"updated_at": "2026-03-13T23:15:04Z"
},
{
"additions": 408,
"author": "Rocketknight1",
"author_association": "MEMBER",
"body_excerpt": "We've had `parse_response()` in the library for a while, but it's been a soft launch / prototype feature. This PR cleans it up and documents it, making it an official feature! The API is largely unchanged from the prototype, but we drop `x\u2026",
"changed_files": 5,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 4,
"conversation_url": "https://github.com/huggingface/transformers/pull/44674",
"created_at": "2026-03-13T15:41:42Z",
"deletions": 34,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44674/files",
"html_url": "https://github.com/huggingface/transformers/pull/44674",
"labels": [],
"merged": true,
"number": 44674,
"review_comments_count": 11,
"state": "closed",
"title": "Officially launch parse_response",
"updated_at": "2026-03-24T15:55:05Z"
},
{
"additions": 73,
"author": "remi-or",
"author_association": "MEMBER",
"body_excerpt": "This PR fixes a bug in continuous batching where non-CUDA devices cannot use the feature because some CUDA-exclusive objects are always instantiated. It also adds a test to make sure this will not break again in the future.",
"changed_files": 3,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44673",
"created_at": "2026-03-13T15:37:01Z",
"deletions": 15,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44673/files",
"html_url": "https://github.com/huggingface/transformers/pull/44673",
"labels": [],
"merged": true,
"number": 44673,
"review_comments_count": 0,
"state": "closed",
"title": "[CB] [Bug] Fix crashes when running without cuda",
"updated_at": "2026-03-15T23:59:55Z"
},
{
"additions": 1,
"author": "neo",
"author_association": "CONTRIBUTOR",
"body_excerpt": "# What does this PR do? modular doesn't properly convert some files (e.g. kyutai) Also fixes red CI on main",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44283",
"created_at": "2026-02-25T18:33:17Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44283/files",
"html_url": "https://github.com/huggingface/transformers/pull/44283",
"labels": [],
"merged": true,
"number": 44283,
"review_comments_count": 0,
"state": "closed",
"title": "[`Modular`] Fix file type regression",
"updated_at": "2026-02-25T20:04:41Z"
},
{
"additions": 5,
"author": "Rocketknight1",
"author_association": "MEMBER",
"body_excerpt": "Response schema save-loading was broken in #40936, this PR restores it! I did most of this in #42300 but missed an issue with loading/saving.",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44282",
"created_at": "2026-02-25T17:57:54Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44282/files",
"html_url": "https://github.com/huggingface/transformers/pull/44282",
"labels": [],
"merged": true,
"number": 44282,
"review_comments_count": 0,
"state": "closed",
"title": "Restore response_schema saving-loading",
"updated_at": "2026-02-25T18:27:22Z"
},
{
"additions": 1,
"author": "ArthurZucker",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? Its a very small fix for #44062",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44281",
"created_at": "2026-02-25T16:28:37Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44281/files",
"html_url": "https://github.com/huggingface/transformers/pull/44281",
"labels": [],
"merged": true,
"number": 44281,
"review_comments_count": 0,
"state": "closed",
"title": "Fix special token maps BC",
"updated_at": "2026-02-26T10:34:17Z"
},
{
"additions": 614,
"author": "RishabhMehra",
"author_association": "FIRST_TIMER",
"body_excerpt": "# What does this PR do? - Adds an opt-in use_fast_grouping flag to TokenClassificationPipeline to enable a NumPy-vectorised BIO grouping path (~5\u00d7 faster on long sequences) while keeping the legacy path as default. - Improves correctness:\u2026",
"changed_files": 3,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 0,
"conversation_url": "https://github.com/huggingface/transformers/pull/44278",
"created_at": "2026-02-25T12:49:56Z",
"deletions": 63,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44278/files",
"html_url": "https://github.com/huggingface/transformers/pull/44278",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44278,
"review_comments_count": 0,
"state": "closed",
"title": "[FEAT] Pipelines - Faster group_entities",
"updated_at": "2026-02-25T13:54:58Z"
},
{
"additions": 171,
"author": "tarekziade",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? The GLM-ASR integration test in the documentation is a copy of the one in the test suite. This patch removes duplication by: - moving the tests in the docs using `runnables` - see https://github.com/huggingface/doc-\u2026",
"changed_files": 10,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 28,
"conversation_url": "https://github.com/huggingface/transformers/pull/44277",
"created_at": "2026-02-25T08:49:20Z",
"deletions": 77,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44277/files",
"html_url": "https://github.com/huggingface/transformers/pull/44277",
"labels": [],
"merged": false,
"number": 44277,
"review_comments_count": 5,
"state": "open",
"title": "Use doc-builder runnable example for GLM-ASR",
"updated_at": "2026-03-19T09:01:16Z"
},
{
"additions": 0,
"author": "vishalpatil-45",
"author_association": "NONE",
"body_excerpt": "# What does this PR do? This PR addresses the performance regression where `import transformers` takes ~3.5s. The issue was caused by eager imports of heavy backend libraries (like torch/numpy) during the initial module load. By moving the\u2026",
"changed_files": 0,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44275",
"created_at": "2026-02-25T08:27:32Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44275/files",
"html_url": "https://github.com/huggingface/transformers/pull/44275",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44275,
"review_comments_count": 0,
"state": "closed",
"title": "[Fix] Restore lazy loading to improve import performance (#44273)",
"updated_at": "2026-02-25T20:37:18Z"
},
{
"additions": 559,
"author": "paipeline",
"author_association": "NONE",
"body_excerpt": "## Description Fixes #44242 This PR resolves an issue where the auxiliary load balancing loss was not computed when `output_router_logits=False`, even when `router_aux_loss_coef != 0`. ## Problem The auxiliary loss computation was incorrec\u2026",
"changed_files": 6,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44274",
"created_at": "2026-02-25T06:38:02Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44274/files",
"html_url": "https://github.com/huggingface/transformers/pull/44274",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44274,
"review_comments_count": 0,
"state": "closed",
"title": "Fix auxiliary load balancing loss computation when output_router_logits=False",
"updated_at": "2026-02-25T13:36:03Z"
},
{
"additions": 1,
"author": "hangjun-ezra",
"author_association": "CONTRIBUTOR",
"body_excerpt": "## What does this PR do? Fixes a `TypeError: unsupported operand type(s) for |: 'list' and 'set'` in `RotaryEmbeddingConfigMixin.convert_rope_params_to_dict` when `ignore_keys_at_rope_validation` is a `list` instead of a `set`. ### Root ca\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44272",
"created_at": "2026-02-25T03:52:04Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44272/files",
"html_url": "https://github.com/huggingface/transformers/pull/44272",
"labels": [],
"merged": true,
"number": 44272,
"review_comments_count": 0,
"state": "closed",
"title": "Fix TypeError in convert_rope_params_to_dict when ignore_keys is a list",
"updated_at": "2026-02-25T14:38:36Z"
},
{
"additions": 1272,
"author": "balak4",
"author_association": "CONTRIBUTOR",
"body_excerpt": "## Summary - Add GreedyLR, a metric-based adaptive learning rate scheduler that adjusts the learning rate during training based on the current loss - Based on [\"Dynamic Learning Rate Scheduling based on Loss Changes Leads to Faster Converg\u2026",
"changed_files": 10,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 9,
"conversation_url": "https://github.com/huggingface/transformers/pull/44271",
"created_at": "2026-02-25T01:40:57Z",
"deletions": 7,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44271/files",
"html_url": "https://github.com/huggingface/transformers/pull/44271",
"labels": [],
"merged": true,
"number": 44271,
"review_comments_count": 3,
"state": "closed",
"title": "Add GreedyLR adaptive learning rate scheduler",
"updated_at": "2026-03-18T18:45:46Z"
},
{
"additions": 88,
"author": "yonigozlan",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? A lot of ProcessorsKwargs have incorrect/unspecified type hints in their ProcessorsKwargs TypedDict for their images_kwargs attribute. Functionnaly, this did not cause issues as \"_merge_kwargs\" automatically picks u\u2026",
"changed_files": 44,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44270",
"created_at": "2026-02-25T00:11:31Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44270/files",
"html_url": "https://github.com/huggingface/transformers/pull/44270",
"labels": [],
"merged": false,
"number": 44270,
"review_comments_count": 0,
"state": "open",
"title": "Add correct typing to custom images_kwargs in ProcessorsKwargs",
"updated_at": "2026-02-25T01:12:06Z"
},
{
"additions": 30,
"author": "yonigozlan",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? This is a follow-up to https://github.com/huggingface/transformers/pull/43748, and will allow to have clickable links to the full modality kwargs when present in the docstring of a processor or image processor Cc @s\u2026",
"changed_files": 3,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44269",
"created_at": "2026-02-25T00:05:47Z",
"deletions": 2,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44269/files",
"html_url": "https://github.com/huggingface/transformers/pull/44269",
"labels": [],
"merged": true,
"number": 44269,
"review_comments_count": 0,
"state": "closed",
"title": "Add `ProcessingKwargs` `ImagesKwargs` etc. to docs",
"updated_at": "2026-02-27T19:03:15Z"
},
{
"additions": 5,
"author": "ethanknights",
"author_association": "CONTRIBUTOR",
"body_excerpt": "# What does this PR do? Some improvements to the `trainer.py` docs. ## Before submitting - [x] This PR fixes a typo or improves the docs. ## Who can review? Documentation: @stevhliu",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44268",
"created_at": "2026-02-24T23:20:16Z",
"deletions": 4,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44268/files",
"html_url": "https://github.com/huggingface/transformers/pull/44268",
"labels": [],
"merged": true,
"number": 44268,
"review_comments_count": 0,
"state": "closed",
"title": "chore: fixes in `Trainer` class docs (`compute_loss` & `hyperparameter_search`)",
"updated_at": "2026-02-26T00:50:23Z"
},
{
"additions": 4,
"author": "manavshrivastavagit",
"author_association": "NONE",
"body_excerpt": "## Summary - Update the `DocumentQuestionAnsweringPipeline` docstring to explicitly mention the task summary in the Transformers documentation. - Remove the stale TODO comment now that document question answering is covered in the task sum\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44267",
"created_at": "2026-02-24T20:35:18Z",
"deletions": 4,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44267/files",
"html_url": "https://github.com/huggingface/transformers/pull/44267",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44267,
"review_comments_count": 0,
"state": "closed",
"title": "Docs: point DocumentQuestionAnswering pipeline to task summary",
"updated_at": "2026-02-25T13:34:48Z"
},
{
"additions": 27,
"author": "harshaljanjani",
"author_association": "CONTRIBUTOR",
"body_excerpt": "### What does this PR do? The following issue was identified and fixed in this PR: \u2192 **Reasoning:** The impact of this fix goes beyond `Mask2Former` and `DeformableDetr` and should fix any model that uses `torch_compilable_check`. Most use\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 8,
"conversation_url": "https://github.com/huggingface/transformers/pull/44266",
"created_at": "2026-02-24T20:02:06Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44266/files",
"html_url": "https://github.com/huggingface/transformers/pull/44266",
"labels": [],
"merged": true,
"number": 44266,
"review_comments_count": 0,
"state": "closed",
"title": "fix(utils): Make torch_compilable_check compatible with torch.export strict mode",
"updated_at": "2026-02-26T09:42:47Z"
},
{
"additions": 90,
"author": "vasqu",
"author_association": "MEMBER",
"body_excerpt": "As per title, WIP --> needs a test",
"changed_files": 36,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 4,
"conversation_url": "https://github.com/huggingface/transformers/pull/44264",
"created_at": "2026-02-24T18:06:58Z",
"deletions": 210,
"draft": true,
"files_url": "https://github.com/huggingface/transformers/pull/44264/files",
"html_url": "https://github.com/huggingface/transformers/pull/44264",
"labels": [],
"merged": false,
"number": 44264,
"review_comments_count": 3,
"state": "open",
"title": "[`Moe`] Enable aux loss automatically when in training + coef is not 0",
"updated_at": "2026-02-25T18:53:20Z"
},
{
"additions": 5882,
"author": "SunMarc",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? This PR refactor the common tests that we have in Trainer. I've mainly did the following: - Split the tests that we have in `test_trainer.py` into multiple files. - Fix common tests that were failing in the CI",
"changed_files": 18,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44260",
"created_at": "2026-02-24T15:51:11Z",
"deletions": 6147,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44260/files",
"html_url": "https://github.com/huggingface/transformers/pull/44260",
"labels": [],
"merged": true,
"number": 44260,
"review_comments_count": 3,
"state": "closed",
"title": "Update common tests Trainer",
"updated_at": "2026-02-27T17:31:59Z"
},
{
"additions": 1830,
"author": "winglian",
"author_association": "COLLABORATOR",
    "body_excerpt": "\u2026 Then we compare `\"\" != \"LlamaTokenizer\"` (the `tokenizer_class` in `tokenizer_config.json`). Since that's true we earl\u2026",
"changed_files": 3,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 5,
"conversation_url": "https://github.com/huggingface/transformers/pull/44127",
"created_at": "2026-02-18T10:41:48Z",
"deletions": 8,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44127/files",
"html_url": "https://github.com/huggingface/transformers/pull/44127",
"labels": [],
"merged": true,
"number": 44127,
"review_comments_count": 0,
"state": "closed",
"title": "AutoTokenizer ignores config when model_type is None",
"updated_at": "2026-02-18T14:47:52Z"
},
{
"additions": 17,
"author": "Cyrilvallez",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? As per the title. Let's simplify after https://github.com/huggingface/transformers/pull/42848",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44126",
"created_at": "2026-02-18T09:58:49Z",
"deletions": 40,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44126/files",
"html_url": "https://github.com/huggingface/transformers/pull/44126",
"labels": [],
"merged": true,
"number": 44126,
"review_comments_count": 0,
"state": "closed",
"title": "Simplify input preparation in generate",
"updated_at": "2026-02-18T10:30:48Z"
},
{
"additions": 8,
"author": "zucchini-nlp",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? Fixes https://github.com/huggingface/transformers/issues/43986",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44125",
"created_at": "2026-02-18T09:34:54Z",
"deletions": 7,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44125/files",
"html_url": "https://github.com/huggingface/transformers/pull/44125",
"labels": [],
"merged": true,
"number": 44125,
"review_comments_count": 2,
"state": "closed",
"title": "Raise informative error when loading video processors",
"updated_at": "2026-02-20T08:23:35Z"
},
{
"additions": 10,
"author": "mariam851",
"author_association": "CONTRIBUTOR",
"body_excerpt": "Description: Adds eval_on_end to TrainingArguments to force evaluation at the end of training, even if the last step doesn't align with eval_steps. Changes: training_args.py: Added eval_on_end field. trainer.py: Added logic to call evaluat\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 0,
"conversation_url": "https://github.com/huggingface/transformers/pull/44124",
"created_at": "2026-02-18T08:52:23Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44124/files",
"html_url": "https://github.com/huggingface/transformers/pull/44124",
"labels": [],
"merged": false,
"number": 44124,
"review_comments_count": 0,
"state": "closed",
"title": "feat: add eval_on_end to Trainer for final evaluation",
"updated_at": "2026-02-18T14:14:16Z"
},
{
"additions": 15,
"author": "cyyever",
"author_association": "CONTRIBUTOR",
"body_excerpt": "# What does this PR do? This PR avoids device sync in training loss accumulation by ```torch.where```. The `is_torch_xla_available` condition is also removed.",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44123",
"created_at": "2026-02-18T08:22:57Z",
"deletions": 21,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44123/files",
"html_url": "https://github.com/huggingface/transformers/pull/44123",
"labels": [],
"merged": false,
"number": 44123,
"review_comments_count": 0,
"state": "open",
"title": "Avoid device sync in training loss accumulation",
"updated_at": "2026-02-20T04:43:19Z"
},
{
"additions": 158,
"author": "adityuhkapoor",
"author_association": "NONE",
"body_excerpt": "# What does this PR do? Adds 4-bit embedding quantization for BitsAndBytes, mirroring TorchAO's existing `include_input_output_embeddings` and `untie_embedding_weights` pattern (PRs #37802, #37905, #37935). Large-vocabulary models (Llama 3\u2026",
"changed_files": 4,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44122",
"created_at": "2026-02-18T06:35:09Z",
"deletions": 2,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44122/files",
"html_url": "https://github.com/huggingface/transformers/pull/44122",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44122,
"review_comments_count": 0,
"state": "closed",
"title": "Add BnB 4-bit embedding quantization support",
"updated_at": "2026-02-18T14:27:25Z"
},
{
"additions": 14,
"author": "tirth8205",
"author_association": "NONE",
"body_excerpt": "Fixes #34920 After applying `normalize()`, images can have negative values. Calling `resize()` on such images fails because it internally converts to PIL, which requires values in [0, 1] or [0, 255]. ### Fix When the image has values outsi\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 0,
"conversation_url": "https://github.com/huggingface/transformers/pull/44120",
"created_at": "2026-02-17T23:56:48Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44120/files",
"html_url": "https://github.com/huggingface/transformers/pull/44120",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44120,
"review_comments_count": 0,
"state": "closed",
"title": "fix: allow image_transforms.resize to handle negative values after normalization",
"updated_at": "2026-02-18T14:08:54Z"
},
{
"additions": 1,
"author": "tirth8205",
"author_association": "NONE",
"body_excerpt": "Fixes #44117 `TOKENIZER_MAPPING_NAMES.get(config_model_type, \"\")` returns `None` when the key exists with value `None`, causing `AttributeError: 'NoneType' object has no attribute 'replace'` when loading models like `google/siglip2-so400m-\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/44119",
"created_at": "2026-02-17T23:53:20Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44119/files",
"html_url": "https://github.com/huggingface/transformers/pull/44119",
"labels": [],
"merged": false,
"number": 44119,
"review_comments_count": 0,
"state": "closed",
"title": "fix: handle None value from TOKENIZER_MAPPING_NAMES.get() in AutoTokenizer",
"updated_at": "2026-02-18T14:04:47Z"
},
{
"additions": 32,
"author": "tirth8205",
"author_association": "NONE",
"body_excerpt": "## Fix Fixes #44079 When a `ModelOutput` dataclass field is initialized as `None`, it is correctly excluded from the OrderedDict keys. However, **subsequently setting that field to a non-None value** via attribute assignment (e.g. `outputs\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 0,
"conversation_url": "https://github.com/huggingface/transformers/pull/44118",
"created_at": "2026-02-17T23:31:31Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44118/files",
"html_url": "https://github.com/huggingface/transformers/pull/44118",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 44118,
"review_comments_count": 0,
"state": "closed",
"title": "fix: ModelOutput keys not updated when setting previously-None dataclass fields",
"updated_at": "2026-02-18T14:18:12Z"
},
{
"additions": 27,
"author": "dtiourine",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "Migrate Flaubert to the @capture_outputs and @can_return_tuple decorator pattern for output handling, as part of #43979. # What does this PR do? - Add `_can_record_outputs = {\"attentions\": MultiHeadAttention}` on `FlaubertPreTrainedModel`\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44116",
"created_at": "2026-02-17T21:52:13Z",
"deletions": 102,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44116/files",
"html_url": "https://github.com/huggingface/transformers/pull/44116",
"labels": [],
"merged": false,
"number": 44116,
"review_comments_count": 0,
"state": "open",
"title": "[WIP] [Flaubert] Refactor output tracing to decorator-based interface",
"updated_at": "2026-02-17T21:53:23Z"
},
{
"additions": 2,
"author": "Deep-unlearning",
"author_association": "MEMBER",
"body_excerpt": "## Summary - Fix broken `[chat template](./chat_templating)` links in `docs/source/en/tasks/` - `./chat_templating` resolves within `tasks/` (doesn't exist); corrected to `../chat_templating` - Affected files: `tasks/image_text_to_text.md`\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44115",
"created_at": "2026-02-17T21:32:55Z",
"deletions": 2,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44115/files",
"html_url": "https://github.com/huggingface/transformers/pull/44115",
"labels": [],
"merged": true,
"number": 44115,
"review_comments_count": 0,
"state": "closed",
"title": "[docs] fix broken chat_templating links in tasks docs",
"updated_at": "2026-02-23T16:27:57Z"
},
{
"additions": 716,
"author": "23atharvaS",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "## Summary This PR migrates the `wav2vec2` family to the standardized output-capturing interface (`@capture_outputs` + `@can_return_tuple`) and includes follow-up compatibility fixes required to make full CI green. ## What changed ### Core\u2026",
"changed_files": 19,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44114",
"created_at": "2026-02-17T21:17:35Z",
"deletions": 1237,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44114/files",
"html_url": "https://github.com/huggingface/transformers/pull/44114",
"labels": [],
"merged": false,
"number": 44114,
"review_comments_count": 0,
"state": "open",
"title": "Migrate wav2vec2, wav2vec2_conformer, and wav2vec2_bert to standardized output collection decorators",
"updated_at": "2026-02-18T20:34:53Z"
},
{
"additions": 5,
"author": "harshaljanjani",
"author_association": "CONTRIBUTOR",
"body_excerpt": "### What does this PR do? The following issue was identified and fixed in this PR: \u2192 Updates the stale `test_device_override` in `test_processing_granite_speech.py` to verify that the device param controls where speech inputs are placed, r\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44113",
"created_at": "2026-02-17T20:01:32Z",
"deletions": 7,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44113/files",
"html_url": "https://github.com/huggingface/transformers/pull/44113",
"labels": [],
"merged": true,
"number": 44113,
"review_comments_count": 2,
"state": "closed",
"title": "fix(testing): Update stale device override test in GraniteSpeech",
"updated_at": "2026-02-19T11:24:29Z"
},
{
"additions": 30,
"author": "fumadari",
"author_association": "NONE",
"body_excerpt": "## Summary - Part of #43979 \u2014 refactors `poolformer` to use the `capture_outputs`, `can_return_tuple`, and `merge_with_config_defaults` decorators - Simplifies `PoolFormerLayer` to return a single tensor instead of a 1-tuple - Simplifies `\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 4,
"conversation_url": "https://github.com/huggingface/transformers/pull/44111",
"created_at": "2026-02-17T19:38:02Z",
"deletions": 59,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44111/files",
"html_url": "https://github.com/huggingface/transformers/pull/44111",
"labels": [],
"merged": false,
"number": 44111,
"review_comments_count": 0,
"state": "closed",
"title": "refactor(poolformer): use capture_outputs for output tracing",
"updated_at": "2026-02-18T21:19:22Z"
},
{
"additions": 28,
"author": "fumadari",
"author_association": "NONE",
"body_excerpt": "## Summary - Part of #43979 \u2014 refactors `tvp` to use the `capture_outputs`, `can_return_tuple`, and `merge_with_config_defaults` decorators - Simplifies `TvpAttention` to always return `(output, attention_probs)` (hooks decide what to capt\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44110",
"created_at": "2026-02-17T19:32:55Z",
"deletions": 101,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44110/files",
"html_url": "https://github.com/huggingface/transformers/pull/44110",
"labels": [],
"merged": false,
"number": 44110,
"review_comments_count": 0,
"state": "closed",
"title": "refactor(tvp): use capture_outputs for output tracing",
"updated_at": "2026-02-18T21:19:24Z"
},
{
"additions": 48,
"author": "fumadari",
"author_association": "NONE",
"body_excerpt": "## Summary - Part of #43979 \u2014 refactors `hgnet_v2` to use the `capture_outputs` and `merge_with_config_defaults` decorators - Simplifies `HGNetV2Encoder` by removing `return_dict` parameter (always returns `BaseModelOutputWithNoAttention`)\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44109",
"created_at": "2026-02-17T19:23:03Z",
"deletions": 87,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44109/files",
"html_url": "https://github.com/huggingface/transformers/pull/44109",
"labels": [],
"merged": false,
"number": 44109,
"review_comments_count": 0,
"state": "closed",
"title": "refactor(hgnet_v2): use capture_outputs for output tracing",
"updated_at": "2026-02-18T21:19:25Z"
},
{
"additions": 33,
"author": "fumadari",
"author_association": "NONE",
"body_excerpt": "## Summary - Adds `@merge_with_config_defaults` and `@capture_outputs` to both `VitDetModel` and `VitDetBackbone`, removing manual `output_attentions`/`return_dict` resolution - Adds `_can_record_outputs = {\"attentions\": VitDetAttention}`\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44108",
"created_at": "2026-02-17T19:15:00Z",
"deletions": 82,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44108/files",
"html_url": "https://github.com/huggingface/transformers/pull/44108",
"labels": [],
"merged": false,
"number": 44108,
"review_comments_count": 0,
"state": "closed",
"title": "refactor(vitdet): use output tracing decorators",
"updated_at": "2026-02-18T21:19:27Z"
},
{
"additions": 40,
"author": "fumadari",
"author_association": "NONE",
"body_excerpt": "## Summary - Replaces manual `output_hidden_states`/`return_dict` resolution in `MraModel` with `@merge_with_config_defaults` and `@capture_outputs` decorators - Simplifies `MraEncoder` to a plain loop returning a single tensor, removing `\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44107",
"created_at": "2026-02-17T19:04:42Z",
"deletions": 112,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44107/files",
"html_url": "https://github.com/huggingface/transformers/pull/44107",
"labels": [],
"merged": false,
"number": 44107,
"review_comments_count": 0,
"state": "closed",
"title": "refactor(mra): use output tracing decorators",
"updated_at": "2026-02-18T21:19:29Z"
},
{
"additions": 47,
"author": "fumadari",
"author_association": "NONE",
"body_excerpt": "## Summary - Replace manual `hidden_states`/`attentions` collection in `YosoEncoder` with the `@capture_outputs` decorator and forward hooks - Add `@can_return_tuple` to all 5 wrapper model classes, eliminating manual `return_dict` handlin\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44106",
"created_at": "2026-02-17T18:59:25Z",
"deletions": 132,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44106/files",
"html_url": "https://github.com/huggingface/transformers/pull/44106",
"labels": [],
"merged": false,
"number": 44106,
"review_comments_count": 0,
"state": "closed",
"title": "Refactor yoso to use automatic output tracing",
"updated_at": "2026-02-18T21:19:30Z"
},
{
"additions": 39,
"author": "fumadari",
"author_association": "NONE",
"body_excerpt": "## Summary - Replace manual `hidden_states`/`attentions` collection in `LiltEncoder` with the `@capture_outputs` decorator and forward hooks - Add `@can_return_tuple` to all 3 wrapper model classes, eliminating manual `return_dict` handlin\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44105",
"created_at": "2026-02-17T18:54:40Z",
"deletions": 127,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44105/files",
"html_url": "https://github.com/huggingface/transformers/pull/44105",
"labels": [],
"merged": false,
"number": 44105,
"review_comments_count": 0,
"state": "closed",
"title": "Refactor lilt to use automatic output tracing",
"updated_at": "2026-02-18T21:19:32Z"
},
{
"additions": 66,
"author": "fumadari",
"author_association": "NONE",
"body_excerpt": "## Summary - Replace manual `hidden_states`/`attentions`/`cross_attentions` collection in `MegatronBertEncoder` with the `@capture_outputs` decorator and forward hooks - Add `@can_return_tuple` to all 8 wrapper model classes, eliminating m\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44104",
"created_at": "2026-02-17T18:43:44Z",
"deletions": 207,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44104/files",
"html_url": "https://github.com/huggingface/transformers/pull/44104",
"labels": [],
"merged": false,
"number": 44104,
"review_comments_count": 0,
"state": "closed",
"title": "Refactor megatron_bert to use automatic output tracing",
"updated_at": "2026-02-18T21:19:34Z"
},
{
"additions": 53,
"author": "engmohamedsalah",
"author_association": "NONE",
"body_excerpt": "Fixes #44052 Now and then, the indexer ran into trouble switching between masks and cache. Most of the test failures came from these hiccups: - Indexer cache: the old if seq_len > 1: reset cache heuristic broke assisted decoding (multi-tok\u2026",
"changed_files": 3,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44103",
"created_at": "2026-02-17T18:04:48Z",
"deletions": 76,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44103/files",
"html_url": "https://github.com/huggingface/transformers/pull/44103",
"labels": [],
"merged": false,
"number": 44103,
"review_comments_count": 0,
"state": "closed",
"title": "Fix glm_moe_dsa",
"updated_at": "2026-02-18T19:38:11Z"
},
{
"additions": 42,
"author": "fumadari",
"author_association": "NONE",
"body_excerpt": "## Summary Refactors the `ibert` model to use the new `@capture_outputs` and `@can_return_tuple` decorators for output tracing, as part of the meta-issue #43979. **Key changes:** - Added `_can_record_outputs = {\"hidden_states\": IBertLayer,\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44102",
"created_at": "2026-02-17T17:21:32Z",
"deletions": 154,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44102/files",
"html_url": "https://github.com/huggingface/transformers/pull/44102",
"labels": [],
"merged": false,
"number": 44102,
"review_comments_count": 0,
"state": "closed",
"title": "Refactor ibert output tracing with capture_outputs",
"updated_at": "2026-02-18T21:19:35Z"
},
{
"additions": 210,
"author": "aman-coder03",
"author_association": "FIRST_TIME_CONTRIBUTOR",
"body_excerpt": "## What does this PR do? This PR refactors XLM's output tracing to align with the standardized output capturing patterns used across the codebase. ### Key changes: - Refactors transformer blocks into a dedicated `XLMLayer` module to enable\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/44101",
"created_at": "2026-02-17T17:15:06Z",
"deletions": 194,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44101/files",
"html_url": "https://github.com/huggingface/transformers/pull/44101",
"labels": [],
"merged": false,
"number": 44101,
"review_comments_count": 0,
"state": "open",
"title": "[XLM] Refactor output tracing to align with capture_outputs standardized architecture",
"updated_at": "2026-02-19T08:08:33Z"
},
{
"additions": 3,
"author": "qgallouedec",
"author_association": "MEMBER",
"body_excerpt": "In https://github.com/huggingface/trl/pull/5112 a user reported that `trl sft --help` fails It's because three inherited args from `TrainingArguments` (`torch_empty_cache_steps`, `gradient_checkpointing` and `use_liger_kernel`)help strings\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/44100",
"created_at": "2026-02-17T17:10:36Z",
"deletions": 3,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/44100/files",
"html_url": "https://github.com/huggingface/transformers/pull/44100",
"labels": [],
"merged": true,
"number": 44100,
"review_comments_count": 0,
"state": "closed",
"title": "Fix percentage formatting in help messages for gradient checkpointing, Liger Kernel, and empty cache steps",
"updated_at": "2026-02-20T09:57:51Z"
},
{
"additions": 2,
"author": "qgallouedec",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? ## Related Issue Fixes #40170 **Issue:** Add MXFP4 MoE/attention backward kernels **URL:** https://github.com/huggingface/transformers/issues/40170 ## Problem ## A Call To Action! The Hugg\u2026",
"changed_files": 6,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 7,
"conversation_url": "https://github.com/huggingface/transformers/pull/43771",
"created_at": "2026-02-05T15:12:21Z",
"deletions": 4,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43771/files",
"html_url": "https://github.com/huggingface/transformers/pull/43771",
"labels": [
"Code agent slop"
],
"merged": false,
"number": 43771,
"review_comments_count": 0,
"state": "closed",
"title": "fix: Add MXFP4 MoE/attention backward kernels",
"updated_at": "2026-03-24T14:14:44Z"
},
{
"additions": 47,
"author": "lordaarush",
"author_association": "CONTRIBUTOR",
"body_excerpt": "# What does this PR do? Removes the unconditional `self.state.train_batch_size = self._train_batch_size` assignment that was causing issues when resuming from checkpoint with different batch configurations. The `train_batch_size` should on\u2026",
"changed_files": 2,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 7,
"conversation_url": "https://github.com/huggingface/transformers/pull/43770",
"created_at": "2026-02-05T14:25:36Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43770/files",
"html_url": "https://github.com/huggingface/transformers/pull/43770",
"labels": [],
"merged": true,
"number": 43770,
"review_comments_count": 0,
"state": "closed",
"title": "Remove unconditional train_batch_size assignment",
"updated_at": "2026-02-06T14:47:16Z"
},
{
"additions": 3950,
"author": "eustlb",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? Adds voxtral realtime! ## benchmarks Using [this reproducer](https://gist.github.com/eustlb/367f062f77a5971291fb5350763bea8d), I've ran WER evals on ami, librispeech and fleurs, with results Dataset | Original (vllm\u2026",
"changed_files": 21,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 4,
"conversation_url": "https://github.com/huggingface/transformers/pull/43769",
"created_at": "2026-02-05T14:17:52Z",
"deletions": 2,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43769/files",
"html_url": "https://github.com/huggingface/transformers/pull/43769",
"labels": [
"New model",
"Audio"
],
"merged": true,
"number": 43769,
"review_comments_count": 39,
"state": "closed",
"title": "Add Voxtral Realtime",
"updated_at": "2026-02-26T10:18:32Z"
},
{
"additions": 87,
"author": "zucchini-nlp",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? Helps vLLM to bump to v5",
"changed_files": 6,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 5,
"conversation_url": "https://github.com/huggingface/transformers/pull/43768",
"created_at": "2026-02-05T14:04:02Z",
"deletions": 5,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43768/files",
"html_url": "https://github.com/huggingface/transformers/pull/43768",
"labels": [],
"merged": true,
"number": 43768,
"review_comments_count": 10,
"state": "closed",
"title": "Fix init weights in remote code",
"updated_at": "2026-02-17T14:45:18Z"
},
{
"additions": 850,
"author": "XingweiDeng",
"author_association": "CONTRIBUTOR",
"body_excerpt": "# What does this PR do? src/transformers/utils/import_utils.py:2317:16\u2026",
"changed_files": 0,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/43709",
"created_at": "2026-02-03T14:26:58Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43709/files",
"html_url": "https://github.com/huggingface/transformers/pull/43709",
"labels": [],
"merged": true,
"number": 43709,
"review_comments_count": 0,
"state": "closed",
"title": "fix: `VersionComparison.from_string` return type mismatch",
"updated_at": "2026-02-23T19:05:33Z"
},
{
"additions": 2202,
"author": "liu-jiaxuan",
"author_association": "CONTRIBUTOR",
"body_excerpt": "# What does this PR do? Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Did you read the [contributor guideline](https://github.com/huggingfa\u2026",
"changed_files": 16,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 11,
"conversation_url": "https://github.com/huggingface/transformers/pull/43707",
"created_at": "2026-02-03T13:33:41Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43707/files",
"html_url": "https://github.com/huggingface/transformers/pull/43707",
"labels": [
"New model"
],
"merged": true,
"number": 43707,
"review_comments_count": 145,
"state": "closed",
"title": "[Model] Add SLANeXt Model Support",
"updated_at": "2026-03-20T17:24:22Z"
},
{
"additions": 42,
"author": "vasqu",
"author_association": "MEMBER",
"body_excerpt": "As per title, the new way to call the attention interface has slipped through a refactor because it's too new and not too well known atm cc @yonigozlan",
"changed_files": 9,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/43706",
"created_at": "2026-02-03T11:57:22Z",
"deletions": 48,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43706/files",
"html_url": "https://github.com/huggingface/transformers/pull/43706",
"labels": [],
"merged": true,
"number": 43706,
"review_comments_count": 2,
"state": "closed",
"title": "[`Attn`] Fixup interface usage after refactor",
"updated_at": "2026-02-03T14:56:35Z"
},
{
"additions": 120,
"author": "Cyrilvallez",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? Allow the `is_causal` kwarg and config attribute to make well-behaved decoder-only models act as encoders",
"changed_files": 3,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 1,
"conversation_url": "https://github.com/huggingface/transformers/pull/43705",
"created_at": "2026-02-03T11:45:43Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43705/files",
"html_url": "https://github.com/huggingface/transformers/pull/43705",
"labels": [],
"merged": true,
"number": 43705,
"review_comments_count": 11,
"state": "closed",
"title": "Allow bi-directional attention for all models",
"updated_at": "2026-02-04T17:24:32Z"
},
{
"additions": 1,
"author": "francesco-bertolotti",
"author_association": "CONTRIBUTOR",
"body_excerpt": "wrong `rms_norm_type` # What does this PR do? Small type error in the configuration of qwen3. `rms_norm_eps` should be a float and not an int. ## Before submitting - [ X] This PR fixes a typo or improves the docs (you can dismiss the other\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/43703",
"created_at": "2026-02-03T10:05:17Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43703/files",
"html_url": "https://github.com/huggingface/transformers/pull/43703",
"labels": [],
"merged": true,
"number": 43703,
"review_comments_count": 0,
"state": "closed",
"title": "Update configuration_qwen3.py",
"updated_at": "2026-02-04T07:03:04Z"
},
{
"additions": 2828,
"author": "eustlb",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? Adds[ UsefulSensors'](https://huggingface.co/UsefulSensors) new ASR model.",
"changed_files": 19,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/43702",
"created_at": "2026-02-03T09:32:42Z",
"deletions": 247,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43702/files",
"html_url": "https://github.com/huggingface/transformers/pull/43702",
"labels": [
"New model"
],
"merged": true,
"number": 43702,
"review_comments_count": 30,
"state": "closed",
"title": "Add moonshine streaming",
"updated_at": "2026-02-12T10:10:16Z"
},
{
"additions": 1,
"author": "YangKai0616",
"author_association": "CONTRIBUTOR",
"body_excerpt": "Here pytorch has a mature mechanism to auto select the right backend for different devices. @ydshieh pls help review, thx!",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 6,
"conversation_url": "https://github.com/huggingface/transformers/pull/43699",
"created_at": "2026-02-03T07:33:04Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43699/files",
"html_url": "https://github.com/huggingface/transformers/pull/43699",
"labels": [],
"merged": false,
"number": 43699,
"review_comments_count": 3,
"state": "closed",
"title": "avoid using specified backend for tp tests",
"updated_at": "2026-03-09T08:17:48Z"
},
{
"additions": 1,
"author": "sywangyi",
"author_association": "CONTRIBUTOR",
"body_excerpt": "- model loading (from pretrained, etc): @CyrilVallez - distributed: @3outeille @ArthurZucker fix tp crash. crash stack is [rank0]: Traceback (most recent call last): [rank0]: File \"/transformers/benchmark_v2/test_tp.py\", line 29, in - Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 ```bash input = { \"messages\": [ { \"role\": \"user\", \"content\": [ { \"type\": \"text\", \"text\": \"The history of France is \", } ], }, ], } I have a question about th\u2026",
"changed_files": 1,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/43670",
"created_at": "2026-02-02T02:06:14Z",
"deletions": 1,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43670/files",
"html_url": "https://github.com/huggingface/transformers/pull/43670",
"labels": [],
"merged": true,
"number": 43670,
"review_comments_count": 0,
"state": "closed",
"title": "Fix FP8Expert for Qwen",
"updated_at": "2026-02-02T15:18:49Z"
},
{
"additions": 2,
"author": "fschlatt",
"author_association": "CONTRIBUTOR",
"body_excerpt": "# What does this PR do? makes the whole mixin behave like a static holder for methods... - Modify methods/inherited cl\u2026",
"changed_files": 137,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/43620",
"created_at": "2026-01-30T11:24:09Z",
"deletions": 288,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43620/files",
"html_url": "https://github.com/huggingface/transformers/pull/43620",
"labels": [],
"merged": true,
"number": 43620,
"review_comments_count": 0,
"state": "closed",
"title": "[`Rope`] Revert #43410 and make inheritance implicit again",
"updated_at": "2026-01-30T18:44:16Z"
},
{
"additions": 40,
"author": "zucchini-nlp",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do? As per title, some models add or delete entries in tied weights depending on configuration. If we load two models consecutively with different configs, it fails to tie weights correctly I am copying it in `__init__`\u2026",
"changed_files": 4,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 4,
"conversation_url": "https://github.com/huggingface/transformers/pull/43619",
"created_at": "2026-01-30T10:43:38Z",
"deletions": 6,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43619/files",
"html_url": "https://github.com/huggingface/transformers/pull/43619",
"labels": [
"for patch"
],
"merged": true,
"number": 43619,
"review_comments_count": 8,
"state": "closed",
"title": "Don't modify `tied_weight_keys` in-place",
"updated_at": "2026-01-30T15:46:02Z"
},
{
"additions": 17,
"author": "kaixuanliu",
"author_association": "CONTRIBUTOR",
"body_excerpt": "@zucchini-nlp pls help review, thx! We have to add back the changes in https://github.com/huggingface/transformers/pull/42523. As for llava_onevision model, in its checkpoint config file, the model's `tie_word_embeddings` is Flase, and mod\u2026",
"changed_files": 3,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 3,
"conversation_url": "https://github.com/huggingface/transformers/pull/43617",
"created_at": "2026-01-30T10:21:45Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43617/files",
"html_url": "https://github.com/huggingface/transformers/pull/43617",
"labels": [],
"merged": false,
"number": 43617,
"review_comments_count": 0,
"state": "closed",
"title": "Fix tie_word_embedding issue for llava_onevision model",
"updated_at": "2026-01-30T14:33:39Z"
},
{
"additions": 3,
"author": "yiliu30",
"author_association": "CONTRIBUTOR",
"body_excerpt": "Signed-off-by: yiliu30 # What does this PR do? ## Related Issue Fixes #43408 **Issue:** Warning: You are using a model of type sam3_video to instantiate a model of type sam3_tracker **URL:** https://github.com/huggingface/transformers/\u2026",
"changed_files": 8,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 12,
"conversation_url": "https://github.com/huggingface/transformers/pull/43495",
"created_at": "2026-01-26T12:46:21Z",
"deletions": 7,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43495/files",
"html_url": "https://github.com/huggingface/transformers/pull/43495",
"labels": [],
"merged": true,
"number": 43495,
"review_comments_count": 4,
"state": "closed",
"title": "fix: add compatible_model_types to suppress model type mismatch warnings",
"updated_at": "2026-02-05T13:31:24Z"
},
{
"additions": 20,
"author": "githubnemo",
"author_association": "MEMBER",
"body_excerpt": "The Qwen3 MoE config was missing the mapping attribute for the num_expert_local config variable which made it impossible to load FP8 quantized models, due to the following exception: ``` Traceback (most recent call last): File \".../exps/tr\u2026",
"changed_files": 3,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 4,
"conversation_url": "https://github.com/huggingface/transformers/pull/43494",
"created_at": "2026-01-26T11:34:05Z",
"deletions": 0,
"draft": false,
"files_url": "https://github.com/huggingface/transformers/pull/43494/files",
"html_url": "https://github.com/huggingface/transformers/pull/43494",
"labels": [],
"merged": true,
"number": 43494,
"review_comments_count": 1,
"state": "closed",
"title": "Fix loading of Qwen3 FP8",
"updated_at": "2026-01-27T09:56:23Z"
},
{
"additions": 54,
"author": "eustlb",
"author_association": "MEMBER",
"body_excerpt": "# What does this PR do?",
"changed_files": 5,
"cluster_id": null,
"cluster_ids": [],
"cluster_role": null,
"comments_count": 2,
"conversation_url": "https://github.com/huggingface/transformers/pull/43492",
"created_at": "2026-01-26T10:30:53Z",
"deletions": 1,
"draft": true,
"files_url": "https://github.com/huggingface/transformers/pull/43492/files",
"html_url": "https://github.com/huggingface/transformers/pull/43492",
"labels": [],
"merged": false,
"number": 43492,
"review_comments_count": 0,
"state": "open",
"title": "Perception Encoder follow up PR",
"updated_at": "2026-01-26T12:55:35Z"
},
{
"additions": 605,
"author": "tarekziade",
"author_association": "MEMBER",
"body_excerpt": "DRAFT FOR DISCUSSION # What does this PR do?