nvidia-oliver-holworthy committed on
Commit 5f8124a · unverified · 1 Parent(s): 35dc76a

Update the llama_bidirectional_model.py docstring for improved clarity


Signed-off-by: Oliver Holworthy <nvidia-oliver-holworthy@users.noreply.huggingface.co>

Files changed (1)
  1. llama_bidirectional_model.py +23 -21
llama_bidirectional_model.py CHANGED
@@ -1,27 +1,29 @@
  # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  # SPDX-License-Identifier: Apache-2.0.
  """
- Bidirectional Llama model for embedding tasks.
-
- This module provides a modified LlamaModel that uses bidirectional (non-causal)
- attention, suitable for generating embeddings where each token should attend
- to all other tokens in the sequence.
-
- Supports transformers version 4.44 and above with a unified forward() implementation.
-
- Version compatibility notes:
- - transformers 4.47: Setting _attn_implementation in __init__ had no effect due to
-   attention initialization order
- - transformers 4.48+: Attention refactor (transformers#35235) activated the
-   _attn_implementation setting, which defaulted to "eager" instead of "sdpa"
- - transformers < 4.53: LlamaModel has _update_causal_mask method that can be overridden
- - transformers 4.53+: _update_causal_mask removed; masking moved to masking_utils module,
-   necessitating a full forward() override for custom attention masks
- - transformers < 4.54: Decoder layer returns tuple, uses past_key_value (singular)
- - transformers 4.54-4.55: Decoder layer returns tensor, uses past_key_value (singular)
- - transformers 4.56+: Decoder layer returns tensor, uses past_key_values (plural),
-   DynamicCache accepts config parameter
- - transformers 5.0+: Has native create_bidirectional_mask in masking_utils
+ Bidirectional Llama model for cross-encoder reranking.
+
+ Modifies LlamaModel to use bidirectional (non-causal) attention so each token
+ attends to all others, as required for cross-encoder scoring of query-document pairs.
+
+ Provides three classes:
+ - LlamaBidirectionalConfig: Adds pooling and temperature to LlamaConfig.
+ - LlamaBidirectionalModel: LlamaModel with causal masking replaced by
+   bidirectional masking. Overrides forward() to support transformers >=4.44.
+ - LlamaBidirectionalForSequenceClassification: Pools hidden states and
+   projects to a relevance score via a linear head.
+
+ Transformers version compatibility (>=4.44, including 5.0+):
+ The forward() implementation handles these API changes at import time via
+ inspect.signature() on LlamaDecoderLayer and DynamicCache:
+
+ < 4.53: _update_causal_mask exists on LlamaModel (not used here).
+ 4.53+: Masking moved to masking_utils; requires full forward() override.
+ < 4.54: Decoder layer returns a tuple.
+ 4.54+: Decoder layer returns a tensor.
+ < 4.56: Cache kwarg is ``past_key_value`` (singular).
+ 4.56+: Cache kwarg is ``past_key_values`` (plural); DynamicCache accepts config.
+ 5.0+: Native ``create_bidirectional_mask`` in masking_utils.
  """

  import inspect
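
The updated docstring says the forward() override detects these API differences at import time with inspect.signature() on LlamaDecoderLayer and DynamicCache. A minimal sketch of that detection pattern follows; the flag names are illustrative rather than taken from the module, and it assumes transformers >=4.44 is installed.

import inspect

from transformers import DynamicCache
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

# Does the decoder layer take the plural cache kwarg (transformers >= 4.56)?
_decoder_params = inspect.signature(LlamaDecoderLayer.forward).parameters
_use_plural_cache_kwarg = "past_key_values" in _decoder_params

# Can DynamicCache be constructed with a config argument (transformers >= 4.56)?
_cache_accepts_config = "config" in inspect.signature(DynamicCache.__init__).parameters

# Is the native bidirectional mask helper available (transformers 5.0+)?
try:
    from transformers.masking_utils import create_bidirectional_mask  # noqa: F401
    _has_native_bidirectional_mask = True
except ImportError:
    _has_native_bidirectional_mask = False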
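
For context, a hypothetical usage sketch of the reranker described by the new docstring. The checkpoint name below is a placeholder (no model ID appears in this commit), and loading through AutoModelForSequenceClassification with trust_remote_code is an assumption about how the repository is packaged, not something stated here.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "org/llama-bidirectional-reranker"  # placeholder, not a real checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, trust_remote_code=True)
model.eval()

query = "What does bidirectional attention change for reranking?"
document = "Cross-encoders score the query and document jointly in one forward pass."

# Query and document are encoded together so every token can attend to every other token.
inputs = tokenizer(query, document, return_tensors="pt", truncation=True)
with torch.no_grad():
    relevance = model(**inputs).logits.squeeze().item()
print(f"relevance score: {relevance:.4f}")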