pytagi.nn.attention#
Classes#

MultiheadAttention | Implements a Multi-head Attention layer with uncertainty quantification.
Module Contents#
- class pytagi.nn.attention.MultiheadAttention(embed_dim: int, num_heads: int, num_kv_heads: int = None, bias: bool = True, gain_weight: float = 1.0, gain_bias: float = 1.0, init_method: str = 'Xavier')[source]#
Bases: pytagi.nn.base_layer.BaseLayer

Implements a Multi-head Attention layer with uncertainty quantification. This layer applies scaled dot-product attention with multiple attention heads, allowing the model to jointly attend to information from different representation subspaces. It inherits from BaseLayer.
Initializes the MultiheadAttention layer.
- Parameters:
embed_dim – The dimensionality of the input embeddings and output.
num_heads – The number of attention heads.
num_kv_heads – The number of key-value heads for grouped-query attention, where several query heads share one key-value head. If None, defaults to num_heads (standard multi-head attention).
bias – If True, additive bias is included in the linear projections. Defaults to True.
gain_weight – Scaling factor applied to initialized weights. Defaults to 1.0.
gain_bias – Scaling factor applied to initialized biases. Defaults to 1.0.
init_method – The method used for initializing weights and biases (e.g., “Xavier”, “He”). Defaults to “Xavier”.
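To illustrate the underlying mechanism, the following is a minimal deterministic sketch of scaled dot-product multi-head attention in plain NumPy. It is not the pytagi implementation: pytagi's layer additionally propagates the mean and variance of its inputs and parameters for uncertainty quantification, whereas this sketch uses point values only. The weight matrices and the `multihead_attention` helper are illustrative assumptions, not part of the pytagi API.

```python
import numpy as np

def multihead_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Deterministic scaled dot-product multi-head attention (sketch only)."""
    seq_len, embed_dim = x.shape
    head_dim = embed_dim // num_heads

    # Project, then split the embedding into heads: (num_heads, seq_len, head_dim)
    def split_heads(m):
        return m.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product scores per head, followed by a numerically
    # stable softmax over the key dimension
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v  # (num_heads, seq_len, head_dim)

    # Concatenate heads back to (seq_len, embed_dim), then project the output
    out = out.transpose(1, 0, 2).reshape(seq_len, embed_dim)
    return out @ w_o

rng = np.random.default_rng(0)
embed_dim, num_heads, seq_len = 8, 2, 5
x = rng.standard_normal((seq_len, embed_dim))
w_q, w_k, w_v, w_o = (rng.standard_normal((embed_dim, embed_dim)) for _ in range(4))
y = multihead_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(y.shape)  # (5, 8): same sequence length and embedding size as the input
```

Note that `embed_dim` must be divisible by `num_heads` so each head receives an equal slice of the embedding; the same divisibility constraint applies between `num_heads` and `num_kv_heads` under grouped-query attention.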