pytagi.nn.attention#

Classes#

MultiheadAttention

Implements a Multi-head Attention layer with uncertainty quantification.

Module Contents#

class pytagi.nn.attention.MultiheadAttention(embed_dim: int, num_heads: int, num_kv_heads: int = None, bias: bool = True, gain_weight: float = 1.0, gain_bias: float = 1.0, init_method: str = 'Xavier')[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Implements a Multi-head Attention layer with uncertainty quantification. This layer applies scaled dot-product attention with multiple attention heads, allowing the model to jointly attend to information from different representation subspaces. It inherits from BaseLayer.

Initializes the MultiheadAttention layer.

Parameters:
  • embed_dim – The dimensionality of the input embeddings and output.

  • num_heads – The number of attention heads.

  • num_kv_heads – The number of key-value heads for grouped-query attention. If None, defaults to num_heads (standard multi-head attention).

  • bias – If True, additive bias is included in the linear projections. Defaults to True.

  • gain_weight – Scaling factor applied to initialized weights. Defaults to 1.0.

  • gain_bias – Scaling factor applied to initialized biases. Defaults to 1.0.

  • init_method – The method used for initializing weights and biases (e.g., “Xavier”, “He”). Defaults to “Xavier”.

get_layer_info() str[source]#

Retrieves a descriptive string summarizing the layer’s configuration from the C++ backend.

get_layer_name() str[source]#

Retrieves the name of the layer from the C++ backend.

init_weight_bias()[source]#

Initializes the layer’s parameters for the query, key, and value projections using the specified initialization method and gain factors. Initialization is delegated to the C++ backend.