pytagi.nn

pytagi.nn#

Neural Network module for pyTAGI.

This module provides various neural network layers and components, including activation functions, base layers, convolutional layers, recurrent layers, and utility modules. These components are designed to work with probabilistic data structures and leverage a C++ backend for performance.

Submodules#

Classes#

`ClosedFormSoftmax`	Applies a probabilistic Softmax approximation function.
`EvenExp`	Applies the EvenExp activation function.
`LeakyReLU`	Applies the Leaky Rectified Linear Unit function element-wise.
`MixtureReLU`	Applies a probabilistic Rectified Linear Unit approximation.
`MixtureSigmoid`	Applies a probabilistic picewise-linear Sigmoid-like function.
`MixtureTanh`	Applies a probabilistic piecewise-linear Hyperbolic Tangent function.
`ReLU`	Applies the Rectified Linear Unit function.
`Remax`	Applies a probabilistic Remax approximation function.
`Sigmoid`	Applies the Sigmoid function element-wise.
`Softmax`	Applies a Local-Linearization of the Softmax function to an n-dimensional input.
`Softplus`	Applies the Softplus function element-wise.
`Tanh`	Applies the Hyperbolic Tangent function.
`BaseLayer`	Base layer class providing common functionality and properties for neural network layers.
`BatchNorm2d`	Applies 2D Batch Normalization.
`Conv2d`	Applies a 2D convolution operation.
`ConvTranspose2d`	Applies a 2D transposed convolution operation (also known as deconvolution).
`BaseDeltaStates`	Represents the base delta states, acting as a Python wrapper for the C++ backend.
`BaseHiddenStates`	Represents the base hidden states, acting as a Python wrapper for the C++ backend.
`HRCSoftmax`	Hierarchical softmax wrapper from the CPP backend.
`DDPConfig`	Configuration for Distributed Data Parallel (DDP) training.
`DDPSequential`	A wrapper for Sequential models to enable Distributed Data Parallel (DDP) training.
`Embedding`	Embedding layer
`LayerBlock`	A stack of different layers derived from BaseLayer
`LayerNorm`	Implements Layer Normalization by normalizing the inputs across the
`Linear`	Implements a Fully-connected layer, also known as a dense layer.
`LSTM`	A Long Short-Term Memory (LSTM) layer for RNNs. It inherits from BaseLayer.
`OutputUpdater`	A utility to compute the error signal (delta states) for the output layer.
`AvgPool2d`	2D Average Pooling Layer.
`MaxPool2d`	2D Max Pooling Layer.
`ResNetBlock`	A Residual Network (ResNet) block structure.
`Sequential`	A sequential container for layers.
`SLinear`	Smoother Linear layer for the SLSTM architecture.
`SLSTM`	Smoothing Long Short-Term Memory (LSTM) layer.

Package Contents#

class pytagi.nn.ClosedFormSoftmax[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies a probabilistic Softmax approximation function.

Closed-form softmax is an approximation of the deterministic softmax function that provides a closed-form solution for the output moments of Gaussian inputs. It is commonly used as the final activation function in a classification network to produce probability distributions over classes.

\[\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.EvenExp[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies the EvenExp activation function.

This function allows passing only the odd postions of the output layer through an exponential activation function. This is used for going from V2_bar to V2_bar_tilde for the aleatoric uncertainty inference in the case of heteroscedastic regression.

\[\begin{split}\text{EvenExp}(x) = \begin{cases} \exp(x) & \text{if } x \text{ is at an odd position}\\ x & \text{if } x \text{ is at an even position} \end{cases}\end{split}\]

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.LeakyReLU[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies the Leaky Rectified Linear Unit function element-wise.

This is a variant of ReLU that allows a small, non-zero gradient when the unit is not active. This layer relies on a first-order Taylor-series approximation where the activation function is locally linearized at the input expected value.

\[\begin{split}\text{LeakyReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha x & \text{ otherwise } \end{cases}\end{split}\]

Where $\alpha$ is the negative_slope and is set to 0.1.

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.MixtureReLU[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies a probabilistic Rectified Linear Unit approximation.

This layer processes an input Gaussian distribution and outputs the moments for a rectified linear unit. This layer relies on exact moment calculations.

For an input random variable $X \sim \mathcal{N}(\mu, \sigma^2)$, the output $Y = \max(0, X)$ results in a rectified Gaussian.

../_static/activation_io_mixture_relu.png

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.MixtureSigmoid[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies a probabilistic picewise-linear Sigmoid-like function.

This layer processes an input Gaussian distribution and outputs the moments for a picewise-linear Sigmoid-like function. This layer relies on exact moment calculations.

../_static/activation_io_mixture_sigmoid.png

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.MixtureTanh[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies a probabilistic piecewise-linear Hyperbolic Tangent function.

This layer processes an input Gaussian distribution and outputs the moments for a picewise-linear Tanh-like function. This layer relies on exact moment calculations.

../_static/activation_io_mixture_tanh.png

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.ReLU[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies the Rectified Linear Unit function.

This layer processes an input Gaussian distribution and outputs the moments for a rectified linear unit. This layer relies on a first-order Taylor-series approximation where the activation function is locally linearized at the input expected value.

\[\text{ReLU}(x) = (x)^+ = \max(0, x)\]

../_static/relu_simplified_gaussian_io.png

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.Remax[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies a probabilistic Remax approximation function.

Remax is a softmax-like activation function which replaces the exponential function by a mixtureRelu. It rescales the input so that the elements of the output lie in the range [0,1] and sum to 1. It is commonly used as the final activation function in a classification network to produce probability distributions over classes.

\[\text{Remax}(x_{i}) = \frac{\text{ReLU}(x_i)}{\sum_j \text{ReLU}(x_j)}\]

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.Sigmoid[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies the Sigmoid function element-wise.

This layer approximates the moments after applying the sigmoid function whose values are constrained to the range (0, 1). This layer relies on a first-order Taylor-series approximation where the activation function is locally linearized at the input expected value.

\[\text{Sigmoid}(x) = \sigma(x) = \frac{1}{1 + e^{-x}}\]

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.Softmax[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies a Local-Linearization of the Softmax function to an n-dimensional input.

The Softmax function rescales the input so that the elements of the output lie in the range [0,1] and sum to 1. It is commonly used as the final activation function in a classification network to produce probability distributions over classes.

\[\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.Softplus[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies the Softplus function element-wise.

Softplus is a smooth approximation of the ReLU function. This layer relies on a first-order Taylor-series approximation where the activation function is locally linearized at the input expected value.

\[\text{Softplus}(x) = \log(1 + e^{x})\]

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.Tanh[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies the Hyperbolic Tangent function.

This layer approximates the moments after applying the Tanh function whose values are constrained to the range (-1, 1). This layer relies on a first-order Taylor-series approximation where the activation function is locally linearized at the input expected value.

\[\text{Tanh}(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]

Initializes the BaseLayer with a C++ backend instance.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

class pytagi.nn.BaseLayer[source]#

Base layer class providing common functionality and properties for neural network layers. This class acts as a Python wrapper for the C++ backend, exposing layer attributes and methods for managing layer information, device placement, and parameters.

Initializes the BaseLayer with a C++ backend instance.

to_cuda()[source]#: Moves the layer’s parameters and computations to the CUDA device.

get_layer_info() → str[source]#

Retrieves detailed information about the layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the layer.

Returns:: The name of the layer.
Return type:: str

get_max_num_states() → int[source]#

Retrieves the maximum number of states the layer can hold.

Returns:: The maximum number of states.
Return type:: int

property input_size: int#: Gets the input size of the layer.

property output_size: int#: Gets the output size of the layer.

property in_width: int#: Gets the input width of the layer (for convolutional layers).

property in_height: int#: Gets the input height of the layer (for convolutional layers).

property in_channels: int#: Gets the input channels of the layer (for convolutional layers).

property out_width: int#: Gets the output width of the layer (for convolutional layers).

property out_height: int#: Gets the output height of the layer (for convolutional layers).

property out_channels: int#: Gets the output channels of the layer (for convolutional layers).

property bias: bool#: Gets a boolean indicating whether the layer has a bias term.

property num_weights: int#: Gets the total number of weights in the layer.

property num_biases: int#: Gets the total number of biases in the layer.

property mu_w: numpy.ndarray#: Gets the mean of the weights (mu_w) as a NumPy array.

property var_w: numpy.ndarray#: Gets the variance of the weights (var_w) as a NumPy array.

property mu_b: numpy.ndarray#: Gets the mean of the biases (mu_b) as a NumPy array.

property var_b: numpy.ndarray#: Gets the variance of the biases (var_b) as a NumPy array.

property delta_mu_w: numpy.ndarray#: Gets the delta mean of the weights (delta_mu_w) as a NumPy array.

property delta_var_w: numpy.ndarray#: Gets the delta variance of the weights (delta_var_w) as a NumPy array. The delta corresponds to the amount of change induced by the update step.

property delta_mu_b: numpy.ndarray#: Gets the delta mean of the biases (delta_mu_b) as a NumPy array. This delta corresponds to the amount of change induced by the update step.

property delta_var_b: numpy.ndarray#: Gets the delta variance of the biases (delta_var_b) as a NumPy array. This delta corresponds to the amount of change induced by the update step.

property num_threads: int#: Gets the number of threads to use for computations.

property training: bool#: Gets a boolean indicating whether the layer is in training mode.

property device: bool#: Gets a boolean indicating whether the layer is on the GPU (‘cuda’) or CPU (‘cpu’).

class pytagi.nn.BatchNorm2d(num_features: int, eps: float = 1e-05, momentum: float = 0.9, bias: bool = True, gain_weight: float = 1.0, gain_bias: float = 1.0)[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies 2D Batch Normalization.

Batch Normalization normalizes the inputs of a layer by re-centering and re-scaling them.

Parameters:

num_features (int) – The number of features in the input tensor.
eps (float) – A small value added to the variance to avoid division by zero. Defaults to 1e-5.
momentum (float) – The momentum for the running mean and variance. Defaults to 0.9.
bias (bool) – Whether to include a learnable bias term. Defaults to True.
gain_weight (float) – Initial value for the gain (scale) parameter. Defaults to 1.0.
gain_bias (float) – Initial value for the bias (shift) parameter. Defaults to 1.0.

Initializes the BatchNorm2d layer.

get_layer_info() → str[source]#

Retrieves detailed information about the BatchNorm2d layer.

Returns:

A string containing the layer’s information, typically delegated: to the C++ backend implementation.

Return type:

str

get_layer_name() → str[source]#

Retrieves the name of the BatchNorm2d layer.

Returns:: The name of the layer, typically delegated to the C++ backend implementation.
Return type:: str

init_weight_bias()[source]#: Initializes the learnable weight (scale/gain) and bias (shift/offset) parameters of the batch normalization layer. This operation is delegated to the C++ backend.

class pytagi.nn.Conv2d(in_channels: int, out_channels: int, kernel_size: int, bias: bool = True, stride: int = 1, padding: int = 0, padding_type: int = 1, in_width: int = 0, in_height: int = 0, gain_weight: float = 1.0, gain_bias: float = 1.0, init_method: str = 'He')[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies a 2D convolution operation.

This layer performs a convolution operation, which is a fundamental building block in convolutional neural networks (CNNs). It slides a kernel (or filter) over an input tensor to produce an output tensor.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
kernel_size (int) – Size of the convolutional kernel.
bias (bool) – Whether to include a learnable bias term. Defaults to True.
stride (int) – The step size of the kernel. Defaults to 1.
padding (int) – Amount of zero-padding added to the input. Defaults to 0.
padding_type (int) – Type of padding. Defaults to 1 (likely ‘zeros’ or similar).
in_width (int) – Input width. If 0, it might be inferred or set by the backend. Defaults to 0.
in_height (int) – Input height. If 0, it might be inferred or set by the backend. Defaults to 0.
gain_weight (float) – Initial value for the gain (scale) parameter of weights. Defaults to 1.0.
gain_bias (float) – Initial value for the gain (scale) parameter of biases. Defaults to 1.0.
init_method (str) – Method used for initializing weights. Defaults to “He”.

Initializes the Conv2d layer.

get_layer_info() → str[source]#

Retrieves detailed information about the Conv2d layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the Conv2d layer.

Returns:: The name of the layer.
Return type:: str

init_weight_bias()[source]#: Initializes the learnable weight (kernel) and bias parameters of the convolutional layer. This initialization is delegated to the C++ backend using the ‘init_method’ specified (e.g., “He”).

class pytagi.nn.ConvTranspose2d(in_channels: int, out_channels: int, kernel_size: int, bias: bool = True, stride: int = 1, padding: int = 0, padding_type: int = 1, in_width: int = 0, in_height: int = 0, gain_weight: float = 1.0, gain_bias: float = 1.0, init_method: str = 'He')[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Applies a 2D transposed convolution operation (also known as deconvolution).

This layer performs a transposed convolution, which is used in tasks like image generation or segmentation to upsample feature maps. It reverses the convolution operation, increasing the spatial dimensions of the input.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
kernel_size (int) – Size of the convolutional kernel.
bias (bool) – Whether to include a learnable bias term. Defaults to True.
stride (int) – The step size of the kernel. Defaults to 1.
padding (int) – Amount of zero-padding added to the input. Defaults to 0.
padding_type (int) – Type of padding. Defaults to 1 (likely ‘zeros’ or similar).
in_width (int) – Input width. If 0, it might be inferred or set by the backend. Defaults to 0.
in_height (int) – Input height. If 0, it might be inferred or set by the backend. Defaults to 0.
gain_weight (float) – Initial value for the gain (scale) parameter of weights. Defaults to 1.0.
gain_bias (float) – Initial value for the gain (scale) parameter of biases. Defaults to 1.0.
init_method (str) – Method used for initializing weights. Defaults to “He”.

Initializes the ConvTranspose2d layer.

get_layer_info() → str[source]#

Retrieves detailed information about the ConvTranspose2d layer.

Returns:: A string containing the layer’s information.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the ConvTranspose2d layer.

Returns:: The name of the layer.
Return type:: str

init_weight_bias()[source]#: Initializes the learnable weight and bias parameters of the transposed convolutional layer.

class pytagi.nn.BaseDeltaStates(size: int | None = None, block_size: int | None = None)[source]#

Represents the base delta states, acting as a Python wrapper for the C++ backend. This class manages the change in mean (delta_mu) and change in variance (delta_var) induced by the update step.

Initializes the BaseDeltaStates.

Parameters:

size (Optional[int]) – The size of the delta states.
block_size (Optional[int]) – The block size for the delta states.

property delta_mu: List[float]#: Gets or sets the change in mean of the delta states (delta_mu).

property delta_var: List[float]#: Gets or sets the change in variance of the delta states (delta_var).

property size: int#: Gets the size of the delta states.

property block_size: int#: Gets the block size of the delta states.

property actual_size: int#: Gets the actual size of the delta states.

get_name() → str[source]#

Gets the name of the delta states type.

Returns:: The name of the delta states type.
Return type:: str

reset_zeros() → None[source]#: Reset all delta_mu and delta_var to zeros.

copy_from(source: BaseDeltaStates, num_data: int = -1) → None[source]#

Copy values of delta_mu and delta_var from another delta states object.

Parameters:

source (BaseDeltaStates) – The source delta states object to copy from.
num_data (int) – The number of data points to copy. Defaults to -1 (all).

set_size(new_size: int, new_block_size: int) → str[source]#

Sets a new size and block size for the delta states.

Parameters:

new_size (int) – The new size.
new_block_size (int) – The new block size.

Returns:

A message indicating the success or failure of the operation.

Return type:

str

class pytagi.nn.BaseHiddenStates(size: int | None = None, block_size: int | None = None)[source]#

Represents the base hidden states, acting as a Python wrapper for the C++ backend. This class manages the mean (mu_a), variance (var_a), and Jacobian (jcb) of hidden states.

Initializes the BaseHiddenStates.

Parameters:

size (Optional[int]) – The size of the hidden states.
block_size (Optional[int]) – The block size for the hidden states.

property mu_a: List[float]#: Gets or sets the mean of the hidden states (mu_a).

property var_a: List[float]#: Gets or sets the variance of the hidden states (var_a).

property jcb: List[float]#: Gets or sets the Jacobian of the hidden states (jcb).

property size: int#: Gets the size of the hidden states.

property block_size: int#: Gets the block size of the hidden states.

property actual_size: int#: Gets the actual size of the hidden states.

set_input_x(mu_x: List[float], var_x: List[float], block_size: int)[source]#

Sets the input for the hidden states.

Parameters:

mu_x (List[float]) – The mean of the input x.
var_x (List[float]) – The variance of the input x.
block_size (int) – The block size for the input.

get_name() → str[source]#

Gets the name of the hidden states type.

Returns:: The name of the hidden states type.
Return type:: str

set_size(new_size: int, new_block_size: int) → str[source]#

Sets a new size and block size for the hidden states.

Parameters:

new_size (int) – The new size.
new_block_size (int) – The new block size.

Returns:

A message indicating the success or failure of the operation.

Return type:

str

class pytagi.nn.HRCSoftmax[source]#

Hierarchical softmax wrapper from the CPP backend.

Initializes the HRCSoftmax object.

property obs: List[float]#: Gets or sets the fictive observation in [-1, 1].

property idx: List[int]#: Gets or sets the indices assigned to each label.

property num_obs: int#: Gets or sets the number of indices for each label.

property len: int#: Gets or sets the length of an observation (e.g., 10 labels -> len(obs) = 11).

class pytagi.nn.DDPConfig(device_ids: List[int], backend: str = 'nccl', rank: int = 0, world_size: int = 1)[source]#

Configuration for Distributed Data Parallel (DDP) training.

This class holds all the necessary settings for initializing a distributed process group.

Initializes the DDP configuration.

Parameters:

device_ids (List[int]) – A list of GPU device IDs to be used for training.
backend (str, optional) – The distributed backend to use. ‘nccl’ is recommended for GPUs. Defaults to “nccl”.
rank (int, optional) – The unique rank of the current process. Defaults to 0.
world_size (int, optional) – The total number of processes participating in the training. Defaults to 1.

property device_ids: List[int]#: The list of GPU device IDs.

property backend: str#: The distributed communication backend (e.g., ‘nccl’).

property rank: int#: The rank of the current process in the distributed group.

property world_size: int#: The total number of processes in the distributed group.

class pytagi.nn.DDPSequential(model: pytagi.nn.sequential.Sequential, config: DDPConfig, average: bool = True)[source]#

A wrapper for Sequential models to enable Distributed Data Parallel (DDP) training.

This class handles gradient synchronization and parameter updates across multiple processes, allowing for scalable training on multiple GPUs.

Initializes the DDPSequential wrapper.

Parameters:

model (Sequential) – The Sequential model to be parallelized.
config (DDPConfig) – The DDP configuration object.
average (bool, optional) – If True, gradients are averaged across processes. If False, they are summed. Defaults to True.

property output_z_buffer: pytagi.nn.data_struct.BaseHiddenStates#: The output hidden states buffer from the forward pass of the underlying model.

property input_delta_z_buffer: pytagi.nn.data_struct.BaseDeltaStates#: The input delta states buffer for the backward pass of the underlying model.

__call__(mu_x: numpy.ndarray, var_x: numpy.ndarray = None) → Tuple[numpy.ndarray, numpy.ndarray][source]#

A convenient alias for the forward pass.

Parameters:

mu_x (np.ndarray) – The mean of the input data for the current process.
var_x (np.ndarray, optional) – The variance of the input data for the current process. Defaults to None.

Returns:

A tuple containing the mean and variance of the model’s output.

Return type:

Tuple[np.ndarray, np.ndarray]

forward(mu_x: numpy.ndarray, var_x: numpy.ndarray = None) → Tuple[numpy.ndarray, numpy.ndarray][source]#

Performs a forward pass on the local model replica.

Parameters:

mu_x (np.ndarray) – The mean of the input data.
var_x (np.ndarray, optional) – The variance of the input data. Defaults to None.

Returns:

A tuple containing the mean and variance of the output.

Return type:

Tuple[np.ndarray, np.ndarray]

backward()[source]#: Performs a backward pass and synchronizes gradients across all processes.

step()[source]#: Performs a single parameter update step based on the synchronized gradients.

train()[source]#: Sets the model to training mode.

eval()[source]#: Sets the model to evaluation mode.

barrier()[source]#

Synchronizes all processes.

Blocks until all processes in the distributed group have reached this point.

get_outputs() → Tuple[numpy.ndarray, numpy.ndarray][source]#

Gets the outputs from the last forward pass on the local replica.

Returns:: A tuple containing the mean and variance of the output.
Return type:: Tuple[np.ndarray, np.ndarray]

output_to_host()[source]#: Copies the output data from the device to the host (CPU memory).

get_device_with_index() → str[source]#

Gets the device string for the current process, including its index.

Returns:: The device string, e.g., ‘cuda:0’.
Return type:: str

class pytagi.nn.Embedding(num_embeddings: int, embedding_dim: int, input_size: int = 0, scale: float = 1.0, padding_idx: int = -1)[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Embedding layer

The embedding layer maps discrete categorical indices to continuous vector representations.

Parameters:

num_embeddings (int) – The size of the vocabulary (the total number of possible indices).
embedding_dim (int) – The dimensionality of the embedding vectors.
input_size (int) – The size of the input sequence. Defaults to 0.
scale (float) – A scaling factor applied to the embedding vectors. Defaults to 1.0.
padding_idx (int) – If specified, the embedding vector at this index is initialized to zeros and is not updated during training. Defaults to -1 (disabled).

Initializes the Embedding layer.

get_layer_info() → str[source]#

Retrieves detailed information about the Embedding layer.

Returns:: A string containing the layer’s configuration.
Return type:: str

get_layer_name() → str[source]#

Retrieves the name of the Embedding layer.

Returns:: The name of the layer.
Return type:: str

init_weight_bias()[source]#: Initializes the embedding matrix (the learnable weights of the layer).

class pytagi.nn.LayerBlock(*layers: pytagi.nn.base_layer.BaseLayer)[source]#

Bases: pytagi.nn.base_layer.BaseLayer

A stack of different layers derived from BaseLayer

Initialize the Sequential model with the given layers. :param layers: A variable number of layers (instances of BaseLayer or derived classes).

switch_to_cuda()[source]#: Convert all layers to cuda layer

property layers: None#: Get layers

class pytagi.nn.LayerNorm(normalized_shape: List[int], eps: float = 0.0001, bias: bool = True)[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Implements Layer Normalization by normalizing the inputs across the features dimension. It inherits from BaseLayer.

Initializes the LayerNorm layer.

Parameters:

normalized_shape – The shape of the input to normalize over (e.g., the size of the feature dimension). Expected to be a list of integers.
eps – A small value added to the denominator for numerical stability to prevent division by zero. Defaults to 1e-4.
bias – If True, the layer will use an additive bias (beta) during normalization. Defaults to True.

get_layer_info() → str[source]#: Retrieves a descriptive string containing information about the layer’s configuration (e.g., its shape and parameters) from the C++ backend.

get_layer_name() → str[source]#: Retrieves the name of the layer (e.g., ‘LayerNorm’) from the C++ backend.

init_weight_bias()[source]#: Initializes the layer’s internal parameters, specifically the learnable scale (gamma) and, if ‘bias’ is True, the learnable offset (beta). This task is delegated to the C++ backend.

class pytagi.nn.Linear(input_size: int, output_size: int, bias: bool = True, gain_weight: float = 1.0, gain_bias: float = 1.0, init_method: str = 'He')[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Implements a Fully-connected layer, also known as a dense layer. This layer performs a linear transformation on the input data: $y = xW^T + b$, where $x$ is the input, $W$ is the weight matrix, and $b$ is the optional bias vector. It inherits from BaseLayer.

Initializes the Linear layer.

Parameters:

input_size – The number of features in the input tensor (the size of the last dimension).
output_size – The number of features in the output tensor. This determines the number of neurons in the layer.
bias – If True, an additive bias vector ‘b’ is included in the linear transformation. Defaults to True.
gain_weight – Scaling factor applied to the initialized weights ($W$). Defaults to 1.0.
gain_bias – Scaling factor applied to the initialized biases ($b$). Defaults to 1.0.
init_method – The method used for initializing the weights and biases (e.g., “He”, “Xavier”, “Normal”). Defaults to “He”.

get_layer_info() → str[source]#: Retrieves a descriptive string containing information about the layer’s configuration (e.g., input/output size, whether bias is used) from the C++ backend.

get_layer_name() → str[source]#: Retrieves the name of the layer (e.g., ‘Linear’) from the C++ backend.

init_weight_bias()[source]#: Initializes the layer’s parameters—the weight matrix ($W$) and the optional bias vector ($b$)—using the specified initialization method and gain factors. This task is delegated to the C++ backend.

class pytagi.nn.LSTM(input_size: int, output_size: int, seq_len: int, bias: bool = True, gain_weight: float = 1.0, gain_bias: float = 1.0, init_method: str = 'He')[source]#

Bases: pytagi.nn.base_layer.BaseLayer

A Long Short-Term Memory (LSTM) layer for RNNs. It inherits from BaseLayer.

Initializes the LSTM layer.

Parameters:

input_size – The number of features in the input tensor at each time step.
output_size – The size of the hidden state ($h_t$), which is the number of features in the output tensor at each time step.
seq_len – The maximum length of the input sequence. This is often required for efficient memory allocation in C++/CUDA backends like cuTAGI.
bias – If True, the internal gates and cell state updates will include an additive bias vector. Defaults to True.
gain_weight – Scaling factor applied to the initialized weights ($W$). Defaults to 1.0.
gain_bias – Scaling factor applied to the initialized biases ($b$). Defaults to 1.0.
init_method – The method used for initializing the weights and biases (e.g., “He”, “Xavier”). Defaults to “He”.

get_layer_info() → str[source]#: Retrieves a descriptive string containing information about the layer’s configuration (e.g., input/output size, sequence length) from the C++ backend.

get_layer_name() → str[source]#: Retrieves the name of the layer (e.g., ‘LSTM’) from the C++ backend.

init_weight_bias()[source]#: Initializes the various weight matrices and bias vectors used by the LSTM’s gates (input, forget, output) and cell state updates, using the specified method and gain factors. This task is delegated to the C++ backend.

class pytagi.nn.OutputUpdater(model_device: str)[source]#

A utility to compute the error signal (delta states) for the output layer.

This class calculates the difference between the model’s predictions and the observations, which is essential for performing the backward pass to update the model’s parameters. It wraps the C++/CUDA backend cutagi.OutputUpdater.

Initializes the OutputUpdater.

Parameters:: model_device (str) – The computational device the model is on (e.g., ‘cpu’ or ‘cuda:0’).

update(output_states: pytagi.nn.data_struct.BaseHiddenStates, mu_obs: numpy.ndarray, var_obs: numpy.ndarray, delta_states: pytagi.nn.data_struct.BaseDeltaStates)[source]#

Computes the delta states based on observations.

This method is used for homoscedastic regression where the observation variance is known and provided.

Parameters:

output_states (pytagi.nn.data_struct.BaseHiddenStates) – The hidden states (mean and variance) of the model’s output layer.
mu_obs (np.ndarray) – The mean of the ground truth observations.
var_obs (np.ndarray) – The variance of the ground truth observations.
delta_states (pytagi.nn.data_struct.BaseDeltaStates) – The delta states object to be updated with the computed error signal.

update_using_indices(output_states: pytagi.nn.data_struct.BaseHiddenStates, mu_obs: numpy.ndarray, var_obs: numpy.ndarray, selected_idx: numpy.ndarray, delta_states: pytagi.nn.data_struct.BaseDeltaStates)[source]#

Computes the delta states for a selected subset of outputs.

This is useful in scenarios like hierarchical softmax or when only a sparse set of outputs needs to be updated.

Parameters:

output_states (pytagi.nn.data_struct.BaseHiddenStates) – The hidden states of the model’s output layer.
mu_obs (np.ndarray) – The mean of the ground truth observations.
var_obs (np.ndarray) – The variance of the ground truth observations.
selected_idx (np.ndarray) – An array of indices specifying which output neurons to update.
delta_states (pytagi.nn.data_struct.BaseDeltaStates) – The delta states object to be updated with the computed error signal.

update_heteros(output_states: pytagi.nn.data_struct.BaseHiddenStates, mu_obs: numpy.ndarray, delta_states: pytagi.nn.data_struct.BaseDeltaStates)[source]#

Computes delta states for heteroscedastic regression.

In this case, the model is expected to predict both the mean and the variance of the output. The predicted variance is taken from the output_states.

Parameters:

output_states (pytagi.nn.data_struct.BaseHiddenStates) – The hidden states of the model’s output layer. The model’s predicted variance is sourced from here.
mu_obs (np.ndarray) – The mean of the ground truth observations.
delta_states (pytagi.nn.data_struct.BaseDeltaStates) – The delta states object to be updated with the computed error signal.

property device: str#: The computational device (‘cpu’ or ‘cuda’) the updater is on.

class pytagi.nn.AvgPool2d(kernel_size: int, stride: int = -1, padding: int = 0, padding_type: int = 0)[source]#

Bases: pytagi.nn.base_layer.BaseLayer

2D Average Pooling Layer.

This layer performs 2D average pooling operation. It wraps the C++/CUDA backend cutagi.AvgPool2d.

Initializes the AvgPool2d layer.

Parameters:

kernel_size (int) – The size of the pooling window (a single integer for square kernels).
stride (int) – The stride of the pooling operation. Default is -1, which typically means stride=kernel_size.
padding (int) – The implicit zero padding added to both sides of the input.
padding_type (int) – The type of padding to be used (e.g., 0 for zero padding).

get_layer_info() → str[source]#: Returns a string containing information about the layer.

get_layer_name() → str[source]#: Returns the name of the layer (e.g., ‘AvgPool2d’).

class pytagi.nn.MaxPool2d(kernel_size: int, stride: int = 1, padding: int = 0, padding_type: int = 0)[source]#

Bases: pytagi.nn.base_layer.BaseLayer

2D Max Pooling Layer.

This layer performs 2D max pooling operation based on the input expected values. It wraps the C++/CUDA backend cutagi.MaxPool2d.

Initializes the MaxPool2d layer.

Parameters:

kernel_size (int) – The size of the pooling window (a single integer for square kernels).
stride (int) – The stride of the pooling operation. Default is 1.
padding (int) – The implicit zero padding added to both sides of the input.
padding_type (int) – The type of padding to be used (e.g., 0 for zero padding).

get_layer_info() → str[source]#: Returns a string containing information about the layer.

get_layer_name() → str[source]#: Returns the name of the layer (e.g., ‘MaxPool2d’).

class pytagi.nn.ResNetBlock(main_block: pytagi.nn.base_layer.BaseLayer | pytagi.nn.layer_block.LayerBlock, shortcut: pytagi.nn.base_layer.BaseLayer | pytagi.nn.layer_block.LayerBlock = None)[source]#

Bases: pytagi.nn.base_layer.BaseLayer

A Residual Network (ResNet) block structure.

This class implements the core structure of a ResNet block, consisting of a main block (which performs the main transformations) and an optional shortcut connection (which adds the input to the main block’s output). It wraps the C++/CUDA backend cutagi.ResNetBlock.

Initializes the ResNetBlock.

Parameters:

main_block (Union[BaseLayer, LayerBlock]) – The primary set of layers in the block (e.g., convolutional layers).
shortcut (Union[BaseLayer, LayerBlock], optional) – The optional shortcut connection, often an identity mapping or a projection. If None, an identity shortcut is implicitly assumed by the C++ backend.

init_shortcut_state() → None[source]#: Initializes the hidden state buffers for the shortcut layer.

init_shortcut_delta_state() → None[source]#: Initializes the delta state buffers (error signals) for the shortcut layer.

init_input_buffer() → None[source]#: Initializes the input state buffer used to hold the input for both the main block and the shortcut.

property main_block: pytagi.nn.layer_block.LayerBlock#: Gets the main block component of the ResNet block.

property shortcut: pytagi.nn.base_layer.BaseLayer#: Gets the shortcut component of the ResNet block.

property input_z: pytagi.nn.data_struct.BaseHiddenStates#: Gets the buffered input hidden states (mean and variance) for the block.

property input_delta_z: pytagi.nn.data_struct.BaseDeltaStates#: Gets the delta states (error signals) associated with the block’s input.

property shortcut_output_z: pytagi.nn.data_struct.BaseHiddenStates#: Gets the output hidden states (mean and variance) from the shortcut layer.

property shortcut_output_delta_z: pytagi.nn.data_struct.BaseDeltaStates#: Gets the delta states (error signals) associated with the shortcut layer’s output.

class pytagi.nn.Sequential(*layers: pytagi.nn.base_layer.BaseLayer)[source]#

A sequential container for layers.

Layers are added to the container in the order they are passed in the constructor. This class acts as a Python wrapper for the C++/CUDA backend cutagi.Sequential.

Example

>>> import pytagi.nn as nn
>>> model = nn.Sequential(
...     nn.Linear(10, 20),
...     nn.ReLU(),
...     nn.Linear(20, 5)
... )
>>> mu_in = np.random.randn(1, 10)
>>> var_in = np.abs(np.random.randn(1, 10))
>>> mu_out, var_out = model(mu_in, var_in)

Initializes the Sequential model with a sequence of layers.

Parameters:: layers (BaseLayer) – A variable number of layer instances (e.g., Linear, ReLU) that will be executed in sequence.

__call__(mu_x: numpy.ndarray, var_x: numpy.ndarray = None) → Tuple[numpy.ndarray, numpy.ndarray][source]#

An alias for the forward pass.

Parameters:

mu_x (np.ndarray) – The mean of the input data.
var_x (np.ndarray, optional) – The variance of the input data. Defaults to None.

Returns:

A tuple containing the mean and variance of the output.

Return type:

Tuple[np.ndarray, np.ndarray]

property layers: List[pytagi.nn.base_layer.BaseLayer]#: The list of layers in the model.

property output_z_buffer: pytagi.nn.data_struct.BaseHiddenStates#: The output hidden states buffer from the forward pass.

property input_delta_z_buffer: pytagi.nn.data_struct.BaseDeltaStates#: The input delta states buffer used in the backward pass.

property output_delta_z_buffer: pytagi.nn.data_struct.BaseDeltaStates#: The output delta states buffer from the backward pass.

property z_buffer_size: int#: The size of the hidden state (z) buffer.

property z_buffer_block_size: int#: The block size of the hidden state (z) buffer.

property device: str#: The computational device (‘cpu’ or ‘cuda’) the model is on.

property input_state_update: bool#: Flag indicating if the input state should be updated.

property num_samples: int#: The number of samples used for Monte Carlo estimation. This is used for debugging purposes

to_device(device: str)[source]#

Moves the model and its parameters to a specified device.

Parameters:: device (str) – The target device, e.g., ‘cpu’ or ‘cuda:0’.

params_to_device()[source]#: Moves the model parameters to the currently configured CUDA device.

params_to_host()[source]#: Moves the model parameters from the CUDA device to the host (CPU).

set_threads(num_threads: int)[source]#

Sets the number of CPU threads to use for computation.

Parameters:: num_threads (int) – The number of threads.

train()[source]#: Sets the model to training mode.

eval()[source]#: Sets the model to evaluation mode.

forward(mu_x: numpy.ndarray, var_x: numpy.ndarray = None) → Tuple[numpy.ndarray, numpy.ndarray][source]#

Performs a forward pass through the network.

Parameters:

mu_x (np.ndarray) – The mean of the input data.
var_x (np.ndarray, optional) – The variance of the input data. Defaults to None.

Returns:

A tuple containing the mean and variance of the output.

Return type:

Tuple[np.ndarray, np.ndarray]

backward()[source]#: Performs a backward pass to update the network parameters.

smoother() → Tuple[numpy.ndarray, numpy.ndarray][source]#

Performs a smoother pass (e.g., Rauch-Tung-Striebel smoother).

This is used with the SLSTM to refine estimates by running backwards through time.

Returns:: A tuple containing the mean and variance of the smoothed output.
Return type:: Tuple[np.ndarray, np.ndarray]

step()[source]#: Performs a single step of inference to update the parameters.

reset_lstm_states()[source]#: Resets the hidden and cell states of all LSTM layers in the model.

output_to_host() → List[float][source]#

Copies the raw output data from the device to the host.

Returns:: A list of floating-point values representing the flattened output.
Return type:: List[float]

delta_z_to_host() → List[float][source]#

Copies the raw delta Z (error signal) data from the device to the host.

Returns:: A list of floating-point values representing the flattened delta Z.
Return type:: List[float]

set_delta_z(delta_mu: numpy.ndarray, delta_var: numpy.ndarray)[source]#

Sets the delta Z (error signal) on the device for the backward pass.

Parameters:

delta_mu (np.ndarray) – The mean of the error signal.
delta_var (np.ndarray) – The variance of the error signal.

get_layer_stack_info() → str[source]#

Gets a string representation of the layer stack architecture.

Returns:: A descriptive string of the model’s layers.
Return type:: str

preinit_layer()[source]#: Pre-initializes the layers in the model.

get_neg_var_w_counter() → dict[source]#

Counts the number of negative variance weights in each layer.

Returns:: A dictionary where keys are layer names and values are the counts of negative variances.
Return type:: dict

save(filename: str)[source]#

Saves the model’s state to a binary file.

Parameters:: filename (str) – The path to the file where the model will be saved.

load(filename: str)[source]#

Loads the model’s state from a binary file.

Parameters:: filename (str) – The path to the file from which to load the model.

save_csv(filename: str)[source]#

Saves the model parameters to a CSV file.

Parameters:: filename (str) – The base path for the CSV file(s).

load_csv(filename: str)[source]#

Loads the model parameters from a CSV file.

Parameters:: filename (str) – The base path of the CSV file(s).

parameters() → List[Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray]][source]#

Gets all model parameters.

Returns:: A list where each element is a tuple containing the parameters for a layer: (mu_w, var_w, mu_b, var_b).
Return type:: List[Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]]

load_state_dict(state_dict: dict)[source]#

Loads the model’s parameters from a state dictionary.

Parameters:: state_dict (dict) – A dictionary containing the model’s state.

state_dict() → dict[source]#

Gets the model’s parameters as a state dictionary.

Returns:: A dictionary where each key is the layer name and the value is a tuple of parameters: (mu_w, var_w, mu_b, var_b).
Return type:: dict

params_from(other: Sequential)[source]#

Copies parameters from another Sequential model.

Parameters:: other (Sequential) – The source model from which to copy parameters.

get_outputs() → Tuple[numpy.ndarray, numpy.ndarray][source]#

Gets the outputs from the last forward pass.

Returns:: A tuple containing the mean and variance of the output.
Return type:: Tuple[np.ndarray, np.ndarray]

get_outputs_smoother() → Tuple[numpy.ndarray, numpy.ndarray][source]#

Gets the outputs from the last smoother pass.

Returns:: A tuple containing the mean and variance of the smoothed output.
Return type:: Tuple[np.ndarray, np.ndarray]

get_input_states() → Tuple[numpy.ndarray, numpy.ndarray][source]#

Gets the input states of the model.

Returns:: A tuple containing the mean and variance of the input states.
Return type:: Tuple[np.ndarray, np.ndarray]

get_norm_mean_var() → dict[source]#

Gets the mean and variance from normalization layers.

Returns:: A dictionary where each key is a normalization layer name and the value is a tuple of four arrays: (mu_batch, var_batch, mu_ema_batch, var_ema_batch).
Return type:: dict

get_lstm_states(time_step: int = -1) → dict[source]#

Get the LSTM states for all LSTM layers as a dictionary.

Parameters:: time_step (int, optional) – The time step at which to retrieve the smoothed SLSTM states. If not provided or -1, retrieves the unsmoothed current LSTM states.
Returns:: A dictionary mapping layer indices to a 4-tuple of numpy arrays: (mu_h_prior, var_h_prior, mu_c_prior, var_c_prior).
Return type:: dict

set_lstm_states(states: dict) → None[source]#

Sets the states for all LSTM layers.

Parameters:: states (dict) – A dictionary mapping layer indices to a 4-tuple of numpy arrays: (mu_h_prior, var_h_prior, mu_c_prior, var_c_prior).

class pytagi.nn.SLinear(input_size: int, output_size: int, bias: bool = True, gain_weight: float = 1.0, gain_bias: float = 1.0, init_method: str = 'He')[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Smoother Linear layer for the SLSTM architecture.

This layer performs a linear transformation ($y = xW^T + b'), specifically designed to be used within SLSTM where a hidden- and cell-state smoothing through time is applied. It wraps the C++/CUDA backend `cutagi.SLinear$.

Initializes the SLinear layer.

Parameters:

input_size (int) – The number of input features.
output_size (int) – The number of output features.
bias (bool) – If True, adds a learnable bias to the output.
gain_weight (float) – A scaling factor applied to the initialized weights.
gain_bias (float) – A scaling factor applied to the initialized bias terms.
init_method (str) – The method used for initializing weights and biases (e.g., ‘He’, ‘Xavier’).

get_layer_info() → str[source]#: Returns a string containing information about the layer’s configuration (sizes, bias, etc.).

get_layer_name() → str[source]#: Returns the name of the layer (e.g., ‘SLinear’).

init_weight_bias()[source]#: Initializes the layer’s weight matrix and bias vector based on the configured method.

class pytagi.nn.SLSTM(input_size: int, output_size: int, seq_len: int, bias: bool = True, gain_weight: float = 1.0, gain_bias: float = 1.0, init_method: str = 'He')[source]#

Bases: pytagi.nn.base_layer.BaseLayer

Smoothing Long Short-Term Memory (LSTM) layer.

This layer is a variation of the standard LSTM, incorporating a mechanism for smoothing the hidden- and cell-states. It wraps the C++/CUDA backend cutagi.SLSTM.

Initializes the SLSTM layer.

Parameters:

input_size (int) – The number of expected features in the input $x$.
output_size (int) – The number of features in the hidden state $h$ (and the output).
seq_len (int) – The maximum sequence length this layer is configured to handle.
bias (bool) – If True, use bias weights in the internal linear transformations.
gain_weight (float) – A scaling factor applied to the initialized weights.
gain_bias (float) – A scaling factor applied to the initialized bias terms.
init_method (str) – The method used for initializing weights and biases (e.g., ‘He’, ‘Xavier’).

get_layer_info() → str[source]#: Returns a string containing detailed information about the layer’s configuration.

get_layer_name() → str[source]#: Returns the name of the layer (e.g., ‘SLSTM’).

init_weight_bias()[source]#: Initializes all the layer’s internal weight matrices and bias vectors (for gates and cell) based on the configured method.