pytagi.tagi_utils
Classes
- Utils – A frontend for utility functions from the C++/CUDA backend.
- Normalizer – A collection of methods for data normalization and denormalization.
Functions
- exponential_scheduler(curr_v, min_v, decaying_factor, curr_iter) – Implements an exponential decay schedule for a given value.
Module Contents
- class pytagi.tagi_utils.Utils
A frontend for utility functions from the C++/CUDA backend.
This class provides a Python interface to various utility functions implemented in the C++ cutagi library, such as data loading, preprocessing, and post-processing tasks related to machine learning models.
- Variables:
_cpp_backend – An instance of cutagi.Utils which provides the backend functionalities.
Initializes the Utils class by creating an instance of the C++ backend.
- label_to_obs(labels: numpy.ndarray, num_classes: int) → Tuple[numpy.ndarray, numpy.ndarray, int]
Converts class labels into observations for a binary tree structure.
This is used in hierarchical classification, where each label is mapped to a path in a binary tree and the observations represent the nodes along that path.
- Parameters:
labels (numpy.ndarray) – An array of class labels for the dataset.
num_classes (int) – The total number of unique classes.
- Returns:
A tuple containing:
- obs (numpy.ndarray): Encoded observations corresponding to the labels.
- obs_idx (numpy.ndarray): Indices of the encoded observations.
- num_obs (int): The total number of encoded observations.
- Return type:
Tuple[numpy.ndarray, numpy.ndarray, int]
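To illustrate the label-to-path mapping, the following pure-Python sketch fixes conventions that the description above does not specify: each label is written as a binary code of depth ⌈log₂(num_classes)⌉, most significant bit first; a right branch is observed as +1 and a left branch as -1; and only the internal nodes that some class visits are numbered, breadth-first from 0. The helper names `build_tree` and `obs_per_label` are hypothetical, and the cuTAGI backend's own numbering (for example 1-based indices) may differ.

```python
import math
from typing import Dict, List, Tuple


def build_tree(num_classes: int) -> Tuple[List[List[int]], List[List[int]], int]:
    """Hypothetical helper: per-class branch observations and node indices.

    Row k of each table describes the root-to-leaf path of class k.
    """
    depth = max(1, math.ceil(math.log2(num_classes)))
    # An internal node is identified by (level, prefix), where prefix holds the
    # top `level` bits of the label; sorting gives breadth-first numbering.
    nodes = sorted({(level, k >> (depth - level))
                    for k in range(num_classes) for level in range(depth)})
    node_id: Dict[Tuple[int, int], int] = {node: i for i, node in enumerate(nodes)}
    obs_table, idx_table = [], []
    for k in range(num_classes):
        obs, idx = [], []
        for level in range(depth):
            bit = (k >> (depth - 1 - level)) & 1  # 1 = right branch, 0 = left
            obs.append(1 if bit else -1)
            idx.append(node_id[(level, k >> (depth - level))])
        obs_table.append(obs)
        idx_table.append(idx)
    return obs_table, idx_table, len(nodes)


def label_to_obs(labels: List[int], num_classes: int) -> Tuple[List[int], List[int], int]:
    """Concatenate the path observations and node indices for a batch of labels."""
    obs_table, idx_table, _ = build_tree(num_classes)
    obs = [o for y in labels for o in obs_table[y]]
    obs_idx = [j for y in labels for j in idx_table[y]]
    obs_per_label = len(obs_table[0])  # equals the tree depth
    return obs, obs_idx, obs_per_label


obs, obs_idx, obs_per_label = label_to_obs([3, 9], num_classes=10)
print(obs)                # [-1, -1, 1, 1, 1, -1, -1, 1]
print(obs_idx)            # [0, 1, 3, 7, 0, 2, 5, 10]
print(obs_per_label)      # 4
print(build_tree(10)[2])  # 11 internal nodes
```

Under these assumptions, 10 classes need a depth-4 tree with 11 internal nodes, and every label contributes 4 observations.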
- label_to_one_hot(labels: numpy.ndarray, num_classes: int) → numpy.ndarray
Generates a one-hot encoding for the given labels.
- Parameters:
labels (numpy.ndarray) – An array of class labels for the dataset.
num_classes (int) – The total number of unique classes.
- Returns:
A 2D array representing the one-hot encoded labels.
- Return type:
numpy.ndarray
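One-hot encoding places a 1 in the column of each sample's class and 0 everywhere else. A NumPy sketch of the same idea follows; the float32 output dtype is an assumption, not something stated for the backend.

```python
import numpy as np


def label_to_one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """One-hot encode integer labels (NumPy sketch, not pytagi's backend)."""
    labels = np.asarray(labels, dtype=np.int64).ravel()
    one_hot = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Row i gets a 1 in column labels[i].
    one_hot[np.arange(labels.size), labels] = 1.0
    return one_hot


print(label_to_one_hot(np.array([2, 0, 1]), num_classes=3))
```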
- load_mnist_images(image_file: str, label_file: str, num_images: int) → Tuple[numpy.ndarray, numpy.ndarray]
Loads a specified number of images and labels from the MNIST dataset files.
- Parameters:
image_file (str) – The file path to the MNIST image data (e.g., ‘train-images-idx3-ubyte’).
label_file (str) – The file path to the MNIST label data (e.g., ‘train-labels-idx1-ubyte’).
num_images (int) – The number of images to load from the files.
- Returns:
A tuple containing:
- images (numpy.ndarray): A 2D array of flattened MNIST images.
- labels (numpy.ndarray): A 1D array of corresponding labels.
- Return type:
Tuple[numpy.ndarray, numpy.ndarray]
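The MNIST files use the IDX format: a 4-byte magic number (two zero bytes, a type code of 0x08 for unsigned bytes, and the number of dimensions), one 4-byte big-endian size per dimension, then the raw data bytes. The sketch below reads that format with `struct` and NumPy and writes tiny synthetic files so that it runs without the real dataset. Keeping pixel values in the 0 to 255 range is an assumption; the backend loader may rescale them.

```python
import os
import struct
import tempfile

import numpy as np


def read_idx(path: str) -> np.ndarray:
    """Read an unsigned-byte IDX file into an array with the stored shape."""
    with open(path, "rb") as f:
        zeros, type_code, ndim = struct.unpack(">HBB", f.read(4))
        if zeros != 0 or type_code != 0x08:
            raise ValueError(f"{path}: not an unsigned-byte IDX file")
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)


def load_mnist_images(image_file: str, label_file: str, num_images: int):
    """Return `num_images` flattened images (one per row) and their labels."""
    images = read_idx(image_file)[:num_images]
    labels = read_idx(label_file)[:num_images]
    flat = images.reshape(len(images), -1).astype(np.float32)  # pixels stay 0..255
    return flat, labels.astype(np.int64)


# Synthetic files: two 2x2 "images" and two labels.
tmp_dir = tempfile.mkdtemp()
img_path = os.path.join(tmp_dir, "images-idx3-ubyte")
lbl_path = os.path.join(tmp_dir, "labels-idx1-ubyte")
with open(img_path, "wb") as f:
    f.write(struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8)))
with open(lbl_path, "wb") as f:
    f.write(struct.pack(">HBBI", 0, 0x08, 1, 2) + bytes([7, 3]))

images, labels = load_mnist_images(img_path, lbl_path, num_images=2)
print(images.shape, labels)  # (2, 4) [7 3]
```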
- load_cifar_images(image_file: str, num: int) → Tuple[numpy.ndarray, numpy.ndarray]
Loads a specified number of images and labels from a CIFAR-10 dataset file.
- Parameters:
image_file (str) – The file path to a CIFAR-10 data batch file.
num (int) – The number of images to load from the file.
- Returns:
A tuple containing:
- images (numpy.ndarray): A 2D array of flattened CIFAR-10 images.
- labels (numpy.ndarray): A 1D array of corresponding labels.
- Return type:
Tuple[numpy.ndarray, numpy.ndarray]
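The binary version of the CIFAR-10 batch files (for example `data_batch_1.bin`) stores fixed-size records: one label byte followed by 3072 pixel bytes, namely the 1024 red values of the 32×32 image in row-major order, then the 1024 green and 1024 blue values. Assuming that binary version is the one read here, the sketch below parses it with NumPy and checks itself on a synthetic two-record file.

```python
import os
import tempfile

import numpy as np

RECORD_BYTES = 1 + 3 * 32 * 32  # 1 label byte + 3072 pixel bytes per image


def load_cifar_images(image_file: str, num: int):
    """Read the first `num` records of a CIFAR-10 binary batch file."""
    raw = np.fromfile(image_file, dtype=np.uint8, count=num * RECORD_BYTES)
    records = raw.reshape(-1, RECORD_BYTES)
    labels = records[:, 0].astype(np.int64)
    # Pixels keep the file's channel-major layout: R plane, then G, then B.
    images = records[:, 1:].astype(np.float32)
    return images, labels


# Synthetic two-record file standing in for a real batch.
path = os.path.join(tempfile.mkdtemp(), "fake_batch.bin")
fake = np.zeros((2, RECORD_BYTES), dtype=np.uint8)
fake[:, 0] = [4, 9]  # labels
fake[0, 1:] = 10     # pixel bytes of the first image
fake[1, 1:] = 20     # pixel bytes of the second image
fake.tofile(path)

images, labels = load_cifar_images(path, num=2)
print(images.shape, labels)  # (2, 3072) [4 9]
```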
- get_labels(ma: numpy.ndarray, Sa: numpy.ndarray, hr_softmax: pytagi.nn.data_struct.HRCSoftmax, num_classes: int, batch_size: int) → Tuple[numpy.ndarray, numpy.ndarray]
Predicts class labels from the output layer’s activation statistics.
Uses hierarchical softmax to convert the mean and variance of the output layer’s activations into class predictions and their probabilities.
- Parameters:
ma (numpy.ndarray) – The mean of the activation units for the output layer.
Sa (numpy.ndarray) – The variance of the activation units for the output layer.
hr_softmax (pytagi.nn.HRCSoftmax) – An initialized hierarchical softmax structure.
num_classes (int) – The total number of classes.
batch_size (int) – The number of samples in the batch.
- Returns:
A tuple containing:
- pred (numpy.ndarray): The predicted class labels for the batch.
- prob (numpy.ndarray): The probabilities for each predicted label.
- Return type:
Tuple[numpy.ndarray, numpy.ndarray]
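The last step, turning class probabilities into predictions, can be sketched as an argmax per sample. This assumes the per-class probabilities are already available as a (batch_size, num_classes) array, for instance from obs_to_label_prob; the helper name `labels_from_probs` is hypothetical.

```python
import numpy as np


def labels_from_probs(class_probs: np.ndarray):
    """Pick the most probable class per sample (hypothetical helper).

    `class_probs` has shape (batch_size, num_classes), one row per sample.
    """
    pred = np.argmax(class_probs, axis=1)
    # Probability of the chosen class for each sample.
    prob = class_probs[np.arange(class_probs.shape[0]), pred]
    return pred, prob


probs = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2]])
pred, prob = labels_from_probs(probs)
print(pred, prob)  # [1 0] [0.7 0.5]
```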
- get_errors(ma: numpy.ndarray, Sa: numpy.ndarray, labels: numpy.ndarray, hr_softmax: pytagi.nn.data_struct.HRCSoftmax, num_classes: int, batch_size: int) → Tuple[numpy.ndarray, numpy.ndarray]
Computes the prediction error given the output layer’s statistics and true labels.
This method calculates the classification error rate and probabilities based on the hierarchical softmax output.
- Parameters:
ma (numpy.ndarray) – The mean of the activation units for the output layer.
Sa (numpy.ndarray) – The variance of the activation units for the output layer.
labels (numpy.ndarray) – The ground truth labels for the dataset.
hr_softmax (pytagi.nn.HRCSoftmax) – An initialized hierarchical softmax structure.
num_classes (int) – The total number of classes.
batch_size (int) – The number of samples in a batch.
- Returns:
A tuple containing:
- pred (numpy.ndarray): The prediction error for the batch.
- prob (numpy.ndarray): The probabilities associated with the predictions.
- Return type:
Tuple[numpy.ndarray, numpy.ndarray]
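Assuming the error is the usual 0/1 classification error (1 when the most probable class differs from the true label), the per-sample errors and their batch mean can be sketched as follows. The helper name `classification_errors` and the (batch_size, num_classes) probability layout are assumptions.

```python
import numpy as np


def classification_errors(class_probs: np.ndarray, labels: np.ndarray):
    """Per-sample 0/1 errors and the batch error rate (hypothetical helper)."""
    pred = np.argmax(class_probs, axis=1)
    errors = (pred != labels).astype(np.float64)  # 1.0 where the prediction is wrong
    return errors, float(errors.mean())


probs = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2],
                  [0.2, 0.2, 0.6]])
errors, error_rate = classification_errors(probs, np.array([1, 2, 2]))
print(errors, error_rate)
```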
- get_hierarchical_softmax(num_classes: int) → pytagi.nn.data_struct.HRCSoftmax
Constructs a hierarchical softmax structure (binary tree) for classification.
- Parameters:
num_classes (int) – The total number of classes to be included in the tree.
- Returns:
An object representing the hierarchical softmax structure.
- Return type:
pytagi.nn.data_struct.HRCSoftmax
- obs_to_label_prob(ma: numpy.ndarray, Sa: numpy.ndarray, hr_softmax: pytagi.nn.data_struct.HRCSoftmax, num_classes: int) → numpy.ndarray
Converts observation probabilities to label probabilities.
This function takes the output statistics of a model (mean and variance) and uses the hierarchical softmax structure to compute the probability of each class label.
- Parameters:
ma (numpy.ndarray) – The mean of the activation units for the output layer.
Sa (numpy.ndarray) – The variance of the activation units for the output layer.
hr_softmax (pytagi.nn.HRCSoftmax) – An initialized hierarchical softmax structure.
num_classes (int) – The total number of classes.
- Returns:
An array of probabilities for each class label.
- Return type:
numpy.ndarray
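In a hierarchical softmax, each class probability is the product of the branch probabilities along that class's root-to-leaf path. The sketch below assumes that observing +1 at node j means the node's Gaussian activation \(z_j \sim \mathcal{N}(\text{ma}[j], \text{Sa}[j])\) is positive, so \(P(+1) = \Phi(\text{ma}[j] / \sqrt{\text{Sa}[j]})\); the backend may include additional noise in this step. It also takes the per-class paths as plain tables (here a hypothetical 4-class tree) rather than a pytagi HRCSoftmax object.

```python
import math
from typing import List, Sequence


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def obs_to_label_prob(ma: Sequence[float], Sa: Sequence[float],
                      obs_table: List[List[int]], idx_table: List[List[int]]) -> List[float]:
    """Class probabilities as products of branch probabilities along each path.

    Assumes P(obs = +1 at node j) = P(z_j > 0) = Phi(ma[j] / sqrt(Sa[j])).
    """
    p_right = [normal_cdf(m / math.sqrt(s)) for m, s in zip(ma, Sa)]
    probs = []
    for obs, idx in zip(obs_table, idx_table):
        p = 1.0
        for o, j in zip(obs, idx):
            p *= p_right[j] if o == 1 else 1.0 - p_right[j]
        probs.append(p)
    return probs


# Hypothetical 4-class tree: root 0, left child 1, right child 2.
obs_table = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
idx_table = [[0, 1], [0, 1], [0, 2], [0, 2]]
probs = obs_to_label_prob([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], obs_table, idx_table)
print(probs)  # [0.25, 0.25, 0.25, 0.25]
```

For a complete tree (num_classes a power of two) the class probabilities sum to 1, because every node splits its incoming probability between its two children.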
- create_rolling_window(data: numpy.ndarray, output_col: numpy.ndarray, input_seq_len: int, output_seq_len: int, num_features: int, stride: int) → Tuple[numpy.ndarray, numpy.ndarray]
Creates input/output sequences for time-series forecasting using a rolling window.
This method slides a window over the time-series data to generate input sequences and their corresponding future output sequences.
- Parameters:
data (numpy.ndarray) – The time-series dataset, typically a 2D array of shape (timesteps, features).
output_col (numpy.ndarray) – The indices of the columns to be used as output targets.
input_seq_len (int) – The number of time steps in each input sequence.
output_seq_len (int) – The number of time steps in each output sequence.
num_features (int) – The total number of features in the dataset.
stride (int) – The number of time steps to move the window forward for each new sequence.
- Returns:
A tuple containing:
- input_data (numpy.ndarray): A 2D array of input sequences.
- output_data (numpy.ndarray): A 2D array of corresponding output sequences.
- Return type:
Tuple[numpy.ndarray, numpy.ndarray]
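A minimal NumPy sketch of the rolling window follows. It assumes that each input row concatenates all features over `input_seq_len` consecutive steps (flattened step by step) and that each output row holds the following `output_seq_len` steps of the `output_col` columns. The backend's exact flattening order is not stated above, and the sketch drops `num_features` because it can be read from the array shape.

```python
from typing import Sequence, Tuple

import numpy as np


def create_rolling_window(data: np.ndarray, output_col: Sequence[int],
                          input_seq_len: int, output_seq_len: int,
                          stride: int) -> Tuple[np.ndarray, np.ndarray]:
    """Slide a window over `data` of shape (timesteps, features).

    Each input row holds `input_seq_len` consecutive steps of every feature,
    flattened step by step; each output row holds the next `output_seq_len`
    steps of the columns listed in `output_col`.
    """
    data = np.asarray(data)
    last_start = data.shape[0] - input_seq_len - output_seq_len
    inputs, outputs = [], []
    for start in range(0, last_start + 1, stride):
        split = start + input_seq_len  # first step of the forecast horizon
        inputs.append(data[start:split].ravel())
        outputs.append(data[split:split + output_seq_len, output_col].ravel())
    return np.array(inputs), np.array(outputs)


series = np.arange(12, dtype=float).reshape(6, 2)  # 6 time steps, 2 features
x, y = create_rolling_window(series, output_col=[0], input_seq_len=3,
                             output_seq_len=1, stride=1)
print(x.shape, y.ravel().tolist())  # (3, 6) [6.0, 8.0, 10.0]
```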
- get_upper_triu_cov(batch_size: int, num_data: int, sigma: float) → numpy.ndarray
Creates an upper triangular covariance matrix for correlated inputs.
This is useful for models that assume temporal or spatial correlation in the input data, such as time-series models.
- Parameters:
batch_size (int) – The number of samples in a batch.
num_data (int) – The number of data points (e.g., time steps) in each sample.
sigma (float) – The standard deviation parameter controlling the covariance.
- Returns:
A 1D array representing the flattened upper triangular part of the covariance matrix.
- Return type:
numpy.ndarray
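A symmetric n × n covariance matrix is fully determined by its upper triangle, so storing only those n(n + 1)/2 entries per sample saves memory. The sketch below packs the triangle row by row with `numpy.triu_indices` and repeats it for each sample in the batch. The row-major packing order, the helper names, and the illustrative correlation values are assumptions; the description above does not state which correlation structure `sigma` parameterizes.

```python
import numpy as np


def pack_upper_triu(cov: np.ndarray) -> np.ndarray:
    """Keep the upper triangle (diagonal included) of a square matrix, row by row."""
    rows, cols = np.triu_indices(cov.shape[0])
    return cov[rows, cols]


def upper_triu_cov_batch(cov: np.ndarray, batch_size: int) -> np.ndarray:
    """Hypothetical helper: one packed triangle per sample, concatenated."""
    return np.tile(pack_upper_triu(cov), batch_size)


sigma = 0.5
# Illustrative correlation between 3 data points (an assumption, not pytagi's).
corr = np.array([[1.0, 0.5, 0.25],
                 [0.5, 1.0, 0.5],
                 [0.25, 0.5, 1.0]])
packed = upper_triu_cov_batch(sigma**2 * corr, batch_size=2)
print(packed.size)  # 12, i.e. 2 samples x (3 * 4 / 2) entries
```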
- pytagi.tagi_utils.exponential_scheduler(curr_v: float, min_v: float, decaying_factor: float, curr_iter: int) → float
Implements an exponential decay schedule for a given value.
The value decays according to the formula \(\text{new\_v} = \max\left(\text{curr\_v} \cdot \text{decaying\_factor}^{\text{curr\_iter}},\ \text{min\_v}\right)\). This is commonly used for learning-rate scheduling or for decaying exploration rates.
- Parameters:
curr_v (float) – The current value to be decayed.
min_v (float) – The minimum floor value that curr_v can decay to.
decaying_factor (float) – The base of the exponential decay (e.g., 0.99).
curr_iter (int) – The current iteration number.
- Returns:
The decayed value.
- Return type:
float
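The decay formula translates directly into Python; this sketch mirrors it rather than calling pytagi.

```python
def exponential_scheduler(curr_v: float, min_v: float,
                          decaying_factor: float, curr_iter: int) -> float:
    """max(curr_v * decaying_factor ** curr_iter, min_v)."""
    return max(curr_v * decaying_factor ** curr_iter, min_v)


# Decay an initial value of 1.0 towards a floor of 0.05.
for it in (0, 10, 100, 1000):
    print(it, exponential_scheduler(1.0, 0.05, 0.99, it))
```

Because the exponent is the iteration count rather than a single step, the initial value should be passed as curr_v on every call; feeding the previously decayed value back in would compound the decay.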
- class pytagi.tagi_utils.Normalizer(method: str | None = None)
A collection of methods for data normalization and denormalization.
Provides common scaling techniques such as standardization (Z-score) and min-max normalization. It also includes methods to reverse the transformations.
- Parameters:
method (str or None, optional) – The normalization method to use. None of the methods currently read this parameter; it can be set to record the intended method.
Initializes the Normalizer.
- Parameters:
method (str or None, optional) – The name of the normalization method (e.g., ‘standardize’).
- static standardize(data: numpy.ndarray, mu: numpy.ndarray, std: numpy.ndarray) → numpy.ndarray
Applies Z-score normalization to the data.
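The transformation is given by: \((\text{data} - \mu) / (\sigma + \epsilon)\), where \(\mu\) is mu, \(\sigma\) is std, and \(\epsilon\) is a small constant that guards against division by zero.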
- Parameters:
data (numpy.ndarray) – The input data to normalize.
mu (numpy.ndarray) – The mean of the data, typically computed per feature.
std (numpy.ndarray) – The standard deviation of the data, typically computed per feature.
- Returns:
The standardized data.
- Return type:
numpy.ndarray
- static unstandardize(norm_data: numpy.ndarray, mu: numpy.ndarray, std: numpy.ndarray) → numpy.ndarray
Reverts the Z-score normalization.
The transformation is given by: \(\text{norm\_data} \times (\sigma + \epsilon) + \mu\).
- Parameters:
norm_data (numpy.ndarray) – The standardized data to transform back to the original scale.
mu (numpy.ndarray) – The original mean used for standardization.
std (numpy.ndarray) – The original standard deviation used for standardization.
- Returns:
The data in its original scale.
- Return type:
numpy.ndarray
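Standardizing and then unstandardizing with the same statistics recovers the original data. The NumPy sketch below checks that round trip; the value of \(\epsilon\) (1e-10 here) is an assumption, since it is not given above.

```python
import numpy as np

EPS = 1e-10  # assumed value of the small constant epsilon


def standardize(data, mu, std):
    # Z-score: shift by the mean, divide by the (guarded) standard deviation.
    return (data - mu) / (std + EPS)


def unstandardize(norm_data, mu, std):
    # Exact inverse of standardize for the same mu, std and EPS.
    return norm_data * (std + EPS) + mu


data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
mu, std = data.mean(axis=0), data.std(axis=0)  # per-feature statistics
z = standardize(data, mu, std)
restored = unstandardize(z, mu, std)
print(np.allclose(restored, data))  # True
```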
- static unstandardize_std(norm_std: numpy.ndarray, std: numpy.ndarray) → numpy.ndarray
Scales a standardized standard deviation back to the original space.
The transformation is given by: \(\text{norm\_std} \times (\sigma + \epsilon)\).
- Parameters:
norm_std (numpy.ndarray) – The standardized standard deviation.
std (numpy.ndarray) – The original standard deviation of the data.
- Returns:
The standard deviation in its original scale.
- Return type:
numpy.ndarray
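Unlike the mean, a standard deviation is not shifted back by \(\mu\), because adding a constant does not change the spread of a variable; it is only rescaled by \(\sigma + \epsilon\). This is what is needed when mapping a model's predictive uncertainty from standardized units back to data units. As before, \(\epsilon\) is an assumed small constant.

```python
import numpy as np

EPS = 1e-10  # assumed value of the small constant epsilon


def unstandardize_std(norm_std, std):
    """Scale a standard deviation from standardized units back to data units."""
    return norm_std * (std + EPS)


y_std = np.array([4.0])          # standard deviation of the target in data units
pred_std_norm = np.array([0.5])  # predictive std expressed in standardized units
print(unstandardize_std(pred_std_norm, y_std))  # about 2.0 in data units
```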
- max_min_norm(data: numpy.ndarray, max_value: numpy.ndarray, min_value: numpy.ndarray) → numpy.ndarray
Applies min-max normalization to scale data between 0 and 1.
The transformation is given by: \((\text{data} - \text{min\_value}) / (\text{max\_value} - \text{min\_value} + \epsilon)\).
- Parameters:
data (numpy.ndarray) – The input data to normalize.
max_value (numpy.ndarray) – The maximum value of the data, typically per feature.
min_value (numpy.ndarray) – The minimum value of the data, typically per feature.
- Returns:
The data scaled to the [0, 1] range.
- Return type:
numpy.ndarray
- static max_min_unnorm(norm_data: numpy.ndarray, max_value: numpy.ndarray, min_value: numpy.ndarray) → numpy.ndarray
Reverts the min-max normalization.
The transformation is given by: \(\text{norm\_data} \times (\text{max\_value} - \text{min\_value} + \epsilon) + \text{min\_value}\).
- Parameters:
norm_data (numpy.ndarray) – The min-max normalized data.
max_value (numpy.ndarray) – The original maximum value used for normalization.
min_value (numpy.ndarray) – The original minimum value used for normalization.
- Returns:
The data in its original scale.
- Return type:
numpy.ndarray
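A round-trip sketch of min-max scaling with NumPy, with \(\epsilon\) assumed to be 1e-10. Because of \(\epsilon\), each feature's maximum maps to slightly below 1, and values outside the range used to compute max_value and min_value fall outside [0, 1].

```python
import numpy as np

EPS = 1e-10  # assumed value of the small constant epsilon


def max_min_norm(data, max_value, min_value):
    # Shift to start at 0, then divide by the (guarded) range.
    return (data - min_value) / (max_value - min_value + EPS)


def max_min_unnorm(norm_data, max_value, min_value):
    # Exact inverse of max_min_norm for the same max_value, min_value and EPS.
    return norm_data * (max_value - min_value + EPS) + min_value


data = np.array([[0.0, -5.0], [5.0, 0.0], [10.0, 5.0]])
max_value, min_value = data.max(axis=0), data.min(axis=0)
scaled = max_min_norm(data, max_value, min_value)  # each column now spans ~[0, 1]
print(np.allclose(max_min_unnorm(scaled, max_value, min_value), data))  # True
```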
- static max_min_unnorm_std(norm_std: numpy.ndarray, max_value: numpy.ndarray, min_value: numpy.ndarray) → numpy.ndarray
Scales a standard deviation from the min-max normalized space to the original space.
The transformation is given by: \(\text{norm\_std} \times (\text{max\_value} - \text{min\_value} + \epsilon)\).
- Parameters:
norm_std (numpy.ndarray) – The standard deviation in the normalized space.
max_value (numpy.ndarray) – The original maximum value of the data.
min_value (numpy.ndarray) – The original minimum value of the data.
- Returns:
The standard deviation in the original data scale.
- Return type:
numpy.ndarray
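As with unstandardize_std, the offset min_value does not enter the rescaled standard deviation; only the range does. A short sketch, with \(\epsilon\) assumed to be 1e-10:

```python
import numpy as np

EPS = 1e-10  # assumed value of the small constant epsilon


def max_min_unnorm_std(norm_std, max_value, min_value):
    # Only the range rescales a spread; the offset min_value cancels out.
    return norm_std * (max_value - min_value + EPS)


# A predictive std of 0.1 in scaled units, for data spanning [10, 50].
result = max_min_unnorm_std(np.array([0.1]), np.array([50.0]), np.array([10.0]))
print(result)  # about 4.0 in data units
```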
- static compute_mean_std(data: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray]
Computes the sample mean and standard deviation of the data along axis 0.
NaN values are ignored in the calculation.
- Parameters:
data (numpy.ndarray) – The input data array.
- Returns:
A tuple containing:
- mean (numpy.ndarray): The mean of the data.
- std (numpy.ndarray): The standard deviation of the data.
- Return type:
Tuple[numpy.ndarray, numpy.ndarray]
- static compute_max_min(data: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray]
Computes the maximum and minimum values of the data along axis 0.
NaN values are ignored in the calculation.
- Parameters:
data (numpy.ndarray) – The input data array.
- Returns:
A tuple containing:
- max (numpy.ndarray): The maximum values.
- min (numpy.ndarray): The minimum values.
- Return type:
Tuple[numpy.ndarray, numpy.ndarray]
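NumPy's NaN-aware reductions express both pairs of statistics directly, as in the sketch below. Note that `numpy.nanstd` divides by N (ddof=0) by default; pass ddof=1 for the unbiased sample estimate. Which divisor pytagi uses is not stated above.

```python
import numpy as np


def compute_mean_std(data: np.ndarray):
    """Per-column mean and standard deviation, skipping NaNs."""
    return np.nanmean(data, axis=0), np.nanstd(data, axis=0)


def compute_max_min(data: np.ndarray):
    """Per-column maximum and minimum, skipping NaNs."""
    return np.nanmax(data, axis=0), np.nanmin(data, axis=0)


data = np.array([[1.0, 4.0],
                 [np.nan, 8.0],
                 [3.0, np.nan]])
mean, std = compute_mean_std(data)
max_v, min_v = compute_max_min(data)
print(mean, std)     # [2. 6.] [1. 2.]
print(max_v, min_v)  # [3. 8.] [1. 4.]
```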