Statistical Analysis

Sliding FFT + NMF

class atomai.stat.SlidingFFTNMF(window_size_x=None, window_size_y=None, window_step_x=None, window_step_y=None, interpolation_factor=2, zoom_factor=2, hamming_filter=True, components=4)[source]

Bases: object

make_windows(image)[source]

Generate windows from an image using efficient striding operations
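
As a rough illustration of window generation with strided views, the sketch below uses NumPy's `sliding_window_view`. It is not the library's implementation: the helper name `make_windows_sketch` and its arguments are assumptions chosen to mirror the `window_size_*` and `window_step_*` constructor parameters.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows_sketch(image, window_size_y, window_size_x, step_y, step_x):
    """Return a stack of 2D windows cut from a 2D image (hypothetical helper)."""
    # All windows at every pixel offset, as a strided view with shape
    # (rows, cols, window_size_y, window_size_x); no data is copied here.
    view = sliding_window_view(image, (window_size_y, window_size_x))
    # Keep every step_y-th / step_x-th window, then flatten the grid.
    view = view[::step_y, ::step_x]
    grid_shape = view.shape[:2]
    return view.reshape(-1, window_size_y, window_size_x), grid_shape


image = np.arange(64, dtype=float).reshape(8, 8)
windows, grid_shape = make_windows_sketch(image, 4, 4, 2, 2)
print(windows.shape, grid_shape)
```

The grid shape is returned alongside the stack so that per-window results can later be folded back into a 2D map.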

process_fft(windows)[source]

Perform FFT on each window with optional hamming filter and zooming
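
The following minimal sketch shows one way to combine a Hamming taper, a 2D FFT, and a central crop ("zoom") for a stack of windows. It is an assumption about what such a step can look like, not the library's code; `process_fft_sketch` is a hypothetical name, and the interpolation controlled by `interpolation_factor` is not covered.

```python
import numpy as np


def process_fft_sketch(windows, zoom_factor=2, hamming_filter=True):
    """FFT magnitude of each window, optionally tapered and center-cropped."""
    n, h, w = windows.shape
    if hamming_filter:
        # A separable 2D Hamming taper reduces edge (leakage) artifacts.
        taper = np.outer(np.hamming(h), np.hamming(w))
        windows = windows * taper
    # Shift the zero-frequency term to the center of each spectrum.
    spectra = np.abs(np.fft.fftshift(np.fft.fft2(windows), axes=(-2, -1)))
    # "Zoom" by keeping only the central low-frequency region.
    ch, cw = h // zoom_factor, w // zoom_factor
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    return spectra[:, y0:y0 + ch, x0:x0 + cw]


windows = np.ones((3, 8, 8))
fft_results = process_fft_sketch(windows)
print(fft_results.shape)
```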

run_nmf(fft_results)[source]

Run NMF on FFT results to extract components
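
To make the decomposition step concrete, the sketch below factors a stack of flattened FFT magnitude maps with a plain NumPy implementation of Lee-Seung multiplicative updates. This is a swapped-in, simplified NMF solver used only for illustration; the library may rely on a different solver, and `nmf_sketch` and the random input are assumptions.

```python
import numpy as np


def nmf_sketch(X, n_components=4, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative X (samples x features) as W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_components))
    H = rng.random((n_components, X.shape[1]))
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical input: 9 FFT magnitude maps of size 4 x 4.
fft_results = np.random.default_rng(1).random((9, 4, 4))
X = fft_results.reshape(len(fft_results), -1)
W, H = nmf_sketch(X, n_components=2)
components = H.reshape(2, 4, 4)  # one spectral pattern per component
abundances = W                   # weight of each component in each window
```

Reshaping each column of `abundances` to the window grid gives a spatial map of where a given spectral pattern dominates.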

analyze_image(image_input, output_path=None)[source]

Full analysis pipeline for an image

Parameters:

image_input : str or numpy.ndarray

Either a file path to an image or a numpy array containing image data

output_path : str, optional

Path for saving output files. If None, the path is auto-generated for file inputs, and the current directory is used for array inputs

Spectral Unmixing

class atomai.stat.SpectralUnmixer(method='nmf', n_components=4, normalize=False, **kwargs)[source]

Bases: object

Applies various decomposition algorithms to hyperspectral data for spectral unmixing and component analysis.

Supported methods: ‘nmf’, ‘pca’, ‘ica’, ‘gmm’.

fit(hspy_data)[source]

Fits the selected model to a hyperspectral data cube.

plot_results(x_axis_vals=None, x_axis_units=None, **kwargs)[source]

Local image analysis

class atomai.stat.imlocal(network_output, coord_class_dict_all, window_size=None, coord_class=0)[source]

Bases: object

Class for extraction and statistical analysis of local image descriptors. It assumes that input image data is an output of a neural network, but it can also work with regular experimental images (make sure you have extra dimensions for channel and batch size).

Parameters:
  • network_output (4D numpy array) – Output of a fully convolutional neural network where a class is assigned to every pixel in the input image(s). The dimensions are \(images \times height \times width \times channels\)

  • coord_class_dict_all (dict) – Prediction from atomnet.locator (can be from another source but must be in the same format). Each element is a \(N \times 3\) numpy array, where N is the number of detected atoms/defects, the first 2 columns are xy coordinates, and the third column is the class (starts with 0)

  • window_size (int) – Side of the square for subimage cropping

  • coord_class (int) – Class of atoms/defects around which the subimages will be cropped; in the atomnet.locator output the class is the 3rd column (the first two are xy positions)
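
The input layout described above can be sketched with plain NumPy. The data here are made up, and the use of integer frame indices as dictionary keys is an assumption about the atomnet.locator output format.

```python
import numpy as np

# A regular 2D experimental image (hypothetical data).
image = np.random.default_rng(0).random((64, 64))

# Add batch and channel dimensions: images x height x width x channels.
network_output = image[None, ..., None]

# One entry per image; each row is (x, y, class), with classes starting at 0.
coord_class_dict_all = {
    0: np.array([[10.0, 12.0, 0], [30.5, 40.2, 1], [50.0, 20.0, 1]]),
}
print(network_output.shape, coord_class_dict_all[0].shape)
```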

Examples:

Identification of distortion domains in a single atomic image:

>>> # First obtain a "cleaned" image and atomic coordinates using a trained model
>>> nn_output, coordinates = model.predict(expdata)
>>> # Now get local image descriptors using atomai.stat.imlocal
>>> imstack = stat.imlocal(nn_output, coordinates, window_size=32, coord_class=1)
>>> # Compute PCA scree plot to estimate the number of components/sources
>>> imstack.pca_scree_plot(plot_results=True);
>>> # Do PCA analysis and plot results
>>> pca_results = imstack.imblock_pca(n_components=4, plot_results=True)
>>> # Do NMF analysis and plot results
>>> nmf_results = imstack.imblock_nmf(n_components=4, plot_results=True)

Analysis of atomic/defect trajectories from movies (3D image stack):

>>> # Get local descriptors (such as subimages centered around impurities)
>>> imstack = stat.imlocal(nn_output, coordinates, window_size=32, coord_class=1)
>>> # Calculate Gaussian mixture model (GMM) components
>>> components_img, classes_list, labels = imstack.gmm(n_components=10, plot_results=True)
>>> # Calculate GMM components and transition probabilities for different trajectories
>>> transitions_dict = imstack.transition_matrix(n_components=10, rmax=10)

extract_subimages_()[source]

Extracts subimages centered at certain atom class/type in the neural network output

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • stack of subimages

  • (x, y) coordinates of their centers

  • frame number associated with each subimage
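
A minimal sketch of such an extraction is given below. It is illustrative only: the helper name, the even `window_size`, the choice to skip windows that cross the image border, and the mapping of the first coordinate to rows are all assumptions rather than confirmed library behavior.

```python
import numpy as np


def extract_subimages_sketch(network_output, coord_class_dict, window_size, coord_class):
    """Crop square windows around coordinates of one class (hypothetical helper)."""
    half = window_size // 2
    subimages, centers, frames = [], [], []
    for frame, coords in coord_class_dict.items():
        img = network_output[frame]
        for x, y, cls in coords:
            if int(cls) != coord_class:
                continue
            # Assumption: the first coordinate indexes rows, the second columns.
            r, c = int(round(x)), int(round(y))
            # Skip windows that would extend past the image border.
            if r - half < 0 or c - half < 0:
                continue
            if r + half > img.shape[0] or c + half > img.shape[1]:
                continue
            subimages.append(img[r - half:r + half, c - half:c + half])
            centers.append((x, y))
            frames.append(frame)
    return np.array(subimages), np.array(centers), np.array(frames)


network_output = np.zeros((2, 32, 32, 1))
coords = {
    0: np.array([[16.0, 16.0, 1], [2.0, 2.0, 1], [10.0, 20.0, 0]]),
    1: np.array([[8.0, 24.0, 1]]),
}
imgs, centers, frames = extract_subimages_sketch(network_output, coords, 8, 1)
print(imgs.shape, centers.shape, frames)
```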

gmm(n_components, covariance='diag', random_state=1, plot_results=False)[source]

Applies Gaussian mixture model to image stack.

Parameters:
  • n_components (int) – Number of components

  • covariance (str) – Type of covariance (‘full’, ‘diag’, ‘tied’, ‘spherical’)

  • random_state (int) – Random state instance

  • plot_results (bool) – Plotting gmm components

Return type:

Tuple[ndarray, List]

Returns:

3-element tuple containing

  • 4D numpy array with GMM “centroids” (averaged images for each class)

  • List where each element contains 4D images belonging to each GMM class

  • 2D numpy array with xy coordinates, label and corresponding frame number for each subimage

pca(n_components, random_state=1, plot_results=False)[source]

Computes PCA eigenvectors for a stack of subimages.

Parameters:
  • n_components (int) – Number of PCA components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed eigenvectors

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed and reshaped principal axes

  • 2D numpy array with projection of X_vec (vector with flattened subimages) on the first principal components

  • 2D numpy array with center-of-mass coordinates and corresponding frame number for each subimage
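
The first two returned items can be pictured with a short NumPy sketch that computes PCA through an SVD of the flattened, centered subimages. This is an illustration under stated assumptions (random input, hypothetical `pca_sketch` helper), not the library's implementation.

```python
import numpy as np


def pca_sketch(imstack, n_components):
    """PCA of a subimage stack (n, h, w, c) via SVD; illustrative only."""
    n, h, w, c = imstack.shape
    X_vec = imstack.reshape(n, -1)           # flatten each subimage
    X_centered = X_vec - X_vec.mean(axis=0)  # PCA operates on centered data
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    axes = Vt[:n_components]                 # principal axes, one per row
    components = axes.reshape(n_components, h, w, c)
    X_vec_t = X_centered @ axes.T            # projection on the axes
    return components, X_vec_t


imstack = np.random.default_rng(0).random((20, 8, 8, 1))
components, X_vec_t = pca_sketch(imstack, n_components=4)
print(components.shape, X_vec_t.shape)
```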

ica(n_components, random_state=1, plot_results=False)[source]

Computes ICA independent sources for a stack of subimages.

Parameters:
  • n_components (int) – Number of ICA components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed sources

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed and reshaped independent sources

  • 2D numpy array with recovered sources from X_vec (vector with flattened subimages)

  • 2D numpy array with center-of-mass coordinates and corresponding frame number for each subimage

nmf(n_components, random_state=1, plot_results=False, **kwargs)[source]

Applies NMF to source separation from a stack of subimages

Parameters:
  • n_components (int) – Number of NMF components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed sources

  • **max_iterations (int) – Maximum number of iterations before timing out

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed and reshaped sources

  • 2D numpy array with transformed data according to the trained NMF model

  • 2D numpy array with center-of-mass coordinates and corresponding frame number for each subimage

pca_gmm(n_components_gmm, n_components_pca, plot_results=False, covariance_type='diag', random_state=1)[source]

Performs PCA decomposition on GMM-unmixed classes. Can be used when GMM allows separating different symmetries (e.g. different sublattices in graphene)

Parameters:
  • n_components_gmm (int) – Number of components for GMM

  • n_components_pca (int or list of int) – Number of PCA components. Pass a list of integers to use a different number of PCA components for each GMM class

  • covariance_type (str) – Type of covariance (‘full’, ‘diag’, ‘tied’, ‘spherical’)

  • random_state (int) – Random state instance

  • plot_results (bool) – Plotting GMM components

Return type:

Tuple[ndarray, List]

Returns:

4-element tuple containing

  • 4D numpy array with GMM “centroids” (averaged images for each GMM class)

  • List of 4D numpy arrays with PCA components

  • List with PCA-transformed data

  • 2D numpy array with xy coordinates, GMM-assigned labels, and corresponding frame numbers

pca_scree_plot(plot_results=True)[source]

Computes and (optionally) plots the PCA ‘scree plot’ (explained variance ratio vs number of components)

Return type:

ndarray
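
The quantity shown in a scree plot, the explained variance ratio per component, can be computed from singular values as in the sketch below. The helper name and the random input are assumptions made for illustration.

```python
import numpy as np


def explained_variance_ratio_sketch(X_vec):
    """Fraction of total variance captured by each principal component."""
    X_centered = X_vec - X_vec.mean(axis=0)
    # Squared singular values are proportional to per-component variance.
    s = np.linalg.svd(X_centered, compute_uv=False)
    variance = s ** 2
    return variance / variance.sum()


X_vec = np.random.default_rng(0).random((20, 64))
ratio = explained_variance_ratio_sketch(X_vec)
cumulative = np.cumsum(ratio)
```

A common heuristic is to pick the number of components at the "elbow" of `ratio`, or where `cumulative` passes a chosen threshold.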

pca_gmm_scree_plot(n_components_gmm, covariance_type='diag', random_state=1, plot_results=True)[source]

Computes PCA scree plot for each GMM class

Parameters:
  • n_components_gmm (int) – Number of components for GMM

  • covariance_type (str) – Type of covariance (‘full’, ‘diag’, ‘tied’, ‘spherical’)

  • random_state (int) – Random state instance

  • plot_results (bool) – Plotting GMM components and PCA scree plot

Return type:

List[ndarray]

Returns:

List with PCA explained variances for each GMM component

imblock_pca(n_components, random_state=1, plot_results=False, **kwargs)[source]

Computes PCA eigenvectors and their loading maps for a stack of subimages. Intended to be used for finding domains (“blocks”) (e.g. ferroic domains) in a single image.

Parameters:
  • n_components (int) – Number of PCA components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed eigenvectors and loading maps

  • **marker_size (int) – Controls marker size for loading maps plot

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed (and reshaped) principal axes

  • 2D numpy array with projection of X_vec (vector with flattened subimages) on the first principal components

  • 2D numpy array with coordinates of each subimage

imblock_ica(n_components, random_state=1, plot_results=False, **kwargs)[source]

Computes ICA independent sources and their loading maps for a stack of subimages. Intended to be used for finding domains (“blocks”) (e.g. ferroic domains) in a single image.

Parameters:
  • n_components (int) – Number of ICA components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed sources and loading maps

  • **marker_size (int) – Controls marker size for loading maps plot

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed (and reshaped) independent sources

  • 2D numpy array with recovered sources from X_vec (vector with flattened subimages)

  • 2D numpy array with coordinates of each subimage

imblock_nmf(n_components, random_state=1, plot_results=False, **kwargs)[source]

Applies NMF to source separation. Computes sources and their loading maps for a stack of subimages. Intended to be used for finding domains (“blocks”) (e.g. ferroic domains) in a single image.

Parameters:
  • n_components (int) – Number of NMF components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed sources and loading maps

  • **max_iterations (int) – Maximum number of iterations before timing out

  • **marker_size (int) – Controls marker size for loading maps plot

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed (and reshaped) sources

  • 2D numpy array with transformed X_vec (vector with flattened subimages) according to the trained NMF model

  • 2D numpy array with coordinates of each subimage

classmethod plot_decomposition_results(components, X_vec_t, image_hw=None, xy_centers=None, plot_loading_maps=True, **kwargs)[source]

Plots decomposition “eigenvectors” and, optionally, their loading maps

Parameters:
  • components (4D numpy array) – Computed (and reshaped) principal axes / independent sources / factorization matrix for stack of subimages

  • X_vec_t (2D numpy array) – Projection of X_vec on the first principal components / Recovered sources from X_vec / transformed X_vec according to the learned NMF model (is used to create “loading maps”)

  • image_hw (tuple) – Height and width of the “mother image”

  • xy_centers (n x 2 numpy array) – (x, y) coordinates of the extracted subimages

  • plot_loading_maps (bool) – Plots loading maps for each “eigenvector”

  • **marker_size (int) – Controls marker size for loading maps plot

Return type:

None

classmethod get_trajectory(coord_class_dict, start_coord, rmax)[source]

Extracts a trajectory of a single defect/atom from image stack

Parameters:
  • coord_class_dict (dict) – Dictionary of atomic coordinates (same format as produced by atomnet.locator)

  • start_coord (N x 2 numpy array) – Coordinate of defect/atom in the first frame whose trajectory we are going to track

  • rmax (int) – Max allowed distance (projected on xy plane) between defect in one frame and the position of its nearest neighbor in the next one

Return type:

Tuple[ndarray]

Returns:

2-element tuple containing

  • Numpy array of defect/atom coordinates from a single trajectory

  • Frames corresponding to this trajectory
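
The frame-to-frame matching described by `rmax` can be sketched as a simple nearest-neighbor search. This is a simplified illustration, not the library's implementation: the helper name, the brute-force distance computation, and the rule that tracking stops at the first frame with no neighbor within `rmax` are assumptions.

```python
import numpy as np


def get_trajectory_sketch(coord_class_dict, start_coord, rmax):
    """Follow one atom/defect frame by frame via nearest-neighbor matching."""
    current = np.asarray(start_coord, dtype=float)
    trajectory, frames = [], []
    for frame in sorted(coord_class_dict):
        xy = coord_class_dict[frame][:, :2]
        if len(xy) == 0:
            break
        distances = np.linalg.norm(xy - current, axis=1)
        nearest = np.argmin(distances)
        # Stop once the nearest candidate is farther than rmax.
        if distances[nearest] > rmax:
            break
        current = xy[nearest]
        trajectory.append(coord_class_dict[frame][nearest])
        frames.append(frame)
    return np.array(trajectory), np.array(frames)


coords = {
    0: np.array([[5.0, 5.0, 0], [20.0, 20.0, 0]]),
    1: np.array([[6.0, 5.0, 0], [20.0, 21.0, 0]]),
    2: np.array([[7.0, 6.0, 1], [21.0, 21.0, 0]]),
    3: np.array([[30.0, 30.0, 0]]),
}
trajectory, frames = get_trajectory_sketch(coords, start_coord=(5.0, 5.0), rmax=3)
print(trajectory.shape, frames)
```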

get_all_trajectories(min_length=0, run_gmm=False, rmax=10, **kwargs)[source]

Extracts trajectories for the detected defects starting from the first frame. Applies (optionally) Gaussian mixture model to a stack of local descriptors (subimages).

Parameters:
  • min_length (int) – Minimal length of trajectory to return

  • run_gmm (bool) – Optional GMM separation into different classes

  • rmax (int) – Max allowed distance (projected on xy plane) between defect in one frame and the position of its nearest neighbor in the next one

  • **n_components (int) – Number of components for Gaussian mixture model

  • **covariance (str) – Type of covariance for Gaussian mixture model (‘full’, ‘diag’, ‘tied’, ‘spherical’)

  • **random_state (int) – Random state instance for Gaussian mixture model

Return type:

Dict

Returns:

Python dictionary containing

  • list of numpy arrays with defects/atoms trajectories (“trajectories”)

  • list of frames corresponding to the extracted trajectories (“frames”)

  • GMM components when run_gmm=True (“gmm_components”)

classmethod renumerate_classes(classes)[source]

Helper function for renumbering Gaussian mixture model classes for Markov transition analysis

Return type:

ndarray
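
One plausible reading of this renumbering, sketched below, is mapping whatever GMM labels occur in a trajectory onto consecutive integers starting at 0, so that they can index the rows and columns of a transition matrix. The use of `np.unique` and the sample labels are assumptions for illustration, not the method's confirmed implementation.

```python
import numpy as np

# Hypothetical GMM labels seen along one trajectory (classes 2, 5 and 7 only).
classes = np.array([5, 5, 2, 7, 2, 5])

# np.unique returns the sorted unique labels and, with return_inverse=True,
# the index of each original label in that sorted list.
unique_classes, renumbered = np.unique(classes, return_inverse=True)
print(unique_classes, renumbered)
```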

transition_matrix(n_components, covariance='diag', random_state=1, rmax=10, min_length=0, sum_all_transitions=False)[source]

Applies Gaussian mixture model to a stack of local descriptors (subimages). Extracts trajectories for the detected defects starting from the first frame. Calculates transition probability for each trajectory.

Parameters:
  • n_components (int) – Number of components for Gaussian mixture model

  • covariance (str) – Type of covariance for Gaussian mixture model (‘full’, ‘diag’, ‘tied’, ‘spherical’)

  • random_state (int) – Random state instance for Gaussian mixture model

  • rmax (int) – Max allowed distance (projected on xy plane) between defect in one frame and the position of its nearest neighbor in the next one

  • min_length (int) – Minimal length of trajectory to return

Return type:

Dict

Returns:

Python dictionary containing

  • List of defects/atoms trajectories (“trajectories”)

  • List of transition matrices for each trajectory (“transitions”)

  • List of frames corresponding to the extracted trajectories (“frames”)

  • GMM components as images (“gmm_components”)
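
For a single trajectory, a transition probability matrix can be estimated by counting consecutive pairs of class labels and normalizing each row. The sketch below shows that idea with a hypothetical helper and a made-up label sequence; it does not reproduce the library's exact procedure.

```python
import numpy as np


def transition_matrix_sketch(states, n_states):
    """Row-normalized matrix of transition frequencies between states."""
    counts = np.zeros((n_states, n_states))
    # Count each consecutive pair (state at frame t, state at frame t + 1).
    for src, dst in zip(states[:-1], states[1:]):
        counts[src, dst] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for states that are never left stay all-zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


states = np.array([0, 0, 1, 0, 1, 1])
m = transition_matrix_sketch(states, n_states=2)
print(m)
```

Entry `m[i, j]` is the estimated probability of moving from class `i` in one frame to class `j` in the next.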