Other utilities

Statistics

class atomai.stat.imlocal(network_output, coord_class_dict_all, window_size=None, coord_class=0)[source]

Class for extraction and statistical analysis of local image descriptors. It assumes that input image data is an output of a neural network, but it can also work with regular experimental images (make sure you have extra dimensions for channel and batch size).

Parameters:
  • network_output (4D numpy array) – Output of a fully convolutional neural network where a class is assigned to every pixel in the input image(s). The dimensions are \(images \times height \times width \times channels\)

  • coord_class_dict_all (dict) – Prediction from atomnet.locator (can be from another source but must be in the same format). Each element is a \(N \times 3\) numpy array, where N is the number of detected atoms/defects, the first two columns are xy coordinates, and the third column is the class (starts with 0)

  • window_size (int) – Side of the square for subimage cropping

  • coord_class (int) – Class of atoms/defects around which the subimages will be cropped; in the atomnet.locator output the class is the 3rd column (the first two are xy positions)
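
The expected input layout can be mocked up with NumPy alone. The sketch below is illustrative only: the array sizes and coordinate values are made up, and `select_class` is a hypothetical helper rather than part of AtomAI.

```python
import numpy as np

# Synthetic stand-in for a network output: 2 images, 64 x 64 pixels, 1 channel
network_output = np.zeros((2, 64, 64, 1))

# One (N x 3) array per image; columns are x, y, class (classes start at 0)
coord_class_dict_all = {
    0: np.array([[10.0, 12.0, 0], [30.5, 40.2, 1], [50.0, 20.0, 0]]),
    1: np.array([[15.0, 18.0, 1], [33.0, 44.0, 1]]),
}


def select_class(coord_class_dict, coord_class):
    """Hypothetical helper: keep only the xy positions of one class per frame."""
    return {
        frame: coords[coords[:, 2] == coord_class, :2]
        for frame, coords in coord_class_dict.items()
    }


xy_class1 = select_class(coord_class_dict_all, coord_class=1)
```

Here `xy_class1` holds, for each frame, only the positions whose third column equals 1, which is the subset that `coord_class=1` would select for cropping.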

Examples:

Identification of distortion domains in a single atomic image:

>>> # First obtain a "cleaned" image and atomic coordinates using a trained model
>>> nn_output, coordinates = model.predict(expdata)
>>> # Now get local image descriptors using atomai.stat.imlocal
>>> imstack = stat.imlocal(nn_output, coordinates, window_size=32, coord_class=1)
>>> # Compute PCA scree plot to estimate the number of components/sources
>>> imstack.pca_scree_plot(plot_results=True);
>>> # Do PCA analysis and plot results
>>> pca_results = imstack.imblock_pca(n_components=4, plot_results=True)
>>> # Do NMF analysis and plot results
>>> nmf_results = imstack.imblock_nmf(n_components=4, plot_results=True)

Analysis of atomic/defect trajectories from movies (3D image stack):

>>> # Get local descriptors (such as subimages centered around impurities)
>>> imstack = stat.imlocal(nn_output, coordinates, window_size=32, coord_class=1)
>>> # Calculate Gaussian mixture model (GMM) components (returns a 3-element tuple)
>>> components_img, classes_list, coordinates_frames = imstack.gmm(n_components=10, plot_results=True)
>>> # Calculate GMM components and transition probabilities for different trajectories (returns a dictionary)
>>> transitions_dict = imstack.transition_matrix(n_components=10, rmax=10)

extract_subimages_()[source]

Extracts subimages centered at certain atom class/type in the neural network output

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • stack of subimages

  • (x, y) coordinates of their centers

  • frame number associated with each subimage

gmm(n_components, covariance='diag', random_state=1, plot_results=False)[source]

Applies Gaussian mixture model to image stack.

Parameters:
  • n_components (int) – Number of components

  • covariance (str) – Type of covariance (‘full’, ‘diag’, ‘tied’, ‘spherical’)

  • random_state (int) – Random state instance

  • plot_results (bool) – Plotting gmm components

Return type:

Tuple[ndarray, List]

Returns:

3-element tuple containing

  • 4D numpy array with GMM “centroids” (averaged images for each class)

  • List where each element contains 4D images belonging to each GMM class

  • 2D numpy array with xy coordinates, label and corresponding frame number for each subimage
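
The "centroids" in the first returned element are described as averaged images for each class. A minimal sketch of that averaging step, assuming class labels are already available (here they are made up rather than produced by a fitted GMM, and `class_averages` is a hypothetical name):

```python
import numpy as np


def class_averages(imstack, labels):
    """Average all subimages that share a label (one averaged image per class)."""
    classes = np.unique(labels)
    return np.stack([imstack[labels == c].mean(axis=0) for c in classes])


# Six synthetic 4 x 4 single-channel subimages with made-up class labels
imstack = np.stack([np.full((4, 4, 1), float(v)) for v in (0, 2, 4, 10, 20, 30)])
labels = np.array([0, 0, 0, 1, 1, 1])
centroids = class_averages(imstack, labels)  # shape: (n_classes, h, w, channels)
```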

pca(n_components, random_state=1, plot_results=False)[source]

Computes PCA eigenvectors for a stack of subimages.

Parameters:
  • n_components (int) – Number of PCA components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed eigenvectors

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed and reshaped principal axes

  • 2D numpy array with projection of X_vec (vector with flattened subimages) on the first principal components

  • 2D numpy array with center-of-mass coordinates and corresponding frame number for each subimage
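
To make the shapes of the first two returned elements concrete, here is a NumPy-only sketch of PCA on a stack of subimages. It uses a plain SVD and is not necessarily how the library computes PCA; `pca_subimages` and the random data are assumptions for illustration.

```python
import numpy as np


def pca_subimages(imstack, n_components):
    """PCA of a (n, h, w, c) stack via SVD of the mean-centered, flattened data."""
    n, h, w, c = imstack.shape
    X_vec = imstack.reshape(n, h * w * c)
    X_centered = X_vec - X_vec.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    axes = Vt[:n_components]
    components = axes.reshape(n_components, h, w, c)  # 4D "eigenvectors"
    X_vec_t = X_centered @ axes.T  # projection on the first principal axes
    return components, X_vec_t


rng = np.random.default_rng(0)
imstack = rng.random((20, 8, 8, 1))  # synthetic subimages
components, X_vec_t = pca_subimages(imstack, n_components=4)
```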

ica(n_components, random_state=1, plot_results=False)[source]

Computes ICA independent sources for a stack of subimages.

Parameters:
  • n_components (int) – Number of ICA components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed sources

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed and reshaped independent sources

  • 2D numpy array with recovered sources from X_vec (vector with flattened subimages)

  • 2D numpy array with center-of-mass coordinates and corresponding frame number for each subimage

nmf(n_components, random_state=1, plot_results=False, **kwargs)[source]

Applies NMF to source separation from a stack of subimages

Parameters:
  • n_components (int) – Number of NMF components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed sources

  • **max_iterations (int) – Maximum number of iterations before timing out

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed and reshaped sources

  • 2D numpy array with transformed data according to the trained NMF model

  • 2D numpy array with center-of-mass coordinates and corresponding frame number for each subimage

pca_gmm(n_components_gmm, n_components_pca, plot_results=False, covariance_type='diag', random_state=1)[source]

Performs PCA decomposition on GMM-unmixed classes. Can be used when GMM allows separating different symmetries (e.g. different sublattices in graphene)

Parameters:
  • n_components_gmm (int) – Number of components for GMM

  • n_components_pca (int or list of int) – Number of PCA components. Pass a list of integers in order to have a different number of PCA components for each GMM class

  • covariance_type (str) – Type of covariance (‘full’, ‘diag’, ‘tied’, ‘spherical’)

  • random_state (int) – Random state instance

  • plot_results (bool) – Plotting GMM components

Return type:

Tuple[ndarray, List]

Returns:

4-element tuple containing

  • 4D numpy array with GMM “centroids” (averaged images for each GMM class)

  • List of 4D numpy arrays with PCA components

  • List with PCA-transformed data

  • 2D numpy array with xy coordinates, GMM-assigned labels, and corresponding frame numbers

pca_scree_plot(plot_results=True)[source]

Computes and plots PCA ‘scree plot’ (explained variance ratio vs number of components)

Return type:

ndarray
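
The quantity shown in a scree plot, the explained variance ratio per component, can be sketched with NumPy as follows. This is an illustrative computation on random data, not the library's code; `explained_variance_ratio` is a hypothetical helper name.

```python
import numpy as np


def explained_variance_ratio(X_vec):
    """Fraction of the total variance captured by each principal component."""
    X_centered = X_vec - X_vec.mean(axis=0)
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


rng = np.random.default_rng(0)
X_vec = rng.random((30, 16))  # 30 flattened synthetic subimages
ratio = explained_variance_ratio(X_vec)
```

Plotting `ratio` against the component index and looking for the "elbow" is the usual way to pick the number of components.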

pca_gmm_scree_plot(n_components_gmm, covariance_type='diag', random_state=1, plot_results=True)[source]

Computes PCA scree plot for each GMM class

Parameters:
  • n_components_gmm (int) – Number of components for GMM

  • covariance_type (str) – Type of covariance (‘full’, ‘diag’, ‘tied’, ‘spherical’)

  • random_state (int) – Random state instance

  • plot_results (bool) – Plotting GMM components and PCA scree plot

Return type:

List[ndarray]

Returns:

List with PCA explained variances for each GMM component

imblock_pca(n_components, random_state=1, plot_results=False, **kwargs)[source]

Computes PCA eigenvectors and their loading maps for a stack of subimages. Intended to be used for finding domains (“blocks”) (e.g. ferroic domains) in a single image.

Parameters:
  • n_components (int) – Number of PCA components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed eigenvectors and loading maps

  • **marker_size (int) – Controls marker size for loading maps plot

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed (and reshaped) principal axes

  • 2D numpy array with projection of X_vec (vector with flattened subimages) on the first principal components

  • 2D numpy array with coordinates of each subimage

imblock_ica(n_components, random_state=1, plot_results=False, **kwargs)[source]

Computes ICA independent sources and their loading maps for a stack of subimages. Intended to be used for finding domains (“blocks”) (e.g. ferroic domains) in a single image.

Parameters:
  • n_components (int) – Number of ICA components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed sources and loading maps

  • **marker_size (int) – Controls marker size for loading maps plot

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed (and reshaped) independent sources

  • 2D numpy array with recovered sources from X_vec (vector with flattened subimages)

  • 2D numpy array with coordinates of each subimage

imblock_nmf(n_components, random_state=1, plot_results=False, **kwargs)[source]

Applies NMF to source separation. Computes sources and their loading maps for a stack of subimages. Intended to be used for finding domains (“blocks”) (e.g. ferroic domains) in a single image.

Parameters:
  • n_components (int) – Number of NMF components

  • random_state (int) – Random state instance

  • plot_results (bool) – Plots computed eigenvectors and loading maps

  • **max_iterations (int) – Maximum number of iterations before timing out

  • **marker_size (int) – Controls marker’s size for loading maps plots

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • 4D numpy array with computed (and reshaped) sources

  • 2D numpy array with transformed X_vec (vector with flattened subimages) according to the trained NMF model

  • 2D numpy array with coordinates of each subimage

classmethod plot_decomposition_results(components, X_vec_t, image_hw=None, xy_centers=None, plot_loading_maps=True, **kwargs)[source]

Plots decomposition “eigenvectors”. Plots loading maps

Parameters:
  • components (4D numpy array) – Computed (and reshaped) principal axes / independent sources / factorization matrix for stack of subimages

  • X_vec_t (2D numpy array) – Projection of X_vec on the first principal components / Recovered sources from X_vec / transformed X_vec according to the learned NMF model (is used to create “loading maps”)

  • image_hw (tuple) – Height and width of the “mother image”

  • xy_centers (n x 2 numpy array) – (x, y) coordinates of the extracted subimages

  • plot_loading_maps (bool) – Plots loading maps for each “eigenvector”

  • **marker_size (int) – Controls marker’s size for loading maps plots

Return type:

None

classmethod get_trajectory(coord_class_dict, start_coord, rmax)[source]

Extracts a trajectory of a single defect/atom from image stack

Parameters:
  • coord_class_dict (dict) – Dictionary of atomic coordinates (same format as produced by atomnet.locator)

  • start_coord (N x 2 numpy array) – Coordinate of defect/atom in the first frame whose trajectory we are going to track

  • rmax (int) – Max allowed distance (projected on xy plane) between defect in one frame and the position of its nearest neighbor in the next one

Return type:

Tuple[ndarray]

Returns:

2-element tuple containing

  • Numpy array of defect/atom coordinates from a single trajectory

  • Frames corresponding to this trajectory
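
The linking rule described by `rmax` can be sketched as a frame-by-frame nearest-neighbor search. The version below is an assumption-laden illustration: `track_nearest` is a hypothetical function, and stopping at the first frame where no neighbor lies within `rmax` is an assumed behavior, not a documented one.

```python
import numpy as np


def track_nearest(coord_class_dict, start_coord, rmax):
    """Follow one defect frame by frame via nearest-neighbor linking."""
    current = np.asarray(start_coord, dtype=float)
    trajectory, frames = [], []
    for frame in sorted(coord_class_dict):
        coords = coord_class_dict[frame]
        if len(coords) == 0:
            break  # nothing detected in this frame
        dist = np.linalg.norm(coords[:, :2] - current, axis=1)
        nearest = int(np.argmin(dist))
        if dist[nearest] > rmax:
            break  # assumed rule: stop when the nearest neighbor is too far
        current = coords[nearest, :2]
        trajectory.append(coords[nearest])
        frames.append(frame)
    return np.array(trajectory), np.array(frames)


# Made-up detections for three frames (columns: x, y, class)
coord_class_dict = {
    0: np.array([[5.0, 5.0, 0], [20.0, 20.0, 1]]),
    1: np.array([[6.0, 5.0, 0], [21.0, 22.0, 1]]),
    2: np.array([[7.0, 6.0, 0], [40.0, 40.0, 1]]),
}
traj, traj_frames = track_nearest(coord_class_dict, start_coord=(20.0, 20.0), rmax=5)
```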

get_all_trajectories(min_length=0, run_gmm=False, rmax=10, **kwargs)[source]

Extracts trajectories for the detected defects starting from the first frame. Applies (optionally) Gaussian mixture model to a stack of local descriptors (subimages).

Parameters:
  • min_length (int) – Minimal length of trajectory to return

  • run_gmm (bool) – Optional GMM separation into different classes

  • rmax (int) – Max allowed distance (projected on xy plane) between defect in one frame and the position of its nearest neighbor in the next one

  • **n_components (int) – Number of components for Gaussian mixture model

  • **covariance (str) – Type of covariance for Gaussian mixture model (‘full’, ‘diag’, ‘tied’, ‘spherical’)

  • **random_state (int) – Random state instance for Gaussian mixture model

Return type:

Dict

Returns:

Python dictionary containing

  • list of numpy arrays with defects/atoms trajectories (“trajectories”)

  • list of frames corresponding to the extracted trajectories (“frames”)

  • GMM components when run_gmm=True (“gmm_components”)

classmethod renumerate_classes(classes)[source]

Helper function for renumerating Gaussian mixture model classes for Markov transition analysis

Return type:

ndarray

transition_matrix(n_components, covariance='diag', random_state=1, rmax=10, min_length=0, sum_all_transitions=False)[source]

Applies Gaussian mixture model to a stack of local descriptors (subimages). Extracts trajectories for the detected defects starting from the first frame. Calculates transition probability for each trajectory.

Parameters:
  • n_components (int) – Number of components for Gaussian mixture model

  • covariance (str) – Type of covariance for Gaussian mixture model (‘full’, ‘diag’, ‘tied’, ‘spherical’)

  • random_state (int) – Random state instance for Gaussian mixture model

  • rmax (int) – Max allowed distance (projected on xy plane) between defect in one frame and the position of its nearest neighbor in the next one

  • min_length (int) – Minimal length of trajectory to return

Return type:

Dict

Returns:

Python dictionary containing

  • List of defects/atoms trajectories (“trajectories”)

  • List of transition matrices for each trajectory (“transitions”)

  • List of frames corresponding to the extracted trajectories (“frames”)

  • GMM components as images (“gmm_components”)
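
A transition matrix for a single trajectory can be estimated by counting transitions between the states (for example, GMM classes) visited in consecutive frames. The sketch below assumes this simple count-and-normalize estimate, which may differ from the library's implementation; the state sequence is made up and `transition_probabilities` is a hypothetical name.

```python
import numpy as np


def transition_probabilities(states, n_states):
    """Row-normalized counts of transitions between consecutive states."""
    counts = np.zeros((n_states, n_states))
    for s_from, s_to in zip(states[:-1], states[1:]):
        counts[s_from, s_to] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave rows of states with no outgoing transitions as zeros
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


states = [0, 0, 1, 0, 1, 1, 2]  # made-up class labels along one trajectory
matrix = transition_probabilities(states, n_states=3)
```

Element `matrix[i, j]` is then the estimated probability of moving from state `i` to state `j` in one frame step.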

atomai.stat.update_classes(coordinates, nn_input, method='threshold', **kwargs)[source]

Updates atomic classes based on the calculated intensities at each predicted position or local neighborhood analysis based on image patches cropped around each predicted position

Parameters:
  • coordinates (dict) – Output of AtomAI’s predictor. It is also possible to pass a single dictionary value associated with a specific image in a stack. In this case, the same image needs to be passed as ‘nn_input’.

  • nn_input (numpy array) – Image(s) that served as input to the neural network

  • method (str) – Method for intensity-based update of atomic classes (‘threshold’, ‘kmeans’, ‘meanshift’, ‘gmm_local’)

  • **thresh (float or int) – Intensity threshold value. Values above/below are set to 1/0

  • **r (float or int) – Size of an area around each atomic coordinate over which intensity is calculated

  • **n_components (int) – Number of components for ‘kmeans’ and ‘gmm_local’

  • **quantile (float) – Quantile for bandwidth computation in ‘meanshift’ clustering

  • **window_size (int) – Size of image patch for ‘gmm_local’

Return type:

Dict[int, ndarray]

Returns:

Updated coordinates
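
The ‘threshold’ method can be pictured as follows: measure the intensity in a small area around each position and assign class 1 or 0 depending on whether it lies above or below `thresh`. This sketch makes several assumptions that are not stated in the documentation: the mean is used as the intensity measure, the area is a square of half-side `r`, and the first coordinate indexes image rows. `update_classes_by_threshold` is a hypothetical name.

```python
import numpy as np


def update_classes_by_threshold(coords, image, thresh, r=1):
    """Set class to 1/0 when the mean intensity around an atom is above/below thresh."""
    updated = coords.astype(float).copy()
    h, w = image.shape
    for i, (c0, c1) in enumerate(coords[:, :2]):
        # Assumption: first coordinate indexes rows, second indexes columns
        row, col = int(round(c0)), int(round(c1))
        patch = image[max(row - r, 0):min(row + r + 1, h),
                      max(col - r, 0):min(col + r + 1, w)]
        updated[i, 2] = 1 if patch.mean() > thresh else 0
    return updated


image = np.zeros((10, 10))
image[2:5, 2:5] = 1.0  # one bright "atom"
coords = np.array([[3.0, 3.0, 0], [7.0, 7.0, 1]])
new_coords = update_classes_by_threshold(coords, image, thresh=0.5, r=1)
```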

Image transforms

class atomai.transforms.datatransform(n_channels=None, dim_order_in='channel_last', dim_order_out='channel_first', squeeze_channels=False, seed=None, **kwargs)[source]

Applies a sequence of pre-defined operations for data augmentation.

Parameters:
  • n_channels (int) – Number of classes (channels) in the ground truth

  • dim_order_in (str) – Channel first or channel last ordering in the input masks

  • dim_order_out (str) – Channel first or channel last ordering in the output masks

  • seed (int) – Random seed (for determinism)

  • **custom_transform (Callable) – Python function that takes two ndarrays (images and masks) as input, applies a set of transformations to them, and returns the two transformed arrays

  • **rotation (bool) – Rotating image by +- 90 deg (if image is square) and horizontal/vertical flipping.

  • **zoom (bool or int) – Zooming-in by a specified zoom factor (Default: 2). Note that a zoom window is always square

  • **gauss_noise (bool or list or tuple) – Gaussian noise. You can pass min and max values as a list/tuple (Default [min, max] range: [0, 50])

  • **poisson_noise (bool or list or tuple) – Poisson noise. You can pass min and max values as a list/tuple (Default [min, max] range: [30, 40])

  • **salt_and_pepper (bool or list or tuple) – Salt and pepper noise. You can pass min and max values as a list/tuple (Default [min, max] range: [0, 50])

  • **blur (bool or list or tuple) – Gaussian blurring. You can pass min and max values as a list/tuple (Default [min, max] range: [1, 50])

  • **contrast (bool or list or tuple) – Contrast level. You can pass min and max values as a list/tuple (Default [min, max] range: [5, 20])

  • **background (bool) – Adds/subtracts asymmetric 2D Gaussian of random width and intensity from the image

  • **resize (tuple) – Values for image resizing [downscale factor (default: 2), upscale factor (default: 1.5)]
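
A function suitable for the `custom_transform` keyword takes images and masks and returns both, transformed consistently. The example below is a hypothetical transform written for illustration; it assumes that the width is the last axis of both arrays (for example n x h x w), which may not match your data layout.

```python
import numpy as np


def flip_left_right(images, masks):
    """Hypothetical custom transform: mirror images and masks along the last axis."""
    # Assumption: the width is the last axis of both arrays (e.g. n x h x w)
    return images[..., ::-1].copy(), masks[..., ::-1].copy()


images = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
masks = (images > 10).astype(int)
images_t, masks_t = flip_left_right(images, masks)
```

The key point is that the same geometric operation is applied to both arrays, so each mask stays aligned with its image.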

apply_gauss(X_batch, y_batch)[source]

Random application of Gaussian noise to each training image in a stack

Return type:

Tuple[ndarray]

apply_jitter(X_batch, y_batch)[source]

Random application of jitter noise to each training image in a stack

Return type:

Tuple[ndarray]

apply_poisson(X_batch, y_batch)[source]

Random application of Poisson noise to each training image in a stack

Return type:

Tuple[ndarray]

apply_sp(X_batch, y_batch)[source]

Random application of salt & pepper noise to each training image in a stack

Return type:

Tuple[ndarray]

apply_blur(X_batch, y_batch)[source]

Random blurring of each training image in a stack

Return type:

Tuple[ndarray]

apply_contrast(X_batch, y_batch)[source]

Randomly changes the contrast level of each training image in a stack

Return type:

Tuple[ndarray]

apply_zoom(X_batch, y_batch)[source]

Zoom-in achieved by cropping image and then resizing to the original size. The zooming window is a square.

Return type:

Tuple[ndarray]

apply_background(X_batch, y_batch)[source]

Emulates thickness variation in STEM or height variation in STM

Return type:

Tuple[ndarray]

apply_rotation(X_batch, y_batch)[source]

Flips and rotates training images and corresponding ground truth images

Return type:

Tuple[ndarray]

apply_imresize(X_batch, y_batch)[source]

Resizes training images and corresponding ground truth images

Return type:

Tuple[ndarray]

run(images, targets)[source]

Applies a sequence of augmentation procedures to images and (except for noise) targets. Starts with user defined custom_transform if available. Then proceeds with rotation->zoom->resize->gauss->jitter->poisson->sp->blur->contrast->background. The operations that are not specified in kwargs are skipped.

Return type:

Tuple[ndarray]

Training data preparation

atomai.utils.create_lattice_mask(lattice, xy_atoms, *args, **kwargs)[source]

Given an experimental image and xy atomic coordinates, creates a ground truth image. Currently works only for the case where all atoms belong to one class. Note that fractional pixel positions are rounded.

Parameters:
  • lattice (2D numpy array) – Experimental image as 2D numpy array

  • xy_atoms (N x 2 numpy array) – Position of atoms in the experimental data

  • *args (python function) –

    Function that creates a 2D numpy array with atom and corresponding mask for each atomic coordinate. It must have two parameters, ‘scale’ and ‘rmask’, that control the sizes of the simulated atom and the corresponding mask

    Example:

    >>> import cv2
    >>> from atomai.utils import MakeAtom
    >>> def create_atomic_mask(scale=7, rmask=5, thresh=0.5):
    ...     # 'thresh' is an assumed, illustrative binarization level
    ...     atom = MakeAtom(sc=scale, r_mask=rmask).atom2dgaussian()
    ...     _, mask = cv2.threshold(atom, thresh, 1, cv2.THRESH_BINARY)
    ...     return atom, mask
    

  • **scale (int) – Controls the atom size (width of 2D Gaussian)

  • **rmask (int) – Controls the atomic mask size

Return type:

ndarray

Returns:

2D numpy array with ground truth data
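
The idea of turning rounded atomic positions into a ground truth image can be sketched with NumPy. Note that this example swaps in a plain binary disk for the simulated Gaussian atom and thresholded mask described above, and it assumes that the first coordinate indexes image rows; `disk_mask` is a hypothetical helper.

```python
import numpy as np


def disk_mask(image_shape, xy_atoms, rmask=5):
    """Binary ground-truth mask with a disk of radius rmask at each rounded position."""
    mask = np.zeros(image_shape, dtype=np.uint8)
    rows, cols = np.ogrid[:image_shape[0], :image_shape[1]]
    for r, c in np.rint(xy_atoms).astype(int):
        # Assumption: first coordinate indexes rows, second indexes columns
        mask[(rows - r) ** 2 + (cols - c) ** 2 <= rmask ** 2] = 1
    return mask


lattice = np.zeros((32, 32))  # stand-in for an experimental image
xy_atoms = np.array([[8.2, 8.7], [20.4, 24.6]])
mask = disk_mask(lattice.shape, xy_atoms, rmask=2)
```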

atomai.utils.create_multiclass_lattice_mask(imgdata, coord_class_dict, *args, **kwargs)[source]

Given a stack of experimental images and a dictionary with atomic coordinates and classes, creates a ground truth image. Note that fractional pixel positions are rounded.

Parameters:
  • imgdata (3D numpy array) – Stack of experimental images

  • coord_class_dict (dict or N x 3 numpy array) – Dictionary with arrays containing coordinates and classes for each atom/defect. In each array, the first two columns are the positions of atoms. The third column is the “intensity”/class of each atom. It is also possible to pass a single N x 3 ndarray, which will be wrapped into a dictionary automatically.

  • *args (python function) – Function that creates two 2D numpy arrays with atom and corresponding mask for each atomic coordinate. It must have three parameters, ‘scale’, ‘rmask’, and ‘intensity’, that control the size and intensity of the simulated atom and the corresponding atomic mask

  • **scale (int) – Controls the atom size (width of 2D Gaussian)

  • **rmask (int) – Controls the atomic mask size

Return type:

Union[List[ndarray], ndarray]

Returns:

4D numpy array with ground truth data or list of 3D numpy arrays

atomai.utils.extract_patches(images, masks, patch_size, num_patches, **kwargs)[source]

Takes batch of images and batch of corresponding masks as an input and for each image-mask pair it extracts stack of subimages (patches) of the selected size.

Return type:

Tuple[ndarray]

atomai.utils.extract_random_subimages(imgdata, window_size, num_images, coordinates=None, **kwargs)[source]

Randomly extracts subimages centered at a certain atom class/type (usually from a neural network output) or just at random pixels (if coordinates are not known/available)

Parameters:
  • imgdata (numpy array) – 4D stack of images (n, height, width, channel)

  • window_size (int) – Side of the square for subimage cropping

  • num_images (int) – number of images to extract from each “frame” in the stack

  • coordinates (dict) – Optional. Prediction from atomnet.locator (can be from another source but must be in the same format). Each element is a \(N \times 3\) numpy array, where N is the number of detected atoms/defects, the first two columns are xy coordinates, and the third column is the class (starts with 0)

  • **coord_class (int) – Class of atoms/defects around which the subimages will be cropped (3rd column in the atomnet.locator output)

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • stack of subimages

  • (x, y) coordinates of their centers

  • frame number associated with each subimage

atomai.utils.extract_subimages(imgdata, coordinates, window_size, coord_class=0)[source]

Extracts subimages centered at certain atom class/type (usually from a neural network output)

Parameters:
  • imgdata (numpy array) – 4D stack of images (n, height, width, channel). It is also possible to pass a single 2D image.

  • coordinates (dict or N x 2 numpy array) – Prediction from atomnet.locator (can be from another source but must be in the same format). Each element is a \(N \times 3\) numpy array, where N is the number of detected atoms/defects, the first two columns are xy coordinates, and the third column is the class (starts with 0). It is also possible to pass an N x 2 numpy array if the corresponding imgdata is a single 2D image.

  • window_size (int) – Side of the square for subimage cropping

  • coord_class (int) – Class of atoms/defects around which the subimages will be cropped (3rd column in the atomnet.locator output)

Return type:

Tuple[ndarray]

Returns:

3-element tuple containing

  • stack of subimages,

  • (x, y) coordinates of their centers,

  • frame number associated with each subimage
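
The three returned elements can be illustrated with a small NumPy sketch of the cropping step. It is not the library's code: `crop_subimages` is a hypothetical name, windows that would cross the image border are simply skipped (an assumed policy), and the first coordinate is assumed to index image rows.

```python
import numpy as np


def crop_subimages(imgdata, coordinates, window_size, coord_class=0):
    """Crop square windows centered on atoms of one class; skip border cases."""
    half = window_size // 2
    subimages, centers, frames = [], [], []
    for frame, coords in coordinates.items():
        selected = coords[coords[:, 2] == coord_class]
        for cx, cy in np.rint(selected[:, :2]).astype(int):
            # Assumption: first coordinate indexes rows, second indexes columns
            r0, c0 = cx - half, cy - half
            if r0 < 0 or c0 < 0:
                continue  # window would start outside the image
            window = imgdata[frame, r0:r0 + window_size, c0:c0 + window_size]
            if window.shape[:2] != (window_size, window_size):
                continue  # window would end outside the image
            subimages.append(window)
            centers.append((cx, cy))
            frames.append(frame)
    return np.array(subimages), np.array(centers), np.array(frames)


rng = np.random.default_rng(0)
imgdata = rng.random((2, 32, 32, 1))  # synthetic (n, height, width, channel) stack
coordinates = {
    0: np.array([[16.0, 16.0, 0], [1.0, 1.0, 0], [10.0, 10.0, 1]]),
    1: np.array([[20.0, 12.0, 0]]),
}
subimages, centers, frames = crop_subimages(imgdata, coordinates, window_size=8)
```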

class atomai.utils.MakeAtom(sc=5, r_mask=3, intensity=1, theta=0, offset=0)[source]

Creates an image of atom modelled as 2D Gaussian and a corresponding mask

atom2dgaussian()[source]

Models atom as 2d Gaussian

Return type:

ndarray

circularmask(image, radius)[source]

Returns a mask with specified radius

Return type:

ndarray

gen_atom_mask()[source]

Creates a mask for specific type of atom

Return type:

Tuple[ndarray]

atomai.utils.FFTmask(imgsrc, maskratio=10)[source]

Takes a square real space image and filters out a disk with radius equal to: 1/maskratio * image size. Returns FFT transform of the image and the filtered FFT transform

Return type:

Tuple[ndarray]

atomai.utils.FFTsub(imgsrc, imgfft)[source]

Takes real space image and filtered FFT. Reconstructs real space image and subtracts it from the original. Returns normalized image.

Return type:

ndarray
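
The filter-then-subtract idea behind FFTmask and FFTsub can be sketched with `numpy.fft`. Several details here are assumptions rather than documented behavior: the central disk is treated as the region that is kept (the source wording could also mean the opposite), the absolute difference is used, and the result is min-max scaled to [0, 1]. The function names are hypothetical.

```python
import numpy as np


def fft_lowpass(img, maskratio=10):
    """Return the shifted FFT and a copy with only a central disk kept."""
    fft = np.fft.fftshift(np.fft.fft2(img))
    size = img.shape[0]  # square image assumed
    radius = size / maskratio
    rows, cols = np.ogrid[:size, :size]
    disk = (rows - size // 2) ** 2 + (cols - size // 2) ** 2 <= radius ** 2
    return fft, fft * disk


def fft_subtract(img, fft_filtered):
    """Reconstruct the filtered image, subtract it, rescale the result to [0, 1]."""
    reconstructed = np.real(np.fft.ifft2(np.fft.ifftshift(fft_filtered)))
    diff = np.abs(img - reconstructed)
    return (diff - diff.min()) / (diff.max() - diff.min())


rng = np.random.default_rng(0)
img = rng.random((32, 32))  # synthetic square image
fft, fft_filtered = fft_lowpass(img, maskratio=10)
diff = fft_subtract(img, fft_filtered)
```

A difference image of this kind is what a thresholding step would then turn into a defect map.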

atomai.utils.threshImg(diff, threshL=0.25, threshH=0.75)[source]

Takes in a difference image and low and high threshold values, and outputs a map of all defects.

Return type:

ndarray

Image pre/post processing

atomai.utils.torch_format(image_data)[source]

Reshapes (if needed), normalizes and converts image data to pytorch format for model training and prediction

Parameters:

image_data (3D or 4D numpy array) – Image stack with dimensions (n_batches x height x width) or (n_batches x 1 x height x width)

Return type:

Tensor

atomai.utils.img_resize(image_data, rs, round_=False)[source]

Resizes a stack of images

Parameters:
  • image_data (3D numpy array) – Image stack with dimensions (n_batches x height x width)

  • rs (tuple) – Target height and width

  • round_ (bool) – rounding (in case of labeled pixels)

Return type:

ndarray

Returns:

Resized stack of images

atomai.utils.img_pad(image_data, pooling)[source]

Pads the image if its size (w, h) is not divisible by \(2^n\), where n is a number of pooling layers in a network

Parameters:
  • image_data (3D numpy array) – Image stack with dimensions (n_batches x height x width)

  • pooling (int) – Downsampling factor (equal to \(2^n\), where n is a number of pooling operations)

Return type:

ndarray
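
The padding arithmetic can be sketched as follows: each spatial dimension is extended to the next multiple of the downsampling factor. Zero-padding on the bottom and right edges is an assumption of this sketch, and `pad_to_multiple` is a hypothetical name.

```python
import numpy as np


def pad_to_multiple(image_data, pooling):
    """Zero-pad height and width (bottom/right) up to the next multiple of pooling."""
    _, h, w = image_data.shape
    pad_h = (-h) % pooling  # 0 when h is already divisible by pooling
    pad_w = (-w) % pooling
    return np.pad(image_data, ((0, 0), (0, pad_h), (0, pad_w)), mode="constant")


stack = np.ones((3, 30, 45))  # synthetic (n_batches x height x width) stack
padded = pad_to_multiple(stack, pooling=8)  # 8 = 2**3, i.e. three pooling layers
```

With three pooling layers the factor is \(2^3 = 8\), so a 30 x 45 image becomes 32 x 48.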

atomai.utils.crop_borders(imgdata, thresh=0)[source]

Crops image border where all values are zeros

Parameters:
  • imgdata (numpy array) – 3D numpy array (h, w, c)

  • thresh (float) – border values to crop

Return type:

ndarray

Returns:

Cropped array

atomai.utils.filter_cells(imgdata, im_thresh=0.5, blob_thresh=50, filter_='below')[source]

Filters blobs above/below certain size for each image in the stack. The ‘imgdata’ must have dimensions (n x h x w).

Parameters:
  • imgdata (3D numpy array) – stack of images (without channel dimension)

  • im_thresh (float) – value at which each image in the stack will be thresholded

  • blob_thresh (int) – maximum/minimum blob size for thresholding

  • filter_ (string) – Select ‘above’ or ‘below’ to remove larger or smaller blobs, respectively

Return type:

ndarray

Returns:

Image stack with the same dimensions as the input data

atomai.utils.get_blob_params(nn_output, im_thresh, blob_thresh, filter_='below')[source]

Extracts position and angle of particles in each movie frame

Parameters:
  • nn_output (4D numpy array) – output of the neural network returned by atomnet.predictor

  • im_thresh (float) – value at which each image in the stack will be thresholded

  • blob_thresh (int) – maximum/minimum blob size for thresholding

  • filter_ (string) – Select ‘above’ or ‘below’ to remove larger or smaller blobs, respectively

Return type:

Dict

Returns:

Nested dictionary where, for each frame, there is an ordered dictionary with the center of mass and angle of each particle detected in that frame.

atomai.utils.cv_thresh(imgdata, threshold=0.5)[source]

Wrapper for opencv binary threshold method. Returns thresholded image.

atomai.utils.cv_resize(img, rs, round_=False)[source]

Wrapper for open-cv resize function

Parameters:
  • img (2D numpy array) – input 2D image

  • rs (tuple) – target height and width

  • round_ (bool) – rounding (in case of labeled pixels)

Return type:

ndarray

Returns:

Resized image

atomai.utils.cv_resize_stack(imgdata, rs, round_=False)[source]

Resizes a 3D stack of images

Parameters:
  • imgdata (3D numpy array) – stack of images to be resized

  • rs (tuple or int) – target height and width

  • round_ (bool) – rounding (in case of labeled pixels)

Return type:

ndarray

Returns:

Resized stack of images

Atomic Coordinates

atomai.utils.map_bonds(coordinates, nn=2, upper_bound=None, distance_ideal=None, plot_results=True, **kwargs)[source]

Generates plots with lattice bonds (color-coded according to the variation in their length)

Parameters:
  • coordinates (dict) – Dictionary where keys are frame numbers and values are \(N \times 3\) numpy arrays with atomic coordinates. In each array the first two columns are xy coordinates and the third column is atom class.

  • nn (int) – Number of nearest neighbors to search for.

  • upper_bound (float or int, non-negative) – Upper distance bound (in px) for querying the k-d tree for nearest neighbors. Only distances below this value will be counted.

  • distance_ideal (float) – Bond distance in ideal lattice. Defaults to average distance in the frame

  • plot_results (bool) – Plot bond maps

  • **savedir (str) – directory to save plots

  • **h (int) – image height

  • **w (int) – image width

Return type:

ndarray

Returns:

Array of distances to nearest neighbors for each atom

atomai.utils.find_com(image_data)[source]

Find atoms via center of mass methods

Parameters:

image_data (2D numpy array) – 2D image (usually an output of neural network)

Return type:

ndarray

atomai.utils.get_nn_distances(coordinates, nn=2, upper_bound=None)[source]

Calculates nearest-neighbor distances for a stack of images

Parameters:
  • coordinates (Union[Dict[int, ndarray], ndarray]) – Dictionary where keys are frame numbers and values are \(N \times 3\) numpy arrays with atomic coordinates. In each array the first two columns are xy coordinates and the third column is atom class. One can also pass a single numpy array (if all the coordinates correspond to a single image)

  • nn (int) – Number of nearest neighbors to search for.

  • upper_bound (float or int, non-negative) – Upper distance bound for querying the k-d tree for nearest neighbors. Only distances below this value will be counted.

Return type:

Tuple[List[ndarray]]

Returns:

Tuple with list of \(atoms \times nn\) arrays of distances to nearest neighbors and list of \(atoms \times (nn+1) \times 3\) arrays of coordinates (including coordinates of the “center” atom), where the number of atoms is less than or equal to the total number of atoms in ‘coordinates’ (due to the ‘upper_bound’ criterion)
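
For small inputs, the nearest-neighbor distances can be reproduced with a brute-force pairwise computation. The sketch below swaps the k-d tree for a full distance matrix and assumes that an atom is dropped when any of its `nn` neighbors is not below `upper_bound`; `nn_distances_bruteforce` is a hypothetical name and the lattice is synthetic.

```python
import numpy as np


def nn_distances_bruteforce(coordinates, nn=2, upper_bound=None):
    """Distances from each atom to its nn nearest neighbors (pairwise, no k-d tree)."""
    xy = coordinates[:, :2]
    pairwise = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    order = np.argsort(pairwise, axis=1)[:, 1:nn + 1]  # column 0 is the atom itself
    distances = np.take_along_axis(pairwise, order, axis=1)
    if upper_bound is not None:
        # Assumed rule: keep only atoms whose nn neighbors all lie below upper_bound
        distances = distances[np.all(distances < upper_bound, axis=1)]
    return distances


# 3 x 3 square lattice with unit spacing plus one distant outlier (class 0)
grid = np.array([[i, j, 0] for i in range(3) for j in range(3)], dtype=float)
coordinates = np.vstack([grid, [10.0, 10.0, 0.0]])
d_all = nn_distances_bruteforce(coordinates, nn=2)
d_bounded = nn_distances_bruteforce(coordinates, nn=2, upper_bound=1.5)
```

The outlier is removed once `upper_bound` is set, which illustrates why the number of returned atoms can be smaller than the number of input atoms.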

atomai.utils.get_nn_distances_(coordinates, nn=2, upper_bound=None)[source]

Calculates nearest-neighbor distances for a single image

Parameters:
  • coordinates (numpy array) – \(N \times 3\) array with atomic coordinates where first two columns are xy coordinates and the third column is atom class

  • nn (int) – Number of nearest neighbors to search for.

  • upper_bound (float or int, non-negative) – Upper distance bound for querying the k-d tree for nearest neighbors. Only distances below this value will be counted.

Return type:

Tuple[ndarray]

Returns:

Tuple with \(atoms \times nn\) array of distances to nearest neighbors and \(atoms \times (nn+1) \times 3\) array of coordinates (including coordinates of the “center” atom), where the number of atoms is less than or equal to the total number of atoms in ‘coordinates’ (due to the ‘upper_bound’ criterion)

atomai.utils.peak_refinement(imgdata, coordinates, d=None)[source]

Performs a refinement of atomic positions by fitting a 2D Gaussian, where the neural network predictions serve as the initial guess.

Parameters:
  • imgdata (2D numpy array) – Single experimental image/frame

  • coordinates (N x 3 numpy array) – Atomic coordinates where first two columns are xy coordinates and the third column is atom class

  • d (int) – Half-side of a square around the identified atom for peak fitting. If d is not specified, it is set to 1/4 of the average nearest neighbor distance in the lattice.

Return type:

ndarray

Returns:

Refined array of coordinates

atomai.utils.compare_coordinates(coordinates1, coordinates2, d_max, plot_results=False, **kwargs)[source]

Finds difference between predicted (‘coordinates1’) and “true” (‘coordinates2’) coordinates using the scipy.spatial.cKDTree method. Use ‘d_max’ to set the maximum search radius. For plotting, pass figure size and experimental image using keyword arguments ‘fsize’ and ‘expdata’.

Return type:

Tuple[ndarray]

Visualization

atomai.utils.plot_trajectories(traj, frames, **kwargs)[source]

Plots individual trajectory (as position (radius) vector)

Parameters:
  • traj (n x 3 ndarray) – numpy array where the first two columns are coordinates and the 3rd column is the class

  • frames ((n,) ndarray) – numpy array with frame numbers

  • **lv (int) – latent variable value to visualize (Default: 1)

  • **fov (int or list) – field of view or scan size

  • **fsize (int) – figure size (Default: 6)

  • **cmap (str) – colormap (Default: jet)

Return type:

None

atomai.utils.plot_transitions(matrix, states=None, gmm_components=None, plot_values=False, **kwargs)[source]

Plots transition matrix and (optionally) most frequent/probable transitions

Parameters:
  • matrix (2D numpy array) – Transition matrix

  • states (numpy array) – Array with states (e.g. [2, 5, 7])

  • gmm_components (4D numpy array) – GMM components (optional)

  • plot_values (bool) – Show calculated transition rates

  • **transitions_to_plot (int) – number of transitions (associated with largest prob values) to plot

  • **plot_toself (bool) – Skips transitions into self when plotting transitions with largest probs

  • **fsize (int) – figure size

  • **cmap (str) – color map

Return type:

None
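
As an illustration of what "transitions associated with the largest probability values" means, the sketch below picks the top entries of a small transition matrix, optionally ignoring self-transitions on the diagonal. The helper `top_transitions_sketch`, its `include_self` flag and the sample matrix are hypothetical and only mirror the ideas behind `transitions_to_plot` and `plot_toself`.

```python
import numpy as np

def top_transitions_sketch(matrix, n=2, include_self=False):
    """Return the n most probable transitions as (from, to, probability)."""
    original = np.asarray(matrix, dtype=float)
    m = original.copy()
    if not include_self:
        # Mask the diagonal so that i -> i transitions are never selected
        np.fill_diagonal(m, -np.inf)
    # Indices of the n largest entries, in descending order
    flat = np.argsort(m, axis=None)[::-1][:n]
    rows, cols = np.unravel_index(flat, m.shape)
    return [(int(i), int(j), float(original[i, j])) for i, j in zip(rows, cols)]

# Hypothetical 3-state transition matrix (each row sums to 1)
matrix = np.array([[0.70, 0.20, 0.10],
                   [0.30, 0.60, 0.10],
                   [0.05, 0.15, 0.80]])
top = top_transitions_sketch(matrix, n=2)
```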

atomai.utils.plot_trajectories_transitions(trans_dict, k, plot_values=False, **kwargs)[source]

Plots a trajectory with the associated transitions.

Parameters:
  • trans_dict (dict) – Python dictionary containing trajectories, frame numbers, transitions and the averaged GMM components. Usually this is an output of atomstat.transition_matrix

  • k (int) – Number of the trajectory to visualize

  • plot_values (bool) – Show calculated transition rates

  • **transitions_to_plot (int) – number of transitions (associated with largest prob values) to plot

  • **fsize (int) – figure size

  • **cmap (str) – color map

  • **fov (int or list) – field of view (scan size)

Return type:

None

ASE utilities

atomai.utils.ase_obj_basic(coords_dict, frame_number, material_system, map_dict, filepath, px2ang)[source]

Takes the atomic coordinates and classes predicted by AtomAI’s Segmentor models and uses them to create a structure file readable by packages such as Atomic Simulation Environment (ASE), VESTA, etc. It uses a simple cubic cell.

Parameters:
  • coords_dict (Union[Dict[int, ndarray], ndarray]) – dictionary object of coordinates produced by AtomAI

  • frame_number (int) – image frame number (assumes a stack of images)

  • material_system (str) – name of material

  • map_dict (Dict[int, str]) – dictionary which maps atomic classes from the NN output (keys) to strings corresponding to chemical elements (values)

  • filepath (str) – Savepath for the ASE object

  • px2ang (float) – Pixels-to-angstroms conversion coefficient, which is specific to each experiment

Return type:

None

Examples:

>>> import ase.io.vasp
>>> from atomai.utils import ase_obj_basic
>>> # Save coordinates for specific frame (0) as ASE object
>>> ase_obj_basic(coordinates, 0, "Graphene",
>>>               map_dict={0: "C", 1: "Si"},
>>>               filepath="/content/Drive/POSCAR",
>>>               px2ang=0.104)
>>> # Read the saved file using ASE reader
>>> cell = ase.io.vasp.read_vasp("/content/Drive/POSCAR")

atomai.utils.ase_obj_adv(a_lattice, b_lattice, c_lattice, coords_dict, frame_number, material_system, map_dict, filepath, px2ang)[source]

Takes the atomic coordinates and classes predicted by AtomAI’s Segmentor models and uses them to create a structure file readable by packages such as Atomic Simulation Environment (ASE), VESTA, etc. It uses a customized cell with multiple atoms per user’s choice.

Parameters:
  • a_lattice (float) – lattice vector along the a direction, given as a list ([a1,a2,a3])

  • b_lattice (float) – lattice vector along the b direction, given as a list ([b1,b2,b3])

  • c_lattice (float) – lattice vector along the c direction, given as a list ([c1,c2,c3])

  • coords_dict (Union[Dict[int, ndarray], ndarray]) – dictionary object of coordinates produced by AtomAI

  • frame_number (int) – image frame number

  • material_system (str) – name of material

  • map_dict (Dict[int, str]) – dictionary which maps atomic classes from the NN output (keys) to strings corresponding to chemical elements (values)

  • filepath (str) – Savepath for the ASE object

  • px2ang (float) – Pixels-to-angstroms conversion coefficient, which is specific to each experiment

Return type:

None

Examples:

>>> import ase.io.vasp
>>> from atomai.utils import ase_obj_adv
>>> # Save coordinates for specific frame (0) as ASE object
>>> ase_obj_adv([86.00000,0.00000,0.00000],
>>>             [0.00000,86.00000,0.00000],
>>>             [0.00000,0.00000,86.00000], coordinates, 0,
>>>             "Graphene", map_dict={0: "C", 1: "Si"},
>>>             filepath="/content/Drive/POSCAR_adv",
>>>             px2ang=0.104)
>>> # Read the saved file using ASE reader
>>> cell = ase.io.vasp.read_vasp("/content/Drive/POSCAR_adv")
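
The roles of `map_dict` and `px2ang` in both functions can be illustrated with a small NumPy sketch that converts pixel coordinates to angstroms and maps class labels to element symbols. The helper name `to_species_and_positions`, the sample coordinates and the choice of placing all atoms at z = 0 are assumptions for illustration; the sketch does not write a structure file and is not AtomAI's implementation.

```python
import numpy as np

def to_species_and_positions(coords_dict, frame_number, map_dict, px2ang):
    """Turn one frame of (x, y, class) rows into element symbols and positions."""
    coords = coords_dict[frame_number]
    # Scale pixel positions to angstroms; z is set to 0 (assumed planar sample)
    positions = np.column_stack([coords[:, :2] * px2ang, np.zeros(len(coords))])
    # Translate NN class labels into chemical element symbols
    species = [map_dict[int(c)] for c in coords[:, 2]]
    return species, positions

# Hypothetical prediction for frame 0: one atom of class 0, one of class 1
coordinates = {0: np.array([[10., 20., 0.], [30., 40., 1.]])}
species, positions = to_species_and_positions(
    coordinates, 0, map_dict={0: "C", 1: "Si"}, px2ang=0.104)
```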

Datasets

atomai.utils.datasets.stem_smbfo(download=True, filedir='./')[source]

Downloads the scanning transmission electron microscopy (STEM) datasets from the combinatorial library of the Sm-doped BiFeO3 (BFO) grown to cover the composition range from pure ferroelectric BFO to orthorhombic 20% Sm-doped BFO. For details, see npj Comput Mater 6, 127 (2020). https://doi.org/10.1038/s41524-020-00396-2.

Parameters:
  • download (bool) – downloads the dataset from the public repository

  • filedir (str) – directory to save the downloaded data

Return type:

Dict[str, Dict[str, ndarray]]

Returns:

Nested dictionary where the 1st dictionary describes different Sm concentrations, and the 2nd dictionary has chemical and physical descriptors for each concentration.

Examples

>>> import atomai
>>> import matplotlib.pyplot as plt
>>> # Download the dataset
>>> dataset = atomai.utils.datasets.stem_smbfo()
>>> # Plot main image and polarization values for each Sm concentration
>>> for k, d in dataset.items():
>>>     _, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(16, 5))
>>>     y, x = d["xy_COM"].T  # get the center of mass for each unit cell
>>>     ax1.imshow(d["main_image"], origin="lower", cmap='gray')
>>>     ax1.set_title(k)
>>>     ax2.scatter(x, y, c=d["Pxy"][:, 0], s=3, cmap='RdBu_r')
>>>     ax3.scatter(x, y, c=d["Pxy"][:, 1], s=3, cmap='RdBu_r')
>>>     plt.show()

atomai.utils.datasets.stem_graphene(download=True, filedir='./')[source]

Downloads the scanning transmission electron microscopy (STEM) datasets from the graphene samples. See https://doi.ccs.ornl.gov/ui/doi/338 for details.

Parameters:
  • download (bool) – downloads the dataset from the public repository

  • filedir (str) – directory to save the downloaded data

Return type:

Dict[int, Dict[str, Union[ndarray, Dict]]]

Returns:

Nested dictionary with STEM movies in the form of M x N x L arrays and the corresponding metadata

Examples

>>> # Download the dataset
>>> dataset = atomai.utils.datasets.stem_graphene()
>>> # Get one STEM movie and the associated metadata
>>> data = dataset[3]["image_data"]  # ndarray of size (50, 1024, 1024)
>>> metadata = dataset[3]["metadata"]  # dictionary with experimental parameters