moseq2_pca.pca package

PCA - Utilities Module

Utility functions for PCA.

moseq2_pca.pca.util.apply_pca_dask(pca_components, h5s, yamls, use_fft, clean_params, save_file, chunk_size, mask_params, missing_data, client, fps=30, h5_path='/frames', h5_mask_path='/frames_mask', verbose=False)

Project the input frame data by the transpose of the given PCs to obtain PCA scores using a distributed Dask cluster.

Args:
    pca_components (numpy.array): array of computed principal components
    h5s (list): list of h5 files
    yamls (list): list of yaml files
    use_fft (bool): indicates whether to use 2D-FFT
    clean_params (dict): dictionary containing filtering options
    save_file (str): path to the pca_scores file to save
    chunk_size (int): size of chunks to process
    mask_params (dict): dictionary of masking parameters (if missing data)
    missing_data (bool): indicates whether to use mask arrays
    client (dask Client): initialized Dask Client object
    fps (int): frames per second
    h5_path (str): path to frames within selected h5 file (default: '/frames')
    h5_mask_path (str): path to masked frames within selected h5 file (default: '/frames_mask')
    verbose (bool): print session names as they are being loaded

Returns:

moseq2_pca.pca.util.apply_pca_local(pca_components, h5s, yamls, use_fft, clean_params, save_file, chunk_size, mask_params, missing_data, fps=30, h5_path='/frames', h5_mask_path='/frames_mask', verbose=False)

Project the input frame data by the transpose of the given PCs to obtain PCA scores using a local cluster/platform.

Args:
    pca_components (numpy.array): array of computed principal components
    h5s (list): list of h5 files
    yamls (list): list of yaml files
    use_fft (bool): indicates whether to use 2D-FFT
    clean_params (dict): dictionary containing filtering options
    save_file (str): path to the pca_scores file to save
    chunk_size (int): size of chunks to process
    mask_params (dict): dictionary of masking parameters (if missing data)
    missing_data (bool): indicates whether to use mask arrays
    fps (int): frames per second
    h5_path (str): path to frames within selected h5 file (default: '/frames')
    h5_mask_path (str): path to masked frames within selected h5 file (default: '/frames_mask')
    verbose (bool): print session names as they are being loaded

Returns:
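The projection that both apply_pca functions describe can be illustrated with plain NumPy. This is a minimal sketch, assuming the frames are flattened to shape (nframes x nfeatures) and that pca_components has shape (npcs x nfeatures); the helper name project_frames is hypothetical, and the sketch omits the cleaning, FFT, masking, and file-writing steps the real functions perform.

```python
import numpy as np


def project_frames(frames, pca_components):
    """Hypothetical helper: project flattened frames onto the given PCs."""
    nframes = frames.shape[0]
    # Flatten each 2D frame into one row of features.
    flat = frames.reshape(nframes, -1)
    # Multiplying by the transpose of the PCs yields one score per PC per frame.
    return flat @ pca_components.T


rng = np.random.default_rng(0)
frames = rng.random((5, 4, 4))  # 5 toy frames of 4 x 4 pixels
# Toy components with orthonormal rows, shape (3, 16); an assumption for the demo.
components = np.linalg.qr(rng.standard_normal((16, 3)))[0].T
scores = project_frames(frames, components)
print(scores.shape)
```

Each row of scores holds the coordinates of one frame in the space spanned by the components.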

moseq2_pca.pca.util.compute_explained_variance(s, nsamples, total_var)

Compute the explained variance and explained variance ratio contributed by each computed Principal Component.

Args:
    s (numpy.array): computed singular values.
    nsamples (int): number of included samples.
    total_var (float): total variance captured by principal components.

Returns:
    explained_variance (numpy.array): list of floats denoting the explained variance per PC.
    explained_variance_ratio (numpy.array): list of floats denoting the explained variance ratios per PC.
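The relationship between singular values and explained variance can be sketched as follows. This assumes the common convention that the variance along each PC is s**2 / (nsamples - 1) and that the ratio divides by total_var; the page does not state the formula, so treat both as assumptions. The function name explained_variance_sketch is hypothetical, and total_var is taken here to be the summed per-feature variance of the centered data.

```python
import numpy as np


def explained_variance_sketch(s, nsamples, total_var):
    """Hypothetical stand-in using the s**2 / (n - 1) convention (an assumption)."""
    explained_variance = s ** 2 / (nsamples - 1)
    explained_variance_ratio = explained_variance / total_var
    return explained_variance, explained_variance_ratio


rng = np.random.default_rng(0)
data = rng.standard_normal((200, 6))
centered = data - data.mean(axis=0)
# Singular values of the centered data matrix.
s = np.linalg.svd(centered, compute_uv=False)
# Total variance: sum of the unbiased per-feature variances.
total_var = centered.var(axis=0, ddof=1).sum()
ev, evr = explained_variance_sketch(s, data.shape[0], total_var)
print(evr)
```

When all singular values are kept, as in this toy case, the ratios sum to one; with a truncated rank they sum to the fraction of variance the retained PCs capture.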

moseq2_pca.pca.util.compute_svd(dask_array, mean, rank, iters, missing_data, mask, recon_pcs, min_height, max_height, client)

Run Singular Value Decomposition on the input frames. If missing_data == True, use missing data PCA.

Args:
    dask_array (dask 2d-array): Reshaped input data array of shape (nframes x nfeatures)
    mean (numpy.array): Means of each row in dask_array.
    rank (int): Rank of the desired thin SVD decomposition.
    iters (int): Number of SVD iterations
    missing_data (bool): Indicates whether to compute SVD with a masked array
    mask (dask 2d-array): None if missing_data == False, else mask array of shape dask_array
    recon_pcs (int): Number of PCs to reconstruct for missing data.
    min_height (int): Minimum height of mouse above the ground, used to filter reconstructed PCs.
    max_height (int): Maximum height of mouse above the ground, used to filter reconstructed PCs.
    client (dask Client): Dask client to process batches.

Returns:
    s (numpy.array): computed singular values (eigen-values).
    v (numpy.ndarray): computed principal components (eigen-vectors).
    mean (numpy.array): updated mean of dask array if missing_data == True.
    total_var (float): total variance captured by principal components.
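For readers unfamiliar with a rank-limited ("thin") SVD, the idea can be sketched with NumPy. This sketch swaps in an exact numpy.linalg.svd followed by truncation instead of the iterative Dask computation the function performs, covers only the missing_data == False case, and computes the mean per feature as an assumption. The name thin_svd_sketch is hypothetical.

```python
import numpy as np


def thin_svd_sketch(data, rank):
    """Hypothetical helper: exact SVD truncated to `rank` components."""
    # Per-feature mean (an assumption about how the data are centered).
    mean = data.mean(axis=0)
    centered = data - mean
    # full_matrices=False returns the economy-size factors.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    total_var = centered.var(axis=0, ddof=1).sum()
    # Keep only the leading `rank` singular values and right singular vectors.
    return s[:rank], vt[:rank], mean, total_var


rng = np.random.default_rng(0)
data = rng.standard_normal((100, 8))  # toy (nframes x nfeatures) matrix
s, v, mean, total_var = thin_svd_sketch(data, rank=3)
print(s.shape, v.shape)
```

The rows of v are orthonormal directions in feature space, ordered by decreasing singular value.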

moseq2_pca.pca.util.copy_metadatas_to_scores(f, f_scores, uuid)

Copy metadata from individual session extract h5 files to the PCA scores h5 file.

Args:
    f (read-open h5py File): open "results_00.h5" h5py.File object in read-mode
    f_scores (read-open h5py File): open "pca_scores.h5" h5py.File object in read-mode
    uuid (str): uuid of the input session h5 "f".

moseq2_pca.pca.util.get_changepoints_dask(changepoint_params, pca_components, h5s, yamls, save_file, chunk_size, mask_params, missing_data, client, fps=30, pca_scores=None, progress_bar=False, h5_path='/frames', h5_mask_path='/frames_mask', verbose=False)

Compute model-free changepoint block durations using random projections.

Args:
    changepoint_params (dict): dict of changepoint parameters
    pca_components (numpy.array): computed principal components
    h5s (list): list of h5 files
    yamls (list): list of yaml files
    save_file (str): path to save changepoint files
    chunk_size (int): size of chunks to process in dask.
    mask_params (dict): dict of missing_data mask parameters.
    missing_data (bool): indicates whether to use mask_params
    client (dask Client): initialized Dask Client object
    fps (int): frames per second
    pca_scores (numpy.array): computed principal component scores
    progress_bar (bool): display progress bar
    h5_path (str): path to frames within selected h5 file (default: '/frames')
    h5_mask_path (str): path to masked frames within selected h5 file (default: '/frames_mask')
    verbose (bool): print session names as they are being loaded.

Returns:
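The description above names random projections without spelling out the procedure. The following is a rough sketch of one way a model-free changepoint score could be built from them: project the data onto random Gaussian directions, sum the squared frame-to-frame change, and treat thresholded local maxima as changepoints. It is not the package's implementation; the function name sketch_changepoints and the parameters n_proj, n_sigma, and seed are hypothetical.

```python
import numpy as np


def sketch_changepoints(scores, n_proj=20, fps=30, n_sigma=3.0, seed=0):
    """Hypothetical helper: illustrative only, not the moseq2_pca algorithm."""
    rng = np.random.default_rng(seed)
    # Random Gaussian projection matrix of shape (nfeatures, n_proj).
    projection = rng.standard_normal((scores.shape[1], n_proj))
    projected = scores @ projection
    # Squared frame-to-frame change, summed over all projected dimensions.
    change = np.sum(np.diff(projected, axis=0) ** 2, axis=1)
    threshold = change.mean() + n_sigma * change.std()
    # Local maxima of the change score that also exceed the threshold.
    interior = change[1:-1]
    is_peak = (interior > change[:-2]) & (interior >= change[2:]) & (interior > threshold)
    # +1 undoes the interior slice, +1 because change[i] compares frames i and i + 1.
    changepoints = np.flatnonzero(is_peak) + 2
    # Block durations in seconds between consecutive changepoints.
    durations = np.diff(changepoints) / fps
    return changepoints, durations


# Synthetic scores: three 100-frame blocks with small additive noise.
rng = np.random.default_rng(1)
block_means = np.repeat([0.0, 5.0, 0.0], 100)[:, None]
scores = block_means + 0.01 * rng.standard_normal((300, 10))
changepoints, durations = sketch_changepoints(scores)
print(changepoints, durations)
```

On this toy input the block boundaries are abrupt, so a simple threshold suffices; real behavioral data would likely need smoothing before peak detection.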

moseq2_pca.pca.util.get_timestamps(f, frames, fps=30)

Read the timestamps from a given h5 file.

Args:
    f (read-open h5py File): open "results_00.h5" h5py.File object in read-mode
    frames (numpy.ndarray): list of 2d frames contained in opened h5 File.
    fps (int): frames per second.

Returns: timestamps (numpy.array): array of timestamps for inputted frames variable

moseq2_pca.pca.util.mask_data(original_data, mask, new_data)

Create a masked subregion of the data from a boolean mask, used when the missing data flag is set.

Args:
    original_data (numpy.ndarray): input frames
    mask (numpy.ndarray): mask array
    new_data (numpy.ndarray): frames to use

Returns: output (numpy.ndarray): masked data array
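A boolean mask of this kind is typically applied with NumPy fancy indexing. The sketch below assumes that entries where mask is True are taken from new_data and all other entries keep their values from original_data; the page does not state which way the mask is applied, so that direction is an assumption, and mask_data_sketch is a hypothetical name.

```python
import numpy as np


def mask_data_sketch(original_data, mask, new_data):
    """Hypothetical stand-in: take new_data where mask is True (an assumption)."""
    # Copy so the caller's array is left untouched.
    output = original_data.copy()
    output[mask] = new_data[mask]
    return output


original = np.zeros((2, 3))
new = np.ones((2, 3))
mask = np.array([[True, False, False],
                 [False, True, True]])
result = mask_data_sketch(original, mask, new)
print(result)
```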

moseq2_pca.pca.util.train_pca_dask(dask_array, clean_params, use_fft, rank, cluster_type, client, mask=None, iters=10, recon_pcs=10, min_height=10, max_height=100)

Train PCA using dask arrays.

Args:
    dask_array (dask array): chunked frames to train PCA
    clean_params (dict): dictionary containing filtering parameters
    use_fft (bool): indicates whether to use 2D-FFT on images.
    rank (int): matrix rank to use
    cluster_type (str): indicates which cluster to use.
    client (dask Client): client object to execute dask operations
    mask (dask array): dask array of masked data if missing_data == True
    iters (int): number of SVD iterations
    recon_pcs (int): number of PCs to reconstruct (if missing_data == True)
    min_height (int): minimum mouse height from floor (in mm)
    max_height (int): maximum mouse height from floor (in mm)

Returns: output_dict (dict): dictionary containing PCA training results.