moseq2_pca package

CLI Module

moseq2-pca

moseq2-pca [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

Default:

False

apply-pca

Compute PCA Scores of extraction data given a pre-trained PCA

moseq2-pca apply-pca [OPTIONS]

Options

--chunk-size <chunk_size>

Number of frames per chunk

Default:

4000

--h5-mask-path <h5_mask_path>

Path to log-likelihood mask in h5 files

Default:

/frames_mask

--h5-path <h5_path>

Path to data in h5 files

Default:

/frames

--config-file <config_file>

Path to configuration file

-o, --output-dir <output_dir>

Directory to store PCA results

Default:

/home/wingillis/dev/moseq/moseq2-pca/docs/_pca

-i, --input-dir <input_dir>

Directory to find extracted h5 files

Default:

/home/wingillis/dev/moseq/moseq2-pca/docs

--cluster-type <cluster_type>

Compute environment the command runs in

Default:

local

Options:

local | slurm | nodask

--timeout <timeout>

Time to wait for workers to initialize before proceeding (minutes)

Default:

5

-w, --wall-time <wall_time>

Wall time (compute time) for workers

Default:

06:00:00

-m, --memory <memory>

Total RAM usage per worker

Default:

15GB

-p, --processes <processes>

Number of processes to run on each worker

Default:

1

-c, --cores <cores>

Number of cores per worker

Default:

1

-n, --nworkers <nworkers>

Number of workers

Default:

1

-q, --queue <queue>

Cluster queue/partition for submitting jobs

Default:

debug

--dask-port <dask_port>

Port to access dask dashboard

Default:

8787

-d, --dask-cache-path <dask_cache_path>

Path to spill data to disk for dask

Default:

/home/wingillis/dev/moseq/moseq2-pca/docs/_pca

--output-file <output_file>

Name of h5 file for storing pca results

Default:

pca_scores

--pca-path <pca_path>

Path to pca components in h5 file

Default:

/components

--pca-file <pca_file>

Path to PCA results

--fill-gaps <fill_gaps>

Fill dropped frames with nans

Default:

True

--fps <fps>

Frames per second (frame rate)

Default:

30

--detrend-window <detrend_window>

Length of detrend window (in seconds, 0 for no detrending)

Default:

0

-v, --verbose

Print sessions as they are being loaded.

Default:

False

--overwrite-pca-apply <overwrite_pca_apply>

Used to bypass the pca overwrite question. If True: skip question, run automatically

Default:

False

clip-scores

Clip specified number of frames from PCA scores at the beginning or end

moseq2-pca clip-scores [OPTIONS] PCA_FILE CLIP_SAMPLES

Options

--from-end

If true, clip from the end rather than the beginning

Default:

False

Arguments

PCA_FILE

Required argument

CLIP_SAMPLES

Required argument
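As a rough illustration of what clipping means here, the sketch below drops `clip_samples` frames from either end of a sequence of scores. The helper name `clip_frames` and the plain list standing in for the scores stored in the PCA file are assumptions made for illustration; this is not the command's actual implementation.

```python
def clip_frames(scores, clip_samples, from_end=False):
    """Drop `clip_samples` frames from the start (or end) of `scores`.

    Hypothetical helper: `scores` is any sequence with one entry per frame.
    """
    if clip_samples < 0:
        raise ValueError("clip_samples must be non-negative")
    if clip_samples == 0:
        return list(scores)
    if from_end:
        # Keep everything except the last `clip_samples` frames.
        return list(scores[:-clip_samples])
    # Keep everything after the first `clip_samples` frames.
    return list(scores[clip_samples:])


frames = [0.1, 0.2, 0.3, 0.4, 0.5]
print(clip_frames(frames, 2))                 # clipped from the beginning
print(clip_frames(frames, 2, from_end=True))  # clipped from the end
```

The `from_end` flag plays the role of `--from-end`: it only changes which side of the sequence is discarded.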

compute-changepoints

Compute the Model-Free Syllable Changepoints based on the PCA/PCA_Scores

moseq2-pca compute-changepoints [OPTIONS]

Options

--chunk-size <chunk_size>

Number of frames per chunk

Default:

4000

--h5-mask-path <h5_mask_path>

Path to log-likelihood mask in h5 files

Default:

/frames_mask

--h5-path <h5_path>

Path to data in h5 files

Default:

/frames

--config-file <config_file>

Path to configuration file

-o, --output-dir <output_dir>

Directory to store PCA results

Default:

/home/wingillis/dev/moseq/moseq2-pca/docs/_pca

-i, --input-dir <input_dir>

Directory to find extracted h5 files

Default:

/home/wingillis/dev/moseq/moseq2-pca/docs

--cluster-type <cluster_type>

Compute environment the command runs in

Default:

local

Options:

local | slurm | nodask

--timeout <timeout>

Time to wait for workers to initialize before proceeding (minutes)

Default:

5

-w, --wall-time <wall_time>

Wall time (compute time) for workers

Default:

06:00:00

-m, --memory <memory>

Total RAM usage per worker

Default:

15GB

-p, --processes <processes>

Number of processes to run on each worker

Default:

1

-c, --cores <cores>

Number of cores per worker

Default:

1

-n, --nworkers <nworkers>

Number of workers

Default:

1

-q, --queue <queue>

Cluster queue/partition for submitting jobs

Default:

debug

--dask-port <dask_port>

Port to access dask dashboard

Default:

8787

-d, --dask-cache-path <dask_cache_path>

Path to spill data to disk for dask

Default:

/home/wingillis/dev/moseq/moseq2-pca/docs/_pca

--output-file <output_file>

Name of h5 file for storing pca results

Default:

changepoints

--pca-file-components <pca_file_components>

Path to PCA components

--pca-file-scores <pca_file_scores>

Path to PCA results

--pca-path <pca_path>

Path to pca components

Default:

/components

--neighbors <neighbors>

Neighbors to use for peak identification

Default:

1

--threshold <threshold>

Peak threshold to use for changepoints

Default:

0.5

-k, --klags <klags>

Lag to use for derivative calculation

Default:

6

-s, --sigma <sigma>

Standard deviation of gaussian smoothing filter

Default:

3.5

-d, --dims <dims>

Number of random projections to use

Default:

300

--fps <fps>

Frames per second (frame rate)

Default:

30

-v, --verbose

Print sessions as they are being loaded.

Default:

False

train-pca

Train PCA on all extracted results (h5 files) in input directory

moseq2-pca train-pca [OPTIONS]

Options

--chunk-size <chunk_size>

Number of frames per chunk

Default:

4000

--h5-mask-path <h5_mask_path>

Path to log-likelihood mask in h5 files

Default:

/frames_mask

--h5-path <h5_path>

Path to data in h5 files

Default:

/frames

--config-file <config_file>

Path to configuration file

-o, --output-dir <output_dir>

Directory to store PCA results

Default:

/home/wingillis/dev/moseq/moseq2-pca/docs/_pca

-i, --input-dir <input_dir>

Directory to find extracted h5 files

Default:

/home/wingillis/dev/moseq/moseq2-pca/docs

--cluster-type <cluster_type>

Compute environment the command runs in

Default:

local

Options:

local | slurm | nodask

--timeout <timeout>

Time to wait for workers to initialize before proceeding (minutes)

Default:

5

-w, --wall-time <wall_time>

Wall time (compute time) for workers

Default:

06:00:00

-m, --memory <memory>

Total RAM usage per worker

Default:

15GB

-p, --processes <processes>

Number of processes to run on each worker

Default:

1

-c, --cores <cores>

Number of cores per worker

Default:

1

-n, --nworkers <nworkers>

Number of workers

Default:

1

-q, --queue <queue>

Cluster queue/partition for submitting jobs

Default:

debug

--dask-port <dask_port>

Port to access dask dashboard

Default:

8787

-d, --dask-cache-path <dask_cache_path>

Path to spill data to disk for dask

Default:

/home/wingillis/dev/moseq/moseq2-pca/docs/_pca

--gaussfilter-space <gaussfilter_space>

x, y sigma for the Gaussian spatial filter kernel applied to the data

Default:

1.5, 1

--gaussfilter-time <gaussfilter_time>

Sigma for the Gaussian temporal filter applied to the data

Default:

0

--medfilter-space <medfilter_space>

kernel size for median spatial filter

Default:

0

--medfilter-time <medfilter_time>

kernel size for median temporal filter

Default:

0

--missing-data

Use missing data PCA; will be automatically set to True if cable-filter-iters > 1 from the extract step.

Default:

False

--missing-data-iters <missing_data_iters>

number of missing data PCA iterations

Default:

10

--mask-threshold <mask_threshold>

Threshold for mask (missing data PCA only)

Default:

-16

--mask-height-threshold <mask_height_threshold>

Threshold for mask based on floor height

Default:

5

--min-height <min_height>

Min mouse height from floor (mm)

Default:

10

--max-height <max_height>

Max mouse height from floor (mm)

Default:

120

--tailfilter-size <tailfilter_size>

Tail filter size

Default:

9, 9

--tailfilter-shape <tailfilter_shape>

Tail filter shape

Default:

ellipse

--use-fft

Use 2D fft

Default:

False

--train-on-subset <train_on_subset>

The fraction of the total frames the PCA is trained on; default PCA is trained on all frames

Default:

1

--recon-pcs <recon_pcs>

Number of PCs to use for missing data reconstruction

Default:

10

--rank <rank>

Rank for compressed SVD

Default:

25

--output-file <output_file>

Name of h5 file for storing pca results

Default:

pca

--local-processes <local_processes>

Used with a local cluster. If True: use processes, If False: use threads

Default:

False

--overwrite-pca-train <overwrite_pca_train>

Used to bypass the pca overwrite question. If True: skip question, run automatically

Default:

False

--camera-type <camera_type>

specify the camera type (k2 or azure), default is k2

Default:

k2

GUI Module

GUI front-end operations for PCA.

moseq2_pca.gui.apply_pca_command(progress_paths, output_file)

Compute PCA Scores given trained PCA using Jupyter Notebook.

Args:

progress_paths (dict): dictionary containing notebook progress paths

output_file (str): name of output pca file

Returns: (str): success string.

moseq2_pca.gui.compute_changepoints_command(input_dir, progress_paths, output_file)

Compute Changepoint distribution using Jupyter Notebook.

Args:

input_dir (str): path to directory containing training data

progress_paths (dict): dictionary containing notebook progress paths

output_file (str): name of output pca file

Returns: (str): success string.

moseq2_pca.gui.train_pca_command(progress_paths, output_dir, output_file)

Train PCA through a Jupyter notebook and update the config file.

Args:

progress_paths (dict): dictionary containing notebook progress paths

output_dir (str): path to output pca directory

output_file (str): name of output pca file

Returns:

Utilities Module

Utility and helper functions for finding and reading files, filtering operations, Dask initialization, and changepoint helper functions.

moseq2_pca.util.check_timestamps(h5s)

Helper function to determine whether timestamps and/or metadata are missing from extracted files. The function emits a warning if either piece of data is missing.

Args: h5s (list): List of paths to all extracted h5 files.

moseq2_pca.util.clean_frames(frames, medfilter_space=None, gaussfilter_space=None, medfilter_time=None, gaussfilter_time=None, detrend_time=None, tailfilter=None, tail_threshold=5)

Filter spatial/temporal noise from frames using Median and Gaussian filters, given kernel sizes for each respective requested filter.

Args:

frames (numpy.ndarray): frames to filter

medfilter_space (list): median spatial filter kernel

gaussfilter_space (list): gaussian spatial filter kernel

medfilter_time (list): median temporal filter

gaussfilter_time (list): gaussian temporal filter

detrend_time (int): number of frames to lag for

tailfilter (int): size of tail-filter kernel

tail_threshold (int): threshold value to use for tail filtering

Returns: out (numpy.ndarray): filtered frames.

moseq2_pca.util.close_dask(client, cluster, timeout)

Shut down the Dask client and cluster, and dump all cache data.

Args:

client (Dask Client): Client object

cluster (dask Cluster): initialized Cluster

timeout (int): Time to wait for client to close gracefully (minutes)

Returns:

moseq2_pca.util.combine_new_config(config_file, config_data)

Read config file and combine new config params with it

Args:

config_file (str): path to config.yaml

config_data (dict): dictionary of config data

moseq2_pca.util.command_with_config(config_file_param_name)

Helper function to assign variables from a config file. Hierarchy of CLI parameters: params from CLI options > params from config_file > default params.

Args: config_file_param_name (str): parameter name to update with config file variable.

Returns: custom_command_class (click.Command): updated Click Command containing parameters from inputted config file.
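The precedence described above (CLI options over config file values over defaults) can be pictured with a small dictionary merge. This is only a sketch of the ordering: `resolve_params` and the example values are hypothetical, and the real helper works through a Click command class rather than plain dictionaries.

```python
def resolve_params(defaults, config_file_params, cli_params):
    """Merge parameters so that CLI options > config file > defaults.

    Hypothetical helper; `cli_params` should contain only the options the
    user explicitly passed on the command line.
    """
    merged = dict(defaults)
    merged.update(config_file_params)  # config file overrides defaults
    merged.update(cli_params)          # explicit CLI options override both
    return merged


defaults = {"chunk_size": 4000, "fps": 30, "rank": 25}
from_config = {"fps": 60, "rank": 50}   # illustrative config file contents
from_cli = {"rank": 10}                 # illustrative explicit CLI option
print(resolve_params(defaults, from_config, from_cli))
```

Later updates win, so the order of the two `update` calls is what encodes the hierarchy.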

moseq2_pca.util.gauss_smooth(signal, win_length=None, sig=1.5, kernel=None)

Perform Gaussian Smoothing on a 1D signal.

Args:

signal (1d numpy array): signal to perform smoothing

win_length (int): window size for gaussian kernel filter

sig (float): variance of 1d gaussian kernel

kernel (tuple): kernel size to use for smoothing

Returns: result (1d numpy array): smoothed signal

moseq2_pca.util.gaussian_kernel1d(n=None, sig=3)

Get 1D gaussian kernel.

Args: n (int): window size. sig (int): variance of kernel to use.

Returns: kernel (1d array): 1D numpy kernel.
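To make the kernel and smoothing steps concrete, here is a dependency-free sketch of a normalized 1D Gaussian kernel and its use as a smoothing filter. It assumes `sig` acts as the standard deviation (the usual convention for sigma) and that `n` is the half-width of the window; the function names, the edge padding, and the use of plain lists instead of numpy arrays are illustrative choices, not the library's implementation.

```python
import math


def gaussian_kernel(n, sig):
    """Return a normalized 1D Gaussian kernel over the points -n..n.

    Assumption: `sig` is the standard deviation, `n` the half-width.
    """
    weights = [math.exp(-(x ** 2) / (2.0 * sig ** 2)) for x in range(-n, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, kernel):
    """Convolve a non-empty `signal` with `kernel`, repeating edge values as padding."""
    half = len(kernel) // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


kernel = gaussian_kernel(n=3, sig=1.5)
noisy = [0, 0, 0, 10, 0, 0, 0]
print([round(v, 3) for v in smooth(noisy, kernel)])
```

Because the kernel sums to 1, a constant signal passes through unchanged, while an isolated spike is spread over its neighbours.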

moseq2_pca.util.get_changepoints(scores, k=5, sigma=3, peak_height=0.5, peak_neighbors=1, baseline=True, timestamps=None)

Compute changepoints and their corresponding distribution. Changepoints describe the magnitude of frame-to-frame changes of mouse pose.

Args:

scores (numpy.ndarray): nframes * rows * columns

k (int): klags - Lag to use for derivative calculation

sigma (int): Standard deviation of gaussian smoothing filter

peak_height (float): user-defined peak Changepoint length

peak_neighbors (int): number of peaks in the CP curve

baseline (bool): normalize data

timestamps (numpy.array): loaded timestamps

Returns:

cps (numpy.ndarray): array of changepoint values

normed_df (numpy.array): array of values for bar plot
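The general idea of a lagged derivative followed by peak picking can be sketched on a single 1D trace. The version below is a deliberately simplified, hypothetical stand-in: it skips the smoothing and the averaging across dimensions, works on a plain list, and the function name and example signal are assumptions rather than anything taken from the package.

```python
def changepoints(signal, k=2, peak_height=0.5, peak_neighbors=1):
    """Return (peak indices, change score) for a 1D signal.

    Simplified, hypothetical sketch: squared lagged difference, baseline
    subtraction, then local maxima above `peak_height`.
    """
    n = len(signal)
    score = [0.0] * n
    # Centre each squared lagged difference between the two samples compared.
    for i in range(n - k):
        score[i + k // 2] = (signal[i + k] - signal[i]) ** 2
    # Baseline subtraction so that the smallest change score is 0.
    lowest = min(score)
    score = [s - lowest for s in score]
    peaks = []
    for i in range(peak_neighbors, n - peak_neighbors):
        neighbours = score[i - peak_neighbors:i] + score[i + 1:i + 1 + peak_neighbors]
        # A changepoint must exceed the threshold and all of its neighbours.
        if score[i] > peak_height and all(score[i] > v for v in neighbours):
            peaks.append(i)
    return peaks, score


# A trace that is flat, jumps once, then is flat again.
pc1 = [0, 0, 0, 0, 3, 4, 4, 4, 4, 4]
peaks, score = changepoints(pc1, k=2)
print(peaks)
print(score)
```

In this sketch `k`, `peak_height`, and `peak_neighbors` are meant to mirror the `--klags`, `--threshold`, and `--neighbors` options in spirit only.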

moseq2_pca.util.get_env_cpu_and_mem()

Read current system environment and return the amount of available memory and CPUs to allocate to the created cluster.

Returns:

mem (float): Optimal amount of memory (in bytes) to allocate to the initialized dask cluster

cpu (int): Optimal number of CPUs to allocate to dask

moseq2_pca.util.get_metadata_path(h5file)

Return path within h5 file that contains the kinect extraction metadata.

Args: h5file (str): path to h5 file.

Returns: (str): path to acquisition metadata within h5 file.

moseq2_pca.util.get_rps(frames, rps=600, normalize=True)

Get random projections of frames.

Args:

frames (numpy.array): Frames to get dimensions from

rps (int): Number of random projections

normalize (bool): indicates whether to normalize the random projections

Returns: rproj (2D or 3D numpy array): Computed random projections with same shape as frames

moseq2_pca.util.get_timestamp_path(h5file)

Return path within h5 file that contains the kinect timestamps

Args: h5file (str): path to h5 file.

Returns: (str): path to metadata timestamps within h5 file

moseq2_pca.util.h5_to_dict(h5file, path)

Read all contents from an h5 file and return them in a nested dict object.

Args:

h5file (str): path to h5 file

path (str): path to group within h5 file

Returns: ans (dict): dictionary of all h5 group contents

moseq2_pca.util.initialize_dask(nworkers=50, processes=1, memory='4GB', cores=1, wall_time='01:00:00', queue='debug', local_processes=False, cluster_type='local', timeout=10, cache_path='/home/wingillis/moseq2_pca', dashboard_port='8787', data_size=None, **kwargs)

Initialize dask client, cluster, workers, etc.

Args:

nworkers (int): number of dask workers to initialize

processes (int): number of processes per worker

memory (str): amount of memory to allocate to dask cluster

cores (int): number of cores to use

wall_time (str): amount of time to allow program to run

queue (str): cluster queue/partition for submitting jobs

local_processes (bool): flag to use processes or threads when using a local cluster

cluster_type (str): indicate what cluster to use (local or slurm)

timeout (int): how many minutes to wait for workers to initialize

cache_path (str or Pathlike): path to store cached data

dashboard_port (str): port number to find dask statistics

data_size (float): size of the dask array in number of bytes

kwargs: extra keyword arguments

Returns:

client (dask Client): initialized Client

cluster (dask Cluster): initialized Cluster

workers (dask Workers): initialized workers

moseq2_pca.util.insert_nans(timestamps, data, fps=30)

Insert NaN values into the data wherever the 1D timestamps array indicates missing frames. Used to handle dropped frames from the video acquisition.

Args:

timestamps (numpy.array): timestamp values

data (numpy.array): additional data to fill with NaN values - can be PC scores

fps (int): frames per second

Returns:

filled_data (numpy.array): data with the missing timestamp positions filled

data_idx (numpy.array): indices of the inserted values

filled_timestamps (numpy.array): filled timestamps
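One way to picture the handling of dropped frames is to walk through the timestamps and insert a NaN placeholder for every frame period that was skipped, in line with the `--fill-gaps` option ("Fill dropped frames with nans"). The sketch below is a hypothetical, list-based stand-in: the name `fill_dropped_frames`, timestamps expressed in seconds, and the rounding rule are all assumptions, and it does not return the index array that the documented function does.

```python
import math


def fill_dropped_frames(timestamps, data, fps=30):
    """Insert NaN placeholders wherever the timestamps skip frames.

    Hypothetical helper; timestamps are in seconds and `data` holds one
    value per recorded frame.
    """
    period = 1.0 / fps
    filled_data, filled_timestamps = [], []
    previous = None
    for t, value in zip(timestamps, data):
        if previous is not None:
            # Number of frame periods elapsed since the last recorded frame.
            steps = round((t - previous) / period)
            for missing in range(1, steps):
                filled_timestamps.append(previous + missing * period)
                filled_data.append(math.nan)
        filled_timestamps.append(t)
        filled_data.append(value)
        previous = t
    return filled_data, filled_timestamps


# 10 fps recording in which the frames at 0.2 s and 0.3 s were dropped.
ts = [0.0, 0.1, 0.4, 0.5]
scores = [1.0, 2.0, 3.0, 4.0]
filled, filled_ts = fill_dropped_frames(ts, scores, fps=10)
print(filled)
```

Keeping the gaps as NaN rather than removing them preserves the frame-to-time alignment for later steps.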

moseq2_pca.util.read_yaml(yaml_file)

Read yaml file and return dictionary representation of file contents.

Args: yaml_file (str): path to yaml file

Returns: return_dict (dict): dict of yaml file contents

moseq2_pca.util.recursive_find_h5s(root_dir='/home/wingillis/dev/moseq/moseq2-pca/docs', ext='.h5', yaml_string='{}.yaml')

Recursively find h5 files, along with yaml files with the same basename

Args:

root_dir (str): path to base directory to begin recursive search in

ext (str): extension to search for

yaml_string (str): string for filename formatting when saving data

Returns:

h5s (list): list of found h5 files

dicts (list): list of found metadata files

yamls (list): list of found yaml files
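The basename-matching idea can be sketched with `pathlib`. This is a hypothetical helper, not the package's function: it returns only the h5 and yaml paths, and it assumes that an h5 file without a matching yaml file is skipped.

```python
import tempfile
from pathlib import Path


def find_h5s_with_yaml(root_dir, ext=".h5", yaml_string="{}.yaml"):
    """Return (h5 paths, yaml paths) for h5 files that have a sibling yaml.

    Hypothetical helper illustrating the basename-matching idea only.
    """
    h5s, yamls = [], []
    for h5_path in sorted(Path(root_dir).rglob(f"*{ext}")):
        # Build the expected yaml name from the h5 basename, e.g. results_00.yaml.
        yaml_path = h5_path.with_name(yaml_string.format(h5_path.stem))
        if yaml_path.exists():
            h5s.append(str(h5_path))
            yamls.append(str(yaml_path))
    return h5s, yamls


# Build a throwaway directory tree to search (file names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp) / "session_1" / "proc"
    session.mkdir(parents=True)
    (session / "results_00.h5").touch()
    (session / "results_00.yaml").touch()
    (session / "orphan.h5").touch()  # no yaml next to it, so it is skipped
    h5s, yamls = find_h5s_with_yaml(tmp)
    print([Path(p).name for p in h5s])
```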

moseq2_pca.util.select_strel(string='e', size=(10, 10))

Select Structuring Element Shape. Accepts shapes (‘ellipse’, ‘rectangle’); if neither is given, ‘ellipse’ is used.

Args:

string (str): e for Ellipse, r for Rectangle

size (tuple): size of StructuringElement

Returns: strel (cv2.StructuringElement): StructuringElement with specified size.

moseq2_pca.util.set_dask_config(memory={'pause': False, 'spill': False, 'target': 0.85, 'terminate': 0.95})

Set initial dask configuration parameters

Args: memory (dict): dictionary containing default dask configuration variables to ensure safe amount of resource usage.
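For orientation, the sketch below shows how a memory dictionary like the default above could be expressed as dotted Dask configuration keys. It assumes the settings belong under `distributed.worker.memory`, which is where Dask documents its worker memory thresholds; the helper name is hypothetical and the code does not import dask or reproduce what the function actually does.

```python
def memory_config_keys(memory):
    """Map worker-memory settings onto dotted Dask configuration keys.

    Assumption: the settings belong under `distributed.worker.memory`.
    """
    prefix = "distributed.worker.memory"
    return {f"{prefix}.{name}": value for name, value in memory.items()}


settings = memory_config_keys(
    {"pause": False, "spill": False, "target": 0.85, "terminate": 0.95}
)
for key, value in sorted(settings.items()):
    print(key, "=", value)

# With dask installed, a mapping like this could be applied via
# `dask.config.set(settings)` (assumed usage, not run here).
```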

Visualization Module

Visualization operations for plotting computed PCs, a Scree Plot, and the Changepoint PDF histogram.

moseq2_pca.viz.changepoint_dist(cps, headless=False)

Creates bar plot describing computed Changepoint Distribution.

Args:

cps (numpy.ndarray): changepoints to graph

headless (bool): bool flag to run in headless environment

Returns:

plt (plt.figure): figure to save

ax (plt.ax): figure axis variable

moseq2_pca.viz.display_components(components, cmap='gray', headless=False)

Plot computed Principal Components.

Args:

components (numpy.ndarray): components to plot

cmap (str): color map to use; default is ‘gray’

headless (bool): bool flag to run in headless environment

Returns:

plt (plt.figure): figure to save

ax (plt.ax): figure axis variable

moseq2_pca.viz.plot_pca_results(output_dict, save_file, output_dir)

Plot and save trained PCA results.

Args:

output_dict (dict): Dict object containing PCA training results

save_file (str): Path to save the plots to

output_dir (str): Directory containing logger

moseq2_pca.viz.scree_plot(explained_variance_ratio, headless=False)

Plot a scree plot describing principal components.

Args:

explained_variance_ratio (numpy.array): explained variance ratio of each principal component

headless (bool): bool flag to run in headless environment

Returns: plt (plt.figure): figure to save
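A scree plot is typically read through the cumulative explained variance. The sketch below computes that running total and the number of components needed to reach a chosen level; the function names, the example ratios, and the 0.9 target are hypothetical values picked for illustration, and no plotting is performed.

```python
from itertools import accumulate


def cumulative_variance(explained_variance_ratio):
    """Running total of explained variance, one entry per component."""
    return list(accumulate(explained_variance_ratio))


def components_needed(explained_variance_ratio, target=0.9):
    """Smallest number of PCs whose cumulative variance reaches `target`.

    `target` is a hypothetical threshold chosen for illustration.
    Returns None if the target is never reached.
    """
    running = cumulative_variance(explained_variance_ratio)
    for count, value in enumerate(running, start=1):
        if value >= target:
            return count
    return None


ratios = [0.5, 0.25, 0.125, 0.0625, 0.03125]
print(cumulative_variance(ratios))
print(components_needed(ratios, target=0.9))
```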

Subpackages