moseq2_pca package
CLI Module
moseq2-pca
moseq2-pca [OPTIONS] COMMAND [ARGS]...
Options
- --version
Show the version and exit.
- Default:
False
apply-pca
Compute PCA Scores of extraction data given a pre-trained PCA
moseq2-pca apply-pca [OPTIONS]
Options
- --chunk-size <chunk_size>
Number of frames per chunk
- Default:
4000
- --h5-mask-path <h5_mask_path>
Path to log-likelihood mask in h5 files
- Default:
/frames_mask
- --h5-path <h5_path>
Path to data in h5 files
- Default:
/frames
- --config-file <config_file>
Path to configuration file
- -o, --output-dir <output_dir>
Directory to store PCA results
- Default:
/home/wingillis/dev/moseq/moseq2-pca/docs/_pca
- -i, --input-dir <input_dir>
Directory to find extracted h5 files
- Default:
/home/wingillis/dev/moseq/moseq2-pca/docs
- --cluster-type <cluster_type>
Compute environment the command runs in
- Default:
local
- Options:
local | slurm | nodask
- --timeout <timeout>
Time to wait for workers to initialize before proceeding (minutes)
- Default:
5
- -w, --wall-time <wall_time>
Wall time (compute time) for workers
- Default:
06:00:00
- -m, --memory <memory>
Total RAM usage per worker
- Default:
15GB
- -p, --processes <processes>
Number of processes to run on each worker
- Default:
1
- -c, --cores <cores>
Number of cores per worker
- Default:
1
- -n, --nworkers <nworkers>
Number of workers
- Default:
1
- -q, --queue <queue>
Cluster queue/partition for submitting jobs
- Default:
debug
- --dask-port <dask_port>
Port to access dask dashboard
- Default:
8787
- -d, --dask-cache-path <dask_cache_path>
Path to spill data to disk for dask
- Default:
/home/wingillis/dev/moseq/moseq2-pca/docs/_pca
- --output-file <output_file>
Name of h5 file for storing pca results
- Default:
pca_scores
- --pca-path <pca_path>
Path to pca components in h5 file
- Default:
/components
- --pca-file <pca_file>
Path to PCA results
- --fill-gaps <fill_gaps>
Fill dropped frames with nans
- Default:
True
- --fps <fps>
Frames per second (frame rate)
- Default:
30
- --detrend-window <detrend_window>
Length of detrend window (in seconds, 0 for no detrending)
- Default:
0
- -v, --verbose
Print sessions as they are being loaded.
- Default:
False
- --overwrite-pca-apply <overwrite_pca_apply>
Bypass the PCA overwrite prompt. If True, skip the prompt and run automatically.
- Default:
False
clip-scores
Clip specified number of frames from PCA scores at the beginning or end
moseq2-pca clip-scores [OPTIONS] PCA_FILE CLIP_SAMPLES
Options
- --from-end
If true, clip from the end rather than the beginning
- Default:
False
Arguments
- PCA_FILE
Required argument. The PCA scores file to clip.
- CLIP_SAMPLES
Required argument. The number of frames to clip.
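As a rough illustration of what clipping means here, the sketch below drops a fixed number of samples from the start or the end of a sequence. The function name `clip_scores` and the list-based data are hypothetical; the actual command operates on the scores stored in PCA_FILE, and its internals are not shown in this reference.

```python
def clip_scores(scores, clip_samples, from_end=False):
    """Drop `clip_samples` entries from the start (default) or the end."""
    if clip_samples < 0:
        raise ValueError("clip_samples must be non-negative")
    if clip_samples == 0:
        # Nothing to clip; return a copy so the caller's data is untouched.
        return list(scores)
    if from_end:
        return list(scores[:-clip_samples])
    return list(scores[clip_samples:])


scores = [0.1, 0.2, 0.3, 0.4, 0.5]
print(clip_scores(scores, 2))                 # [0.3, 0.4, 0.5]
print(clip_scores(scores, 2, from_end=True))  # [0.1, 0.2, 0.3]
```

The `from_end` keyword mirrors the --from-end flag: without it, frames are removed from the beginning.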
compute-changepoints
Compute the Model-Free Syllable Changepoints based on the PCA/PCA_Scores
moseq2-pca compute-changepoints [OPTIONS]
Options
- --chunk-size <chunk_size>
Number of frames per chunk
- Default:
4000
- --h5-mask-path <h5_mask_path>
Path to log-likelihood mask in h5 files
- Default:
/frames_mask
- --h5-path <h5_path>
Path to data in h5 files
- Default:
/frames
- --config-file <config_file>
Path to configuration file
- -o, --output-dir <output_dir>
Directory to store PCA results
- Default:
/home/wingillis/dev/moseq/moseq2-pca/docs/_pca
- -i, --input-dir <input_dir>
Directory to find extracted h5 files
- Default:
/home/wingillis/dev/moseq/moseq2-pca/docs
- --cluster-type <cluster_type>
Compute environment the command runs in
- Default:
local
- Options:
local | slurm | nodask
- --timeout <timeout>
Time to wait for workers to initialize before proceeding (minutes)
- Default:
5
- -w, --wall-time <wall_time>
Wall time (compute time) for workers
- Default:
06:00:00
- -m, --memory <memory>
Total RAM usage per worker
- Default:
15GB
- -p, --processes <processes>
Number of processes to run on each worker
- Default:
1
- -c, --cores <cores>
Number of cores per worker
- Default:
1
- -n, --nworkers <nworkers>
Number of workers
- Default:
1
- -q, --queue <queue>
Cluster queue/partition for submitting jobs
- Default:
debug
- --dask-port <dask_port>
Port to access dask dashboard
- Default:
8787
- -d, --dask-cache-path <dask_cache_path>
Path to spill data to disk for dask
- Default:
/home/wingillis/dev/moseq/moseq2-pca/docs/_pca
- --output-file <output_file>
Name of h5 file for storing pca results
- Default:
changepoints
- --pca-file-components <pca_file_components>
Path to PCA components
- --pca-file-scores <pca_file_scores>
Path to PCA results
- --pca-path <pca_path>
Path to pca components
- Default:
/components
- --neighbors <neighbors>
Neighbors to use for peak identification
- Default:
1
- --threshold <threshold>
Peak threshold to use for changepoints
- Default:
0.5
- -k, --klags <klags>
Lag to use for derivative calculation
- Default:
6
- -s, --sigma <sigma>
Standard deviation of gaussian smoothing filter
- Default:
3.5
- -d, --dims <dims>
Number of random projections to use
- Default:
300
- --fps <fps>
Frames per second (frame rate)
- Default:
30
- -v, --verbose
Print sessions as they are being loaded.
- Default:
False
train-pca
Train PCA on all extracted results (h5 files) in input directory
moseq2-pca train-pca [OPTIONS]
Options
- --chunk-size <chunk_size>
Number of frames per chunk
- Default:
4000
- --h5-mask-path <h5_mask_path>
Path to log-likelihood mask in h5 files
- Default:
/frames_mask
- --h5-path <h5_path>
Path to data in h5 files
- Default:
/frames
- --config-file <config_file>
Path to configuration file
- -o, --output-dir <output_dir>
Directory to store PCA results
- Default:
/home/wingillis/dev/moseq/moseq2-pca/docs/_pca
- -i, --input-dir <input_dir>
Directory to find extracted h5 files
- Default:
/home/wingillis/dev/moseq/moseq2-pca/docs
- --cluster-type <cluster_type>
Compute environment the command runs in
- Default:
local
- Options:
local | slurm | nodask
- --timeout <timeout>
Time to wait for workers to initialize before proceeding (minutes)
- Default:
5
- -w, --wall-time <wall_time>
Wall time (compute time) for workers
- Default:
06:00:00
- -m, --memory <memory>
Total RAM usage per worker
- Default:
15GB
- -p, --processes <processes>
Number of processes to run on each worker
- Default:
1
- -c, --cores <cores>
Number of cores per worker
- Default:
1
- -n, --nworkers <nworkers>
Number of workers
- Default:
1
- -q, --queue <queue>
Cluster queue/partition for submitting jobs
- Default:
debug
- --dask-port <dask_port>
Port to access dask dashboard
- Default:
8787
- -d, --dask-cache-path <dask_cache_path>
Path to spill data to disk for dask
- Default:
/home/wingillis/dev/moseq/moseq2-pca/docs/_pca
- --gaussfilter-space <gaussfilter_space>
Sigma (x, y) of the Gaussian spatial filter kernel
- Default:
1.5, 1
- --gaussfilter-time <gaussfilter_time>
Sigma of the Gaussian temporal filter
- Default:
0
- --medfilter-space <medfilter_space>
kernel size for median spatial filter
- Default:
0
- --medfilter-time <medfilter_time>
kernel size for median temporal filter
- Default:
0
- --missing-data
Use missing data PCA; will be automatically set to True if cable-filter-iters > 1 from the extract step.
- Default:
False
- --missing-data-iters <missing_data_iters>
number of missing data PCA iterations
- Default:
10
- --mask-threshold <mask_threshold>
Threshold for mask (missing data PCA only)
- Default:
-16
- --mask-height-threshold <mask_height_threshold>
Threshold for mask based on floor height
- Default:
5
- --min-height <min_height>
Min mouse height from floor (mm)
- Default:
10
- --max-height <max_height>
Max mouse height from floor (mm)
- Default:
120
- --tailfilter-size <tailfilter_size>
Tail filter size
- Default:
9, 9
- --tailfilter-shape <tailfilter_shape>
Tail filter shape
- Default:
ellipse
- --use-fft
Use 2D fft
- Default:
False
- --train-on-subset <train_on_subset>
The fraction of the total frames the PCA is trained on; default PCA is trained on all frames
- Default:
1
- --recon-pcs <recon_pcs>
Number of PCs to use for missing data reconstruction
- Default:
10
- --rank <rank>
Rank for compressed SVD
- Default:
25
- --output-file <output_file>
Name of h5 file for storing pca results
- Default:
pca
- --local-processes <local_processes>
Used with a local cluster. If True: use processes, If False: use threads
- Default:
False
- --overwrite-pca-train <overwrite_pca_train>
Bypass the PCA overwrite prompt. If True, skip the prompt and run automatically.
- Default:
False
- --camera-type <camera_type>
Camera type (k2 or azure)
- Default:
k2
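Because apply-pca expects a pre-trained PCA, train-pca is normally run first. The sketch below only assembles the two command lines from options documented in this reference; it does not execute them. The helper `build_command`, the directory names, and the `pca.h5` file name passed to --pca-file are assumptions made for illustration.

```python
import shlex
from pathlib import PurePosixPath


def build_command(subcommand, **options):
    """Build an argument list for `moseq2-pca <subcommand>`.

    Keyword names become long flags (input_dir -> --input-dir).
    Only value-taking options are handled; boolean flags such as
    --missing-data or --use-fft would need separate treatment.
    """
    cmd = ["moseq2-pca", subcommand]
    for name, value in options.items():
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd


# Hypothetical locations; substitute your own directories.
input_dir = PurePosixPath("/path/to/extractions")
output_dir = input_dir / "_pca"

train_cmd = build_command(
    "train-pca",
    input_dir=input_dir,
    output_dir=output_dir,
    cluster_type="local",
    output_file="pca",
)
apply_cmd = build_command(
    "apply-pca",
    input_dir=input_dir,
    output_dir=output_dir,
    pca_file=output_dir / "pca.h5",  # assumed file name, not confirmed here
    output_file="pca_scores",
)

print(shlex.join(train_cmd))
print(shlex.join(apply_cmd))
# To actually run a step: subprocess.run(train_cmd, check=True)
```

Keeping the commands as lists rather than strings avoids shell-quoting problems if a path contains spaces.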
GUI Module
GUI front-end operations for PCA.
- moseq2_pca.gui.apply_pca_command(progress_paths, output_file)
Compute PCA Scores given trained PCA using Jupyter Notebook.
Args: progress_paths (dict): dictionary containing notebook progress paths. output_file (str): name of output pca file.
Returns: (str): success string.
- moseq2_pca.gui.compute_changepoints_command(input_dir, progress_paths, output_file)
Compute Changepoint distribution using Jupyter Notebook.
Args: input_dir (str): path to directory containing training data. progress_paths (dict): dictionary containing notebook progress paths. output_file (str): name of output pca file.
Returns: (str): success string.
- moseq2_pca.gui.train_pca_command(progress_paths, output_dir, output_file)
Train PCA through a Jupyter notebook and update the config file.
Args: progress_paths (dict): dictionary containing notebook progress paths. output_dir (str): path to output pca directory. output_file (str): name of output pca file.
Returns:
Utilities Module
Utility and helper functions for finding and reading files, filtering operations, Dask initialization, and changepoint helper functions.
- moseq2_pca.util.check_timestamps(h5s)
Helper function to determine whether timestamps and/or metadata are missing from extracted files. Emits a warning if either piece of data is missing.
Args: h5s (list): List of paths to all extracted h5 files.
- moseq2_pca.util.clean_frames(frames, medfilter_space=None, gaussfilter_space=None, medfilter_time=None, gaussfilter_time=None, detrend_time=None, tailfilter=None, tail_threshold=5)
Filter spatial/temporal noise from frames using Median and Gaussian filters, given kernel sizes for each respective requested filter.
Args: frames (numpy.ndarray): frames to filter. medfilter_space (list): median spatial filter kernel. gaussfilter_space (list): gaussian spatial filter kernel. medfilter_time (list): median temporal filter. gaussfilter_time (list): gaussian temporal filter. detrend_time (int): number of frames to lag for. tailfilter (int): size of tail-filter kernel. tail_threshold (int): threshold value to use for tail filtering
Returns: out (numpy.ndarray): filtered frames.
- moseq2_pca.util.close_dask(client, cluster, timeout)
Shut down the Dask client and cluster, and dump all cache data.
Args: client (Dask Client): Client object. cluster (dask Cluster): initialized Cluster. timeout (int): Time to wait for client to close gracefully (minutes).
Returns:
- moseq2_pca.util.combine_new_config(config_file, config_data)
Read config file and combine new config params with it.
Args: config_file (str): path to config.yaml. config_data (dict): dictionary of config data.
- moseq2_pca.util.command_with_config(config_file_param_name)
Helper function to assign variables from a config file. Hierarchy of CLI parameters: params from CLI options > params from config_file > default params.
Args: config_file_param_name (str): parameter name to update with config file variable.
Returns: custom_command_class (click.Command): updated Click Command containing parameters from inputted config file.
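The parameter hierarchy described for command_with_config can be pictured as a layered dictionary merge. This is a minimal sketch, not the library's Click-based implementation: the function `resolve_params` is hypothetical, and treating `None` as "not passed on the command line" is an assumption made here.

```python
def resolve_params(defaults, config_values, cli_values):
    """Merge parameters so CLI options beat config-file values,
    which in turn beat the built-in defaults."""
    resolved = dict(defaults)
    resolved.update(config_values)
    # Only options the user actually passed (non-None) override the rest.
    resolved.update({k: v for k, v in cli_values.items() if v is not None})
    return resolved


defaults = {"chunk_size": 4000, "fps": 30, "rank": 25}
config_values = {"fps": 60, "rank": 50}
cli_values = {"rank": 10, "fps": None}

print(resolve_params(defaults, config_values, cli_values))
# {'chunk_size': 4000, 'fps': 60, 'rank': 10}
```

Here `rank` comes from the command line, `fps` from the config file, and `chunk_size` falls back to its default.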
- moseq2_pca.util.gauss_smooth(signal, win_length=None, sig=1.5, kernel=None)
Perform Gaussian Smoothing on a 1D signal.
Args: signal (1d numpy array): signal to perform smoothing on. win_length (int): window size for gaussian kernel filter. sig (float): standard deviation of the 1d gaussian kernel. kernel (tuple): kernel size to use for smoothing.
Returns: result (1d numpy array): smoothed signal
- moseq2_pca.util.gaussian_kernel1d(n=None, sig=3)
Get 1D gaussian kernel.
Args: n (int): window size. sig (int): standard deviation of the kernel to use.
Returns: kernel (1d array): 1D numpy kernel.
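To make the two Gaussian helpers concrete, here is a small standard-library sketch of a normalized 1D Gaussian kernel and a same-length smoothing pass. It is not the library's code: the function names, the choice to sample the kernel at -n..n, and the zero-padding at the edges are all assumptions for illustration.

```python
import math


def gaussian_kernel_sketch(n, sig):
    """Return a normalized 1D Gaussian kernel sampled at -n..n."""
    weights = [math.exp(-(x ** 2) / (2.0 * sig ** 2)) for x in range(-n, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_sketch(signal, kernel):
    """Convolve `signal` with a symmetric, odd-length kernel.

    The output has the same length as the input; samples beyond the
    edges are treated as zero.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


kernel = gaussian_kernel_sketch(n=2, sig=1.0)
impulse = [0, 0, 0, 1, 0, 0, 0]
smoothed = smooth_sketch(impulse, kernel)
print([round(v, 3) for v in smoothed])
```

Smoothing a unit impulse simply reproduces the kernel centered on the impulse, which is a quick way to check that the kernel is symmetric and sums to one.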
- moseq2_pca.util.get_changepoints(scores, k=5, sigma=3, peak_height=0.5, peak_neighbors=1, baseline=True, timestamps=None)
Compute changepoints and their corresponding distribution. Changepoints describe the magnitude of frame-to-frame changes of mouse pose.
Args: scores (numpy.ndarray): nframes * rows * columns k (int): klags - Lag to use for derivative calculation. sigma (int): Standard deviation of gaussian smoothing filter. peak_height (float): user-defined peak Changepoint length. peak_neighbors (int): number of peaks in the CP curve. baseline (bool): normalize data. timestamps (numpy.array): loaded timestamps.
Returns: cps (numpy.ndarray): array of changepoint values normed_df (numpy.array): array of values for bar plot
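The roles of the lag, peak height, and peak neighbors parameters can be illustrated on a single 1D signal. The sketch below is a simplified stand-in, not the library's algorithm: it skips the Gaussian smoothing and baseline normalization steps, works on one dimension only, and all function names are hypothetical.

```python
def lagged_change(signal, k):
    """Absolute difference between each sample and the one k frames earlier."""
    return [abs(signal[i] - signal[i - k]) for i in range(k, len(signal))]


def find_peaks(values, height, neighbors=1):
    """Indices above `height` that are strictly greater than the
    `neighbors` samples on each side."""
    peaks = []
    for i in range(neighbors, len(values) - neighbors):
        window = values[i - neighbors:i] + values[i + 1:i + neighbors + 1]
        if values[i] > height and all(values[i] > v for v in window):
            peaks.append(i)
    return peaks


def changepoint_durations(signal, k=1, height=0.5, neighbors=1, fps=30):
    """Time in seconds between successive peaks of the lagged change."""
    peaks = find_peaks(lagged_change(signal, k), height, neighbors)
    return [(b - a) / fps for a, b in zip(peaks, peaks[1:])]


# A toy signal with abrupt changes at frames 5, 15 and 35.
signal = [0] * 5 + [1] * 10 + [0] * 20 + [1] * 5
print(changepoint_durations(signal, k=1, height=0.5, neighbors=1, fps=10))
# [1.0, 2.0]
```

At 10 frames per second, the 10-frame and 20-frame gaps between the abrupt changes come out as block durations of 1.0 and 2.0 seconds.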
- moseq2_pca.util.get_env_cpu_and_mem()
Read current system environment and return the amount of available memory and CPUs to allocate to the created cluster.
Returns: mem (float): Optimal amount of memory (in bytes) to allocate to the initialized dask cluster. cpu (int): Optimal number of CPUs to allocate to dask.
- moseq2_pca.util.get_metadata_path(h5file)
Return path within h5 file that contains the kinect extraction metadata.
Args: h5file (str): path to h5 file.
Returns: (str): path to acquisition metadata within h5 file.
- moseq2_pca.util.get_rps(frames, rps=600, normalize=True)
Get random projections of frames.
Args: frames (numpy.array): Frames to get dimensions from. rps (int): Number of random projections. normalize (bool): indicates whether to normalize the random projections.
Returns: rproj (2D or 3D numpy array): Computed random projections with same shape as frames
- moseq2_pca.util.get_timestamp_path(h5file)
Return path within h5 file that contains the kinect timestamps
Args: h5file (str): path to h5 file.
Returns: (str): path to metadata timestamps within h5 file
- moseq2_pca.util.h5_to_dict(h5file, path)
Read all contents from an h5 file and return them in a nested dict object.
Args: h5file (str): path to h5 file path (str): path to group within h5 file
Returns: ans (dict): dictionary of all h5 group contents
- moseq2_pca.util.initialize_dask(nworkers=50, processes=1, memory='4GB', cores=1, wall_time='01:00:00', queue='debug', local_processes=False, cluster_type='local', timeout=10, cache_path='/home/wingillis/moseq2_pca', dashboard_port='8787', data_size=None, **kwargs)
Initialize dask client, cluster, workers, etc.
Args: nworkers (int): number of dask workers to initialize. processes (int): number of processes per worker. memory (str): amount of memory to allocate to dask cluster. cores (int): number of cores to use. wall_time (str): amount of time to allow program to run. queue (str): cluster queue/partition for submitting jobs. local_processes (bool): flag to use processes or threads when using a local cluster. cluster_type (str): indicate what cluster to use (local or slurm). timeout (int): how many minutes to wait for workers to initialize. cache_path (str or Pathlike): path to store cached data. dashboard_port (str): port number to find dask statistics. data_size (float): size of the dask array in number of bytes. kwargs: extra keyword arguments.
Returns: client (dask Client): initialized Client. cluster (dask Cluster): initialized Cluster. workers (dask Workers): initialized workers.
- moseq2_pca.util.insert_nans(timestamps, data, fps=30)
Insert NaN values into the data where the given 1D timestamps array indicates missing frames. Used to handle dropped frames from the video acquisition.
Args: timestamps (numpy.array): timestamp values. data (numpy.array): additional data to fill with NaN values - can be PC scores. fps (int): frames per second.
Returns: filled_data (numpy.array): data with NaN values inserted at the missing timestamps. data_idx (numpy.array): indices of the inserted values. filled_timestamps (numpy.array): filled timestamp-strs.
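A minimal sketch of the dropped-frame idea follows: when the gap between two timestamps spans more than one frame period, placeholder NaN entries are inserted for the missing frames. This is an illustration under assumptions rather than the library's implementation; the function name, the use of rounding to count missing frames, and the list-based inputs are all hypothetical.

```python
import math


def fill_dropped_frames(timestamps, data, fps=30):
    """Insert NaN placeholders wherever consecutive timestamps are
    more than one frame period apart."""
    period = 1.0 / fps
    filled_ts, filled_data = [timestamps[0]], [data[0]]
    for prev, curr, value in zip(timestamps, timestamps[1:], data[1:]):
        # Number of whole frames that should have occurred in between.
        n_missing = int(round((curr - prev) / period)) - 1
        for j in range(1, n_missing + 1):
            filled_ts.append(prev + j * period)
            filled_data.append(float("nan"))
        filled_ts.append(curr)
        filled_data.append(value)
    return filled_data, filled_ts


# Two frames were dropped between t=0.1 s and t=0.4 s at 10 fps.
timestamps = [0.0, 0.1, 0.4, 0.5]
values = [1.0, 2.0, 3.0, 4.0]
filled, filled_ts = fill_dropped_frames(timestamps, values, fps=10)
print(filled)  # [1.0, 2.0, nan, nan, 3.0, 4.0]
```

Downstream code can then treat the NaN entries as missing observations instead of silently shifting later frames earlier in time.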
- moseq2_pca.util.read_yaml(yaml_file)
Read yaml file and return dictionary representation of file contents.
Args: yaml_file (str): path to yaml file
Returns: return_dict (dict): dict of yaml file contents
- moseq2_pca.util.recursive_find_h5s(root_dir='/home/wingillis/dev/moseq/moseq2-pca/docs', ext='.h5', yaml_string='{}.yaml')
Recursively find h5 files, along with yaml files with the same basename
Args: root_dir (str): path to base directory to begin recursive search in. ext (str): extension to search for yaml_string (str): string for filename formatting when saving data
Returns: h5s (list): list of found h5 files dicts (list): list of found metadata files yamls (list): list of found yaml files
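The recursive search can be approximated with `os.walk`. The sketch below is a simplified stand-in: it returns only the h5 and yaml path lists (not the loaded metadata dicts), and it assumes an h5 file is reported only when a yaml file with the same basename sits next to it. The function name `find_h5s` and the demo directory layout are hypothetical.

```python
import os
import tempfile


def find_h5s(root_dir, ext=".h5", yaml_string="{}.yaml"):
    """Walk `root_dir` and collect h5 files that have a matching yaml file."""
    h5s, yamls = [], []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if not name.endswith(ext):
                continue
            base = os.path.splitext(name)[0]
            yaml_path = os.path.join(dirpath, yaml_string.format(base))
            if os.path.exists(yaml_path):
                h5s.append(os.path.join(dirpath, name))
                yamls.append(yaml_path)
    return h5s, yamls


# Build a throwaway directory tree to demonstrate the search.
with tempfile.TemporaryDirectory() as root:
    complete = os.path.join(root, "session_1", "proc")
    incomplete = os.path.join(root, "session_2", "proc")
    os.makedirs(complete)
    os.makedirs(incomplete)
    for path in (
        os.path.join(complete, "results_00.h5"),
        os.path.join(complete, "results_00.yaml"),
        os.path.join(incomplete, "results_00.h5"),  # no yaml alongside
    ):
        open(path, "w").close()
    h5s, yamls = find_h5s(root)
    found = [os.path.relpath(p, root) for p in h5s]

print(found)
```

Only the session that has both files is returned; the h5 file without a yaml companion is skipped.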
- moseq2_pca.util.select_strel(string='e', size=(10, 10))
Select Structuring Element Shape. Accepts shapes (‘ellipse’, ‘rectangle’); if neither is given, ‘ellipse’ is used.
Args: string (str): e for Ellipse, r for Rectangle size (tuple): size of StructuringElement
Returns: strel (cv2.StructuringElement): StructuringElement with specified size.
- moseq2_pca.util.set_dask_config(memory={'pause': False, 'spill': False, 'target': 0.85, 'terminate': 0.95})
Set initial dask configuration parameters
Args: memory (dict): dictionary containing default dask configuration variables to ensure safe amount of resource usage.
Visualization Module
Visualization operations for plotting computed PCs, a Scree Plot, and the Changepoint PDF histogram.
- moseq2_pca.viz.changepoint_dist(cps, headless=False)
Create a bar plot describing the computed changepoint distribution.
Args: cps (numpy.ndarray): changepoints to graph headless (bool): bool flag to run in headless environment
Returns: plt (plt.figure): figure to save ax (plt.ax): figure axis variable
- moseq2_pca.viz.display_components(components, cmap='gray', headless=False)
Plot computed Principal Components.
Args: components (numpy.ndarray): components to plot cmap (str): color map to use; default is ‘gray’. headless (bool): bool flag to run in headless environment
Returns: plt (plt.figure): figure to save ax (plt.ax): figure axis variable
- moseq2_pca.viz.plot_pca_results(output_dict, save_file, output_dir)
Plot and save trained PCA results.
Args: output_dict (dict): Dict object containing PCA training results save_file (str): Path to save the plots to. output_dir (str): Directory containing logger
- moseq2_pca.viz.scree_plot(explained_variance_ratio, headless=False)
Plot a scree plot describing principal components.
Args: explained_variance_ratio (numpy.array): explained variance ratio of each principal component headless (bool): bool flag to run in headless environment
Returns: plt (plt.figure): figure to save
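A scree plot is typically read by asking how many leading components are needed to reach some share of the total variance. The sketch below computes that number directly from an explained variance ratio list. The function name and the 90% target are hypothetical choices for illustration, not values taken from the library.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, target=0.9):
    """Smallest number of leading PCs whose cumulative explained
    variance reaches `target`, or None if it is never reached."""
    cumulative = accumulate(explained_variance_ratio)
    for n, total in enumerate(cumulative, start=1):
        if total >= target:
            return n
    return None


ratios = [0.5, 0.25, 0.125, 0.0625, 0.03125]
print(components_for_variance(ratios, target=0.9))  # 4
```

With these example ratios, the first three components explain 87.5% of the variance and the fourth brings the total to 93.75%, so four components are needed to pass 90%.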