moseq2_model Package
CLI Module
moseq2-model
moseq2-model [OPTIONS] COMMAND [ARGS]...
Options
- --version
Show the version and exit.
- Default:
False
apply-model
Apply pre-trained ARHMM to PC scores.
moseq2-model apply-model [OPTIONS] MODEL_FILE PC_FILE DEST_FILE
Options
- --var-name <var_name>
Variable name in input file with PCs
- Default:
scores
- -i, --index <index>
Path to moseq2-index.yaml for group definitions
- Default:
- --load-groups <load_groups>
If groups should be loaded with the PC scores.
- Default:
True
Arguments
- MODEL_FILE
Required argument
- PC_FILE
Required argument
- DEST_FILE
Required argument
count-frames
Count the number of frames in given h5 file (pca_scores)
moseq2-model count-frames [OPTIONS] INPUT_FILE
Options
- --var-name <var_name>
Variable name in input file with PCs
- Default:
scores
Arguments
- INPUT_FILE
Required argument
kappa-scan
Batch-train multiple models to scan over different kappa values.
moseq2-model kappa-scan [OPTIONS] INPUT_FILE OUTPUT_DIR
Options
- -i, --index <index>
Path to moseq2-index.yaml for session metadata and group information
- Default:
- --out-script <out_script>
Name of bash script file to save model training commands.
- Default:
train_out.sh
- --n-models <n_models>
Number of models to train in kappa scan.
- Default:
10
- --prefix <prefix>
Batch command string to prefix model training command (slurm only).
- Default:
- --cluster-type <cluster_type>
Platform to train models on
- Default:
local
- Options:
local | slurm
- --scan-scale <scan_scale>
Scale to scan kappa values at.
- Default:
log
- Options:
log | linear
- --min-kappa <min_kappa>
Minimum kappa value to begin scan from.
- --max-kappa <max_kappa>
Maximum kappa value to end scan on.
- --memory <memory>
RAM (slurm only)
- Default:
5GB
- --wall-time <wall_time>
Wall time (slurm only)
- Default:
3:00:00
- --partition <partition>
Partition name (slurm only)
- Default:
short
- --get-cmd
Print scan command strings.
- Default:
False
- --run-cmd
Run scan command strings.
- Default:
False
- --check-every <check_every>
Increment to record training and validation log-likelihoods.
- Default:
5
- --robust
Use robust AR-HMM model. More tolerant to noise
- Default:
False
- --separate-trans
Use separate transition matrix for each group
- Default:
False
- --nlags <nlags>
Number of lags to use
- Default:
3
- --noise-level <noise_level>
Additive white Gaussian noise applied to the input data for regularization. Not generally used.
- Default:
0
- -a, --alpha <alpha>
Alpha; hierarchical Dirichlet process hyperparameter (try not to change it).
- Default:
5.7
- -g, --gamma <gamma>
Gamma; hierarchical Dirichlet process hyperparameter (try not to change it).
- Default:
1000.0
- --load-groups <load_groups>
If groups should be loaded with the PC scores.
- Default:
True
- --percent-split <percent_split>
Training-validation split percentage used when not holding out data and when this parameter > 0.
- Default:
0
- -p, --progressbar <progressbar>
Show model progress
- Default:
True
- -w, --whiten <whiten>
Whiten PCs: (e)ach session, (a)ll combined, or (n)o whitening
- Default:
all
- --npcs <npcs>
Number of PCs to use
- Default:
10
- -m, --max-states <max_states>
Maximum number of states
- Default:
100
- --save-model <save_model>
Save model object at the end of training
- Default:
True
- -s, --save-every <save_every>
Increment to save labels and model object (-1 for just last)
- Default:
-1
- --e-step
Compute the expected state sequence for each recording
- Default:
False
- --var-name <var_name>
Variable name in input file with PCs
- Default:
scores
- -n, --num-iter <num_iter>
Number of iterations to resample model
- Default:
100
- -c, --ncpus <ncpus>
Number of cores to use for resampling
- Default:
0
- --nfolds <nfolds>
Number of folds for split
- Default:
5
- --hold-out-seed <hold_out_seed>
Random seed for holding out data (set for reproducibility)
- Default:
-1
- -h, --hold-out
Hold out one fold (set by nfolds) for computing heldout likelihood
- Default:
False
Arguments
- INPUT_FILE
Required argument
- OUTPUT_DIR
Required argument
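As a rough illustration of how the options above combine, the sketch below assembles a kappa-scan invocation from Python using only flags documented in this section. The helper name `build_kappa_scan_argv` and the file names `pca_scores.h5` and `kappa_scan/` are placeholders chosen for the example, not part of the package.

```python
import shlex

def build_kappa_scan_argv(input_file, output_dir, n_models=10,
                          min_kappa=None, max_kappa=None, scan_scale="log"):
    """Assemble a `moseq2-model kappa-scan` argument list (hypothetical helper)."""
    argv = ["moseq2-model", "kappa-scan",
            "--n-models", str(n_models),
            "--scan-scale", scan_scale]
    # The bounds are optional; they are only passed when explicitly given.
    if min_kappa is not None:
        argv += ["--min-kappa", str(min_kappa)]
    if max_kappa is not None:
        argv += ["--max-kappa", str(max_kappa)]
    # Positional arguments follow the options, as in the usage string.
    argv += [input_file, output_dir]
    return argv

argv = build_kappa_scan_argv("pca_scores.h5", "kappa_scan/",
                             n_models=5, min_kappa=1000, max_kappa=100000)
print(shlex.join(argv))
# To actually run it (requires moseq2-model to be installed):
#   import subprocess; subprocess.run(argv, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.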
learn-model
Train ARHMM on PC Scores with given training parameters
moseq2-model learn-model [OPTIONS] INPUT_FILE DEST_FILE
Options
- --check-every <check_every>
Increment to record training and validation log-likelihoods.
- Default:
5
- --robust
Use robust AR-HMM model. More tolerant to noise
- Default:
False
- --separate-trans
Use separate transition matrix for each group
- Default:
False
- --nlags <nlags>
Number of lags to use
- Default:
3
- --noise-level <noise_level>
Additive white Gaussian noise applied to the input data for regularization. Not generally used.
- Default:
0
- -a, --alpha <alpha>
Alpha; hierarchical Dirichlet process hyperparameter (try not to change it).
- Default:
5.7
- -g, --gamma <gamma>
Gamma; hierarchical Dirichlet process hyperparameter (try not to change it).
- Default:
1000.0
- --load-groups <load_groups>
If groups should be loaded with the PC scores.
- Default:
True
- --percent-split <percent_split>
Training-validation split percentage used when not holding out data and when this parameter > 0.
- Default:
0
- -p, --progressbar <progressbar>
Show model progress
- Default:
True
- -w, --whiten <whiten>
Whiten PCs: (e)ach session, (a)ll combined, or (n)o whitening
- Default:
all
- --npcs <npcs>
Number of PCs to use
- Default:
10
- -m, --max-states <max_states>
Maximum number of states
- Default:
100
- --save-model <save_model>
Save model object at the end of training
- Default:
True
- -s, --save-every <save_every>
Increment to save labels and model object (-1 for just last)
- Default:
-1
- --e-step
Compute the expected state sequence for each recording
- Default:
False
- --var-name <var_name>
Variable name in input file with PCs
- Default:
scores
- -n, --num-iter <num_iter>
Number of iterations to resample model
- Default:
100
- -c, --ncpus <ncpus>
Number of cores to use for resampling
- Default:
0
- --nfolds <nfolds>
Number of folds for split
- Default:
5
- --hold-out-seed <hold_out_seed>
Random seed for holding out data (set for reproducibility)
- Default:
-1
- -h, --hold-out
Hold out one fold (set by nfolds) for computing heldout likelihood
- Default:
False
- -k, --kappa <kappa>
Kappa; hyperparameter used to set syllable duration. Larger k = longer syllable lengths
- --checkpoint-freq <checkpoint_freq>
Save a model checkpoint every n iterations
- Default:
-1
- --use-checkpoint
Indicate whether to use a previously saved checkpoint
- Default:
False
- -i, --index <index>
Path to moseq2-index.yaml for group definitions
- Default:
- --default-group <default_group>
Default group name to use for separate-trans
- Default:
n/a
- -v, --verbose
Print syllable log-likelihoods during training.
- Default:
False
Arguments
- INPUT_FILE
Required argument
- DEST_FILE
Required argument
GUI Module
GUI front-end functions for training ARHMM.
- moseq2_model.gui.apply_model_command(progress_paths, model_file)
Apply a pre-trained ARHMM to a new dataset from within a Jupyter notebook.
- Args:
progress_paths (dict): notebook progress dict that contains paths to the pc scores, config, and index files. model_file (str): path to the pre-trained ARHMM.
- moseq2_model.gui.learn_model_command(progress_paths, get_cmd=True, verbose=False)
Train ARHMM from within a Jupyter notebook using parameters specified in the notebook.
Args: progress_paths (dict): notebook progress dict that contains paths to the pc scores, config, and index files. get_cmd (bool): flag to return the kappa scan learn-model commands. verbose (bool): compute modeling summary - can slow down training.
Returns: None or kappa scan command
General Utilities Module
Utility functions for handling loading and saving models and their respective metadata.
- moseq2_model.util.copy_model(model_obj)
Return a deep copy of the ARHMM that doesn’t contain the training data.
Args: model_obj (ARHMM): model to copy.
Returns: cp (ARHMM): copy of the model
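One common way to deep-copy an object without its attached data is to detach the data, copy, and then restore it. The sketch below shows that pattern on a toy class. `ToyModel`, its `data` attribute, and `copy_without_data` are hypothetical stand-ins, and the sketch makes no claim about how the ARHMM stores its data internally.

```python
import copy

class ToyModel:
    """Hypothetical stand-in for a model: parameters plus attached training data."""
    def __init__(self, params, data):
        self.params = params
        self.data = data

def copy_without_data(model):
    """Deep-copy `model` while leaving its training data out of the copy."""
    data, model.data = model.data, None   # temporarily detach the data
    try:
        cp = copy.deepcopy(model)         # copies parameters only
    finally:
        model.data = data                 # always restore the original
    return cp

model = ToyModel({"kappa": 1e6}, data=[[0.1, 0.2], [0.3, 0.4]])
cp = copy_without_data(model)
```

Detaching before copying keeps the copy small, which matters when the model is saved repeatedly during training.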
- moseq2_model.util.count_frames(data_dict=None, input_file=None, var_name='scores')
Count the total number of frames loaded from the PC scores file.
Args: data_dict (OrderedDict): Loaded PC scores OrderedDict object. input_file (str): Path to the PC scores file, used to load data_dict when data_dict is None. var_name (str): Path within PCA h5 file to load scores from.
Returns: total_frames (int): total number of counted frames.
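The count amounts to summing the number of rows (frames) across sessions. A minimal sketch, assuming the loaded scores are keyed by session uuid and each value has one row per frame; `count_total_frames` and the session keys are names made up for this example.

```python
from collections import OrderedDict

def count_total_frames(data_dict):
    """Sum the number of frames (rows) over every session in data_dict."""
    return sum(len(scores) for scores in data_dict.values())

# Two toy sessions of 900 frames each, with 10 PCs per frame.
data_dict = OrderedDict(
    session_a=[[0.0] * 10 for _ in range(900)],
    session_b=[[0.0] * 10 for _ in range(900)],
)
total_frames = count_total_frames(data_dict)
print(total_frames)  # 1800
```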
- moseq2_model.util.create_command_strings(input_file, output_dir, config_data, kappas, model_name_format='model-{:03d}-{}.p')
Create the CLI learn-model command strings with parameter flags based on the contents of the configuration dict.
Args: input_file (str): Path to PC Scores. output_dir (str): Path to directory to save models in. config_data (dict): Configuration parameters dict. kappas (list): List of kappa values for model training commands. model_name_format (str): Filename format string.
Returns: command_string (str): CLI learn-model command strings with the requested parameters separated by newline characters
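The sketch below shows one plausible way such command strings could be assembled: one learn-model command per kappa value, joined by newlines. It is not the package's implementation. In particular, filling the `model_name_format` fields with the kappa value and then the model index is an assumption, as are the helper name `build_learn_model_commands` and the file names.

```python
import os

def build_learn_model_commands(input_file, output_dir, kappas, parameters="",
                               model_name_format="model-{:03d}-{}.p"):
    """Return newline-separated learn-model commands (hypothetical helper)."""
    commands = []
    for i, kappa in enumerate(kappas):
        # Assumption: first field is the (integer) kappa, second is the index.
        dest_file = os.path.join(output_dir, model_name_format.format(kappa, i))
        cmd = (f"moseq2-model learn-model {parameters} --kappa {kappa} "
               f"{input_file} {dest_file}")
        commands.append(" ".join(cmd.split()))  # collapse repeated spaces
    return "\n".join(commands)

command_string = build_learn_model_commands(
    "pca_scores.h5", "models", [1000, 100000],
    parameters="--npcs 10 --num-iter 100",
)
print(command_string)
```

The `{:03d}` field requires integer kappas, so float values would need to be cast with `int()` first.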
- moseq2_model.util.dict_to_h5(h5file, export_dict, path='/')
Recursively save dicts to h5 file groups.
Args: h5file (h5py.File): opened h5py File object. export_dict (dict): dictionary to save. path (str): path within h5 to save to.
Returns:
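The recursion can be illustrated without h5py by flattening a nested dict into the slash-separated paths its leaves would occupy. This is only a sketch of the traversal: `flatten_dict` is a hypothetical helper, and in an actual HDF5 writer each nested dict would correspond to a group and each leaf to a dataset.

```python
def flatten_dict(export_dict, path="/"):
    """Map each leaf value to the slash-separated path it would be stored under."""
    flat = {}
    for key, value in export_dict.items():
        full_path = path.rstrip("/") + "/" + str(key)
        if isinstance(value, dict):
            # Nested dict: descend, extending the path (a subgroup in HDF5 terms).
            flat.update(flatten_dict(value, full_path))
        else:
            # Leaf value: record it under its full path (a dataset in HDF5 terms).
            flat[full_path] = value
    return flat

nested = {
    "metadata": {"uuids": ["a", "b"], "groups": ["ctrl", "drug"]},
    "nframes": 1800,
}
flat = flatten_dict(nested)
print(sorted(flat))
```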
- moseq2_model.util.get_current_model(use_checkpoint, all_checkpoints, train_data, model_parameters)
Load the latest model checkpoint if the use_checkpoint parameter is True; otherwise instantiate a new model.
Args: use_checkpoint (bool): flag that indicates whether to load a checkpointed model. all_checkpoints (list): list of all found checkpoint paths. train_data (OrderedDict): dictionary of uuid-PC score key-value pairs. model_parameters (dict): dictionary of required modeling hyperparameters.
Returns: arhmm (ARHMM): instantiated model object including loaded data. itr (int): starting iteration number for the model to begin training from.
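The resume-or-start decision can be sketched as follows. This is not the package's code: the pickle format, the `"model"` and `"iter"` keys, the helper name `resume_or_start`, and the rule that the lexicographically last file name is the latest checkpoint (which only holds for zero-padded names) are all assumptions made for the example.

```python
import os
import pickle
import tempfile

def resume_or_start(use_checkpoint, all_checkpoints, make_model):
    """Return (model, start_iteration) from the latest checkpoint or a new model."""
    if use_checkpoint and all_checkpoints:
        latest = max(all_checkpoints)  # assumption: names sort in iteration order
        with open(latest, "rb") as f:
            checkpoint = pickle.load(f)
        return checkpoint["model"], checkpoint["iter"]
    # No checkpoint requested or none found: start from iteration 0.
    return make_model(), 0

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for itr in (10, 20):
        path = os.path.join(tmp, f"checkpoint_{itr:04d}.pkl")
        with open(path, "wb") as f:
            pickle.dump({"model": {"kappa": 1000}, "iter": itr}, f)
        paths.append(path)
    resumed_model, resumed_itr = resume_or_start(True, paths, dict)
    fresh_model, fresh_itr = resume_or_start(False, paths, dict)

print(resumed_itr, fresh_itr)  # 20 0
```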
- moseq2_model.util.get_loglikelihoods(arhmm, data, groups, separate_trans, normalize=True)
Compute the log-likelihoods of the training sessions.
Args: arhmm (ARHMM): the ARHMM model object. data (dict): dict object with UUID keys containing the PCs used for training. groups (list): list of assigned groups for all corresponding session uuids. separate_trans (bool): flag to compute separate log-likelihoods for each modeled group. normalize (bool): if set to True, this function will normalize by the frame count in each session.
Returns: ll (list): list of log-likelihoods for the trained model
- moseq2_model.util.get_parameter_strings(config_data)
Create the CLI learn-model command using the given config_data dict contents to run the modeling step.
Args: config_data (dict): Configuration parameters dict.
Returns: parameters (str): String containing CLI command parameter flags. prefix (str): Prefix string for the learn-model command (Slurm only).
- moseq2_model.util.get_parameters_from_model(model)
Get parameter dictionary from model.
Args: model (ARHMM): model to get parameters from.
Returns: parameters (dict): dictionary containing all modeling parameters
- moseq2_model.util.get_scan_range_kappas(data_dict, config_data)
Get the kappa values to train models on based on the user’s selected scanning scale range. Default values will be selected if min/max_kappa are None.
An example: scan_scale = 'log'; nframes = 1800; min_kappa = 1e3; max_kappa = 1e5; n_models = 10 >>> kappas == [1000, 1668, 2782, 4641, 7742, 12915, 21544, 35938, 59948, 100000]
Another example: scan_scale = 'linear'; nframes = 1800; min_kappa = None; max_kappa = None; n_models = 10 >>> min(kappas) == 18; max(kappas) == 180000; kappas == [18, 20016, 40014, 60012, 80010, 100008, 120006, 140004, 160002, 180000]
Args: data_dict (OrderedDict): Loaded PCA score dictionary. config_data (dict): Configuration parameters dict.
Returns: kappas (list): list of ints corresponding to the kappa value for each model.
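The listed kappa values can be reproduced with evenly spaced points on a log or linear scale, truncated to integers. The sketch below is a pure-Python reconstruction under two inferences drawn from the listed values, not from the package source: the default bounds are nframes / 100 and nframes * 100, and values are truncated rather than rounded. `scan_kappas` is a hypothetical name.

```python
import math

def scan_kappas(nframes, n_models, scan_scale="log", min_kappa=None, max_kappa=None):
    """Return n_models integer kappa values spanning the scan bounds."""
    # Assumed defaults (inferred from the listed values): nframes/100, nframes*100.
    lo = nframes / 100 if min_kappa is None else float(min_kappa)
    hi = nframes * 100 if max_kappa is None else float(max_kappa)
    if n_models == 1:
        return [int(lo)]
    steps = range(n_models)
    if scan_scale == "log":
        lo_exp, hi_exp = math.log10(lo), math.log10(hi)
        values = [10 ** (lo_exp + i * (hi_exp - lo_exp) / (n_models - 1)) for i in steps]
    elif scan_scale == "linear":
        values = [lo + i * (hi - lo) / (n_models - 1) for i in steps]
    else:
        raise ValueError(f"unknown scan_scale: {scan_scale!r}")
    values[0], values[-1] = lo, hi  # pin endpoints against floating-point drift
    return [int(v) for v in values]  # truncate, matching the listed values

log_kappas = scan_kappas(1800, 10, "log", min_kappa=1e3, max_kappa=1e5)
linear_kappas = scan_kappas(1800, 10, "linear")
print(log_kappas)
print(linear_kappas)
```

With bounds of 1e3 and 1e5 the log scan yields the ten values from 1000 to 100000, and the linear scan with default bounds runs from 18 to 180000 in steps of 19998.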
- moseq2_model.util.get_session_groupings(data_metadata, train_list, hold_out_list)
Create a list or tuple of assigned groups for training and (optionally) held out data.
Args: data_metadata (dict): dict containing session group information. train_list (list): list of all corresponding included session uuids. hold_out_list (list): list of held-out uuids.
Returns: groupings (tuple): 2-tuple containing lists of train groups and held-out groups (if hold_out_list exists).
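A minimal sketch of the grouping lookup, assuming the metadata holds index-aligned `uuids` and `groups` lists as described for `load_pcs`. The key names and the helper `session_groupings` are assumptions for this example.

```python
def session_groupings(data_metadata, train_list, hold_out_list=None):
    """Return (train_groups, hold_out_groups) for the given session uuids."""
    # Build a uuid -> group lookup from the index-aligned lists.
    group_of = dict(zip(data_metadata["uuids"], data_metadata["groups"]))
    train_groups = [group_of[u] for u in train_list]
    hold_out_groups = [group_of[u] for u in (hold_out_list or [])]
    return train_groups, hold_out_groups

metadata = {"uuids": ["u1", "u2", "u3"], "groups": ["ctrl", "drug", "ctrl"]}
train_groups, hold_out_groups = session_groupings(metadata, ["u1", "u2"], ["u3"])
print(train_groups, hold_out_groups)  # ['ctrl', 'drug'] ['ctrl']
```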
- moseq2_model.util.h5_to_dict(h5file, path: str = '/') -> dict
Load h5 data to dictionary from a user specified path.
Args: h5file (str or h5py.File): file path to the given h5 file or the h5 file handle. path (str): path to the base dataset within the h5 file.
Returns: out (dict): a dict with h5 file contents with the same path structure
- moseq2_model.util.is_uuid(string)
Check to see if string is a uuid.
Args: string (str): string containing a session uuid in the index file
Returns: (bool): boolean to indicate if a string is a uuid.
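One way to perform such a check is to let the standard-library `uuid` module attempt the parse. The sketch below takes that approach and may be more or less permissive than the package's own check. `looks_like_uuid` is a hypothetical name.

```python
import uuid

def looks_like_uuid(string):
    """Return True if `string` parses as a UUID, False otherwise."""
    try:
        uuid.UUID(string)
    except (ValueError, AttributeError, TypeError):
        # Malformed strings raise ValueError; non-string inputs raise the others.
        return False
    return True

session_id = str(uuid.uuid4())
print(looks_like_uuid(session_id))  # True
print(looks_like_uuid("default"))   # False
```

Note that `uuid.UUID` also accepts 32-character hex strings without hyphens, so this check is slightly looser than matching the canonical hyphenated form.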
- moseq2_model.util.load_arhmm_checkpoint(filename: str, train_data: dict) -> dict
Load an arhmm checkpoint and add data into the arhmm model checkpoint.
Args: filename (str): path that specifies the checkpoint. train_data (OrderedDict): an OrderedDict that contains the training data.
Returns: mdl_dict (dict): a dict containing the model with reloaded data, and associated training data
- moseq2_model.util.load_cell_string_from_matlab(filename, var_name='uuids')
Load cell strings from MATLAB file.
Args: filename (str): path to .mat file. var_name (str): variable name to read.
Returns: return_list (list): list of selected loaded variables
- moseq2_model.util.load_data_from_matlab(filename, var_name='features', npcs=10)
Load PC Scores from a specified variable column in a MATLAB file.
Args: filename (str): path to MATLAB (.mat) file. var_name (str): variable to load. npcs (int): number of PCs to load.
Returns: data_dict (OrderedDict): loaded dictionary of uuid and PC-score pairings.
- moseq2_model.util.load_dict(filename)
Load dictionary from file.
- Args:
filename (str): path to file where dict is saved
- Returns:
obj (dict): loaded dictionary
- moseq2_model.util.load_pcs(filename, var_name='features', load_groups=False, npcs=10)
Load the Principal Component Scores for modeling.
Args: filename (str): path to the file that contains PC scores. var_name (str): key for PC scores in the h5 file. load_groups (bool): load metadata group variable. npcs (int): number of PCs to load.
Returns: data_dict (OrderedDict): key-value pairs for keys being uuids and values being PC scores. metadata (OrderedDict): dictionary containing lists of index-aligned uuids and groups.
- moseq2_model.util.save_arhmm_checkpoint(filename: str, arhmm: dict)
Save an arhmm checkpoint.
Args: filename (str): path that specifies the checkpoint. arhmm (dict): a dictionary containing the arhmm object, training iteration number, log-likelihoods of each training step, and labels for each step.
- moseq2_model.util.save_dict(filename, obj_to_save=None)
Save dictionary to file.
Args: filename (str): path to file where dict is being saved. obj_to_save (dict): dict to save.