moseq2_model Package

CLI Module

moseq2-model

moseq2-model [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

Default:

False

apply-model

Apply a pre-trained ARHMM to PC scores.

moseq2-model apply-model [OPTIONS] MODEL_FILE PC_FILE DEST_FILE

Options

--var-name <var_name>

Variable name in input file with PCs

Default:

scores

-i, --index <index>

Path to moseq2-index.yaml for group definitions

Default:

--load-groups <load_groups>

Whether groups should be loaded with the PC scores.

Default:

True

Arguments

MODEL_FILE

Required argument

PC_FILE

Required argument

DEST_FILE

Required argument
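A minimal sketch of invoking apply-model from Python. It assumes the moseq2-model entry point is installed and on PATH; the helper name build_apply_model_cmd and the file names are hypothetical, and only the flags documented above are used.

```python
import shlex


def build_apply_model_cmd(model_file, pc_file, dest_file,
                          var_name="scores", index=None, load_groups=True):
    """Assemble an argv list for `moseq2-model apply-model` (hypothetical helper)."""
    cmd = ["moseq2-model", "apply-model",
           "--var-name", var_name,            # dataset holding the PCs in the h5 file
           "--load-groups", str(load_groups)]  # this option takes a value, not a bare flag
    if index is not None:
        # -i/--index: path to moseq2-index.yaml for group definitions
        cmd += ["--index", index]
    # Positional arguments: MODEL_FILE PC_FILE DEST_FILE
    cmd += [model_file, pc_file, dest_file]
    return cmd


if __name__ == "__main__":
    cmd = build_apply_model_cmd("model.p", "pca_scores.h5", "applied_model.p",
                                index="moseq2-index.yaml")
    print(shlex.join(cmd))
    # To actually run it (requires moseq2-model to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building an argument list rather than a single shell string avoids quoting problems when paths contain spaces.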

count-frames

Count the number of frames in the given h5 file (pca_scores).

moseq2-model count-frames [OPTIONS] INPUT_FILE

Options

--var-name <var_name>

Variable name in input file with PCs

Default:

scores

Arguments

INPUT_FILE

Required argument

kappa-scan

Batch-train multiple models to scan over different kappa values.

moseq2-model kappa-scan [OPTIONS] INPUT_FILE OUTPUT_DIR

Options

-i, --index <index>

Path to moseq2-index.yaml for session metadata and group information

Default:

--out-script <out_script>

Name of the bash script file in which to save the model training commands.

Default:

train_out.sh

--n-models <n_models>

Number of models to train in kappa scan.

Default:

10

--prefix <prefix>

Batch command string to prefix model training command (slurm only).

Default:

--cluster-type <cluster_type>

Platform to train models on

Default:

local

Options:

local | slurm

--scan-scale <scan_scale>

Scale on which to scan kappa values.

Default:

log

Options:

log | linear

--min-kappa <min_kappa>

Minimum kappa value at which to begin the scan.

--max-kappa <max_kappa>

Maximum kappa value at which to end the scan.

--memory <memory>

RAM (slurm only)

Default:

5GB

--wall-time <wall_time>

Wall time (slurm only)

Default:

3:00:00

--partition <partition>

Partition name (slurm only)

Default:

short

--get-cmd

Print scan command strings.

Default:

False

--run-cmd

Run scan command strings.

Default:

False

--check-every <check_every>

Iteration interval at which to record training and validation log-likelihoods.

Default:

5

--robust

Use the robust AR-HMM model, which is more tolerant to noise.

Default:

False

--separate-trans

Use a separate transition matrix for each group.

Default:

False

--nlags <nlags>

Number of lags to use

Default:

3

--noise-level <noise_level>

Amount of additive white Gaussian noise applied to the input data for regularization. Not generally used.

Default:

0

-a, --alpha <alpha>

Alpha; hierarchical Dirichlet process hyperparameter (try not to change it).

Default:

5.7

-g, --gamma <gamma>

Gamma; hierarchical Dirichlet process hyperparameter (try not to change it).

Default:

1000.0

--load-groups <load_groups>

Whether groups should be loaded with the PC scores.

Default:

True

--percent-split <percent_split>

Training-validation split percentage used when not holding out data and when this parameter > 0.

Default:

0

-p, --progressbar <progressbar>

Show model progress

Default:

True

-w, --whiten <whiten>

Whiten PCs: (e)ach session, (a)ll combined, or (n)o whitening

Default:

all

--npcs <npcs>

Number of PCs to use

Default:

10

-m, --max-states <max_states>

Maximum number of states

Default:

100

--save-model <save_model>

Save model object at the end of training

Default:

True

-s, --save-every <save_every>

Iteration interval at which to save labels and the model object (-1 to save only the last iteration)

Default:

-1

--e-step

Compute the expected state sequence for each recording

Default:

False

--var-name <var_name>

Variable name in input file with PCs

Default:

scores

-n, --num-iter <num_iter>

Number of iterations to resample the model

Default:

100

-c, --ncpus <ncpus>

Number of cores to use for resampling

Default:

0

--nfolds <nfolds>

Number of folds for split

Default:

5

--hold-out-seed <hold_out_seed>

Random seed for holding out data (set for reproducibility)

Default:

-1

-h, --hold-out

Hold out one fold (set by nfolds) for computing the held-out likelihood

Default:

False

Arguments

INPUT_FILE

Required argument

OUTPUT_DIR

Required argument
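A minimal sketch of assembling a kappa-scan command for a Slurm cluster from Python. It assumes moseq2-model is on PATH; the helper name build_kappa_scan_cmd, the file paths, and the kappa bounds are hypothetical, and every flag comes from the option list above.

```python
import shlex

VALID_SCALES = {"log", "linear"}


def build_kappa_scan_cmd(input_file, output_dir, min_kappa, max_kappa,
                         n_models=10, scan_scale="log", memory="5GB",
                         wall_time="3:00:00", partition="short", run=False):
    """Assemble an argv list for `moseq2-model kappa-scan` (hypothetical helper)."""
    if scan_scale not in VALID_SCALES:
        raise ValueError(f"scan_scale must be one of {sorted(VALID_SCALES)}")
    cmd = ["moseq2-model", "kappa-scan",
           "--cluster-type", "slurm",
           "--scan-scale", scan_scale,
           "--min-kappa", str(min_kappa),
           "--max-kappa", str(max_kappa),
           "--n-models", str(n_models),
           # Resource options documented as "slurm only"
           "--memory", memory,
           "--wall-time", wall_time,
           "--partition", partition,
           # Print the generated scan command strings
           "--get-cmd"]
    if run:
        # Also run the scan command strings
        cmd.append("--run-cmd")
    # Positional arguments: INPUT_FILE OUTPUT_DIR
    cmd += [input_file, output_dir]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_kappa_scan_cmd("pca_scores.h5", "models/",
                                          min_kappa=1000, max_kappa=100000)))
```

Validating scan_scale locally surfaces a typo before anything is submitted to the cluster.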

learn-model

Train an ARHMM on PC scores with the given training parameters.

moseq2-model learn-model [OPTIONS] INPUT_FILE DEST_FILE

Options

--check-every <check_every>

Iteration interval at which to record training and validation log-likelihoods.

Default:

5

--robust

Use the robust AR-HMM model, which is more tolerant to noise.

Default:

False

--separate-trans

Use a separate transition matrix for each group.

Default:

False

--nlags <nlags>

Number of lags to use

Default:

3

--noise-level <noise_level>

Amount of additive white Gaussian noise applied to the input data for regularization. Not generally used.

Default:

0

-a, --alpha <alpha>

Alpha; hierarchical Dirichlet process hyperparameter (try not to change it).

Default:

5.7

-g, --gamma <gamma>

Gamma; hierarchical Dirichlet process hyperparameter (try not to change it).

Default:

1000.0

--load-groups <load_groups>

Whether groups should be loaded with the PC scores.

Default:

True

--percent-split <percent_split>

Training-validation split percentage used when not holding out data and when this parameter > 0.

Default:

0

-p, --progressbar <progressbar>

Show model progress

Default:

True

-w, --whiten <whiten>

Whiten PCs: (e)ach session, (a)ll combined, or (n)o whitening

Default:

all

--npcs <npcs>

Number of PCs to use

Default:

10

-m, --max-states <max_states>

Maximum number of states

Default:

100

--save-model <save_model>

Save model object at the end of training

Default:

True

-s, --save-every <save_every>

Iteration interval at which to save labels and the model object (-1 to save only the last iteration)

Default:

-1

--e-step

Compute the expected state sequence for each recording

Default:

False

--var-name <var_name>

Variable name in input file with PCs

Default:

scores

-n, --num-iter <num_iter>

Number of iterations to resample the model

Default:

100

-c, --ncpus <ncpus>

Number of cores to use for resampling

Default:

0

--nfolds <nfolds>

Number of folds for split

Default:

5

--hold-out-seed <hold_out_seed>

Random seed for holding out data (set for reproducibility)

Default:

-1

-h, --hold-out

Hold out one fold (set by nfolds) for computing the held-out likelihood

Default:

False

-k, --kappa <kappa>

Kappa; hyperparameter used to set syllable duration. Larger kappa values yield longer syllables.

--checkpoint-freq <checkpoint_freq>

Save a model checkpoint every n iterations.

Default:

-1

--use-checkpoint

Indicate whether to use a previously saved checkpoint.

Default:

False

-i, --index <index>

Path to moseq2-index.yaml for group definitions

Default:

--default-group <default_group>

Default group name to use for separate-trans

Default:

n/a

-v, --verbose

Print syllable log-likelihoods during training.

Default:

False

Arguments

INPUT_FILE

Required argument

DEST_FILE

Required argument
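A minimal sketch of the checkpoint workflow with learn-model: a first run that writes checkpoints via --checkpoint-freq, and a second invocation that adds --use-checkpoint to resume. It assumes moseq2-model is on PATH; the helper name build_learn_model_cmd, the kappa value, and the file names are hypothetical.

```python
import shlex


def build_learn_model_cmd(input_file, dest_file, kappa, index=None,
                          checkpoint_freq=-1, use_checkpoint=False,
                          hold_out=False, nfolds=5):
    """Assemble an argv list for `moseq2-model learn-model` (hypothetical helper)."""
    cmd = ["moseq2-model", "learn-model",
           "--kappa", str(kappa),
           "--checkpoint-freq", str(checkpoint_freq)]
    if index is not None:
        cmd += ["--index", index]
    if use_checkpoint:
        # Bare flag: use a previously saved checkpoint
        cmd.append("--use-checkpoint")
    if hold_out:
        # Bare flag: hold out one of `nfolds` folds for the held-out likelihood
        cmd += ["--hold-out", "--nfolds", str(nfolds)]
    # Positional arguments: INPUT_FILE DEST_FILE
    cmd += [input_file, dest_file]
    return cmd


if __name__ == "__main__":
    first = build_learn_model_cmd("pca_scores.h5", "model.p", kappa=100000,
                                  checkpoint_freq=25)
    resume = build_learn_model_cmd("pca_scores.h5", "model.p", kappa=100000,
                                   checkpoint_freq=25, use_checkpoint=True)
    print(shlex.join(first))
    print(shlex.join(resume))
```

Keeping every other argument identical between the two invocations is the safe choice, since a resumed run should continue the same model configuration.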

GUI Module

GUI front-end functions for training ARHMM.

moseq2_model.gui.apply_model_command(progress_paths, model_file)

Apply a pre-trained ARHMM to a new dataset from within a Jupyter notebook.

Args:

progress_paths (dict): notebook progress dict that contains paths to the PC scores, config, and index files.
model_file (str): path to the pre-trained ARHMM.

moseq2_model.gui.learn_model_command(progress_paths, get_cmd=True, verbose=False)

Train ARHMM from within a Jupyter notebook using parameters specified in the notebook.

Args:
progress_paths (dict): notebook progress dict that contains paths to the PC scores, config, and index files.
get_cmd (bool): flag to return the kappa scan learn-model commands.
verbose (bool): compute a modeling summary; this can slow down training.

Returns: None or kappa scan command

General Utilities Module

Utility functions for handling loading and saving models and their respective metadata.

moseq2_model.util.copy_model(model_obj)

Return a deep copy of the ARHMM that doesn’t contain the training data.

Args: model_obj (ARHMM): model to copy.

Returns: cp (ARHMM): copy of the model

moseq2_model.util.count_frames(data_dict=None, input_file=None, var_name='scores')

Count the total number of frames loaded from the PC scores file.

Args:
data_dict (OrderedDict): loaded PC scores OrderedDict object.
input_file (str): path to the PC scores file, used to load data_dict if data_dict is None.
var_name (str): path within the PCA h5 file from which to load scores.

Returns: total_frames (int): total number of counted frames.
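A minimal sketch of the counting step, assuming data_dict maps each session uuid to a per-frame array (or list) of PC scores so that the frame count is the number of rows. This is an illustration, not the moseq2_model source; count_frames_sketch and the session data are hypothetical.

```python
from collections import OrderedDict


def count_frames_sketch(data_dict):
    """Sum the number of frames (rows) over every session in data_dict."""
    # len() of a numpy array or a list of rows is its first dimension
    return sum(len(scores) for scores in data_dict.values())


if __name__ == "__main__":
    # Two hypothetical sessions with 3 and 2 frames of 2 PCs each
    data_dict = OrderedDict([
        ("session-a", [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]),
        ("session-b", [[0.7, 0.8], [0.9, 1.0]]),
    ])
    print(count_frames_sketch(data_dict))  # → 5
```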

moseq2_model.util.create_command_strings(input_file, output_dir, config_data, kappas, model_name_format='model-{:03d}-{}.p')

Create the CLI learn-model command strings with parameter flags based on the contents of the configuration dict.

Args:
input_file (str): path to PC scores.
output_dir (str): path to the directory in which to save models.
config_data (dict): configuration parameters dict.
kappas (list): list of kappa values for model training commands.
model_name_format (str): filename format string.

Returns: command_string (str): CLI learn-model command strings with the requested parameters separated by newline characters
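A minimal sketch of how one learn-model command per kappa could be assembled into a newline-separated string. It assumes model_name_format is filled with the model index and the kappa value, in that order, and that the remaining flags arrive as a pre-built parameter string; both are assumptions, and create_command_strings_sketch is a hypothetical name, not the library function.

```python
import os


def create_command_strings_sketch(input_file, output_dir, kappas, parameters="",
                                  model_name_format="model-{:03d}-{}.p"):
    """Build one learn-model command per kappa, joined by newlines (hypothetical)."""
    commands = []
    for i, kappa in enumerate(kappas):
        # Assumed format arguments: zero-padded model index, then kappa
        dest_file = os.path.join(output_dir, model_name_format.format(i, kappa))
        cmd = f"moseq2-model learn-model {input_file} {dest_file} -k {kappa}"
        if parameters:
            cmd += " " + parameters
        commands.append(cmd)
    return "\n".join(commands)


if __name__ == "__main__":
    print(create_command_strings_sketch("pca_scores.h5", "models", [1000, 1668],
                                        parameters="--npcs 10"))
```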

moseq2_model.util.dict_to_h5(h5file, export_dict, path='/')

Recursively save dicts to h5 file groups.

Args:
h5file (h5py.File): opened h5py File object.
export_dict (dict): dictionary to save.
path (str): path within the h5 file to save to.

moseq2_model.util.get_current_model(use_checkpoint, all_checkpoints, train_data, model_parameters)

Load the latest model checkpoint if the use_checkpoint parameter is True; otherwise, instantiate a new model.

Args:
use_checkpoint (bool): flag that indicates whether to load a checkpointed model.
all_checkpoints (list): list of all found checkpoint paths.
train_data (OrderedDict): dictionary of uuid-PC score key-value pairs.
model_parameters (dict): dictionary of required modeling hyperparameters.

Returns:
arhmm (ARHMM): instantiated model object, including loaded data.
itr (int): starting iteration number for the model to begin training from.

moseq2_model.util.get_loglikelihoods(arhmm, data, groups, separate_trans, normalize=True)

Compute the log-likelihoods of the training sessions.

Args:
arhmm (ARHMM): the ARHMM model object.
data (dict): dict object with UUID keys containing the PCs used for training.
groups (list): list of assigned groups for all corresponding session uuids.
separate_trans (bool): flag to compute separate log-likelihoods for each modeled group.
normalize (bool): if True, normalize each session's log-likelihood by its frame count.

Returns: ll (list): list of log-likelihoods for the trained model
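The normalize option is easiest to see with numbers. This sketch is not moseq2_model code: it assumes the per-session total log-likelihoods have already been computed (in the real function they come from the ARHMM) and shows only the division by frame count. The helper name and values are hypothetical.

```python
def normalize_loglikelihoods(raw_lls, frame_counts):
    """Convert per-session total log-likelihoods into per-frame values."""
    if len(raw_lls) != len(frame_counts):
        raise ValueError("raw_lls and frame_counts must have the same length")
    return [ll / n for ll, n in zip(raw_lls, frame_counts)]


if __name__ == "__main__":
    # The second hypothetical session is half as long, so its raw total is half as large
    print(normalize_loglikelihoods([-5000.0, -2500.0], [1000, 500]))  # → [-5.0, -5.0]
```

Without normalization the longer session looks twice as unlikely; per frame, the two fit equally well.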

moseq2_model.util.get_parameter_strings(config_data)

Build the learn-model CLI parameter flags (and, for Slurm, the command prefix) from the contents of the config_data dict.

Args: config_data (dict): Configuration parameters dict.

Returns:
parameters (str): string containing CLI command parameter flags.
prefix (str): prefix string for the learn-model command (Slurm only).
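A minimal sketch of turning a config dict into learn-model flags. It assumes config keys use underscores where the CLI uses hyphens, and it treats as bare flags exactly the options that learn-model documents without a value (robust, separate-trans, e-step, hold-out, use-checkpoint, verbose). The function name, the allowed_keys argument, and the omission of the Slurm prefix are all simplifications for illustration.

```python
# learn-model options documented as bare flags (no <value>)
FLAG_OPTIONS = {"robust", "separate_trans", "e_step", "hold_out",
                "use_checkpoint", "verbose"}


def get_parameter_strings_sketch(config_data, allowed_keys):
    """Render selected config entries as a learn-model flag string (hypothetical)."""
    parts = []
    for key in allowed_keys:
        value = config_data.get(key)
        if value is None:
            continue  # unset options fall back to the CLI defaults
        flag = "--" + key.replace("_", "-")
        if key in FLAG_OPTIONS:
            if value:
                parts.append(flag)  # bare flag only when enabled
        else:
            parts.append(f"{flag} {value}")
    return " ".join(parts)


if __name__ == "__main__":
    config = {"npcs": 10, "robust": True, "e_step": False, "nlags": 3,
              "cluster_type": "local"}
    print(get_parameter_strings_sketch(config, ["npcs", "robust", "e_step", "nlags"]))
```

Passing an explicit allowed_keys list keeps keys that belong to other commands (such as cluster_type) out of the learn-model flags.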

moseq2_model.util.get_parameters_from_model(model)

Get parameter dictionary from model.

Args: model (ARHMM): model to get parameters from.

Returns: parameters (dict): dictionary containing all modeling parameters

moseq2_model.util.get_scan_range_kappas(data_dict, config_data)

Get the kappa values to train models on, based on the user’s selected scanning scale and range. Default values are selected if min_kappa or max_kappa is None.

An example: scan_scale = 'log'; nframes = 1800; min_kappa = 1e3; max_kappa = 1e5; n_models = 10
kappas == [1000, 1668, 2782, 4641, 7742, 12915, 21544, 35938, 59948, 100000]

Another example: scan_scale = 'linear'; nframes = 1800; min_kappa = None; max_kappa = None; n_models = 10
min(kappas) == 18 and max(kappas) == 180000
kappas == [18, 20016, 40014, 60012, 80010, 100008, 120006, 140004, 160002, 180000]

Args:
data_dict (OrderedDict): loaded PCA score dictionary.
config_data (dict): configuration parameters dict.

Returns: kappas (list): list of ints corresponding to the kappa value for each model.
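The two examples above can be reproduced with a short standalone sketch. It is not the moseq2_model implementation: it assumes log-scale values are spaced geometrically between min_kappa and max_kappa and truncated to int, and that missing bounds default to nframes / 100 and nframes * 100, which is inferred from the linear example (18 and 180000 for 1800 frames) rather than read from the source. scan_kappas_sketch is a hypothetical name.

```python
def scan_kappas_sketch(nframes, scan_scale="log", min_kappa=None,
                       max_kappa=None, n_models=10):
    """Return n_models integer kappa values (hypothetical reimplementation)."""
    # Assumed defaults, inferred from the linear example above
    if min_kappa is None:
        min_kappa = nframes / 100
    if max_kappa is None:
        max_kappa = nframes * 100
    if n_models == 1:
        return [int(min_kappa)]
    steps = range(n_models)
    if scan_scale == "log":
        # Geometric spacing: constant ratio between consecutive kappas
        ratio = max_kappa / min_kappa
        values = [min_kappa * ratio ** (i / (n_models - 1)) for i in steps]
    elif scan_scale == "linear":
        # Arithmetic spacing: constant difference between consecutive kappas
        step = (max_kappa - min_kappa) / (n_models - 1)
        values = [min_kappa + i * step for i in steps]
    else:
        raise ValueError("scan_scale must be 'log' or 'linear'")
    return [int(v) for v in values]  # truncate toward zero


if __name__ == "__main__":
    print(scan_kappas_sketch(1800, "log", min_kappa=1e3, max_kappa=1e5))
    print(scan_kappas_sketch(1800, "linear"))
```

A log scale spends most models on small kappas, which suits an initial search across orders of magnitude; a linear scale is better for refining within a narrow range.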

moseq2_model.util.get_session_groupings(data_metadata, train_list, hold_out_list)

Create a list or tuple of assigned groups for training and (optionally) held-out data.

Args:
data_metadata (dict): dict containing session group information.
train_list (list): list of session uuids included in training.
hold_out_list (list): list of held-out uuids.

Returns: groupings (tuple): 2-tuple containing lists of train groups and held-out groups (if hold_out_list is provided)
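A minimal sketch of the grouping lookup. It assumes data_metadata holds index-aligned "uuids" and "groups" lists, as described for the metadata returned by load_pcs; the key names and get_session_groupings_sketch are assumptions, not the library code.

```python
def get_session_groupings_sketch(data_metadata, train_list, hold_out_list=None):
    """Map session uuids to their groups (hypothetical reimplementation)."""
    # Index-aligned lists -> uuid-to-group lookup table
    uuid_to_group = dict(zip(data_metadata["uuids"], data_metadata["groups"]))
    train_groups = [uuid_to_group[u] for u in train_list]
    if hold_out_list:
        # 2-tuple only when there is held-out data
        return train_groups, [uuid_to_group[u] for u in hold_out_list]
    return train_groups


if __name__ == "__main__":
    metadata = {"uuids": ["a", "b", "c"], "groups": ["wt", "ko", "wt"]}
    print(get_session_groupings_sketch(metadata, ["a", "b"], ["c"]))
```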

moseq2_model.util.h5_to_dict(h5file, path: str = '/') → dict

Load h5 data to dictionary from a user specified path.

Args:
h5file (str or h5py.File): file path to the given h5 file or the h5 file handle.
path (str): path to the base dataset within the h5 file.

Returns: out (dict): a dict with h5 file contents with the same path structure

moseq2_model.util.is_uuid(string)

Check to see if string is a uuid.

Args: string (str): string containing a session uuid in the index file

Returns: (bool): boolean to indicate if a string is a uuid.
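A minimal sketch of a uuid check using the standard library's uuid module. The moseq2_model implementation may use a different test (for example, a regular expression); is_uuid_sketch is a hypothetical name.

```python
import uuid


def is_uuid_sketch(string):
    """Return True if `string` parses as a UUID (hypothetical reimplementation)."""
    try:
        uuid.UUID(str(string))
        return True
    except ValueError:
        # Raised for strings that are not 32 hex digits (hyphens optional)
        return False


if __name__ == "__main__":
    print(is_uuid_sketch("123e4567-e89b-12d3-a456-426614174000"))  # → True
    print(is_uuid_sketch("session_1"))  # → False
```

Note that uuid.UUID also accepts the unhyphenated 32-digit form and braces, so this check is somewhat more permissive than a strict hyphenated-pattern match.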

moseq2_model.util.load_arhmm_checkpoint(filename: str, train_data: dict) → dict

Load an ARHMM checkpoint and add the training data back into the checkpointed model.

Args:
filename (str): path that specifies the checkpoint.
train_data (OrderedDict): an OrderedDict that contains the training data.

Returns: mdl_dict (dict): a dict containing the model with reloaded data, and associated training data

moseq2_model.util.load_cell_string_from_matlab(filename, var_name='uuids')

Load cell strings from MATLAB file.

Args:
filename (str): path to .mat file.
var_name (str): variable name to read.

Returns: return_list (list): list of selected loaded variables

moseq2_model.util.load_data_from_matlab(filename, var_name='features', npcs=10)

Load PC Scores from a specified variable column in a MATLAB file.

Args:
filename (str): path to MATLAB (.mat) file.
var_name (str): variable to load.
npcs (int): number of PCs to load.

Returns: data_dict (OrderedDict): loaded dictionary of uuid and PC-score pairings.

moseq2_model.util.load_dict(filename)

Load dictionary from file.

Args:

filename (str): path to file where dict is saved

Returns:

obj (dict): loaded dictionary

moseq2_model.util.load_pcs(filename, var_name='features', load_groups=False, npcs=10)

Load the Principal Component Scores for modeling.

Args:
filename (str): path to the file that contains PC scores.
var_name (str): key for PC scores in the h5 file.
load_groups (bool): whether to load the metadata group variable.
npcs (int): number of PCs to load.

Returns:
data_dict (OrderedDict): key-value pairs mapping uuids to PC scores.
metadata (OrderedDict): dictionary containing lists of index-aligned uuids and groups.

moseq2_model.util.save_arhmm_checkpoint(filename: str, arhmm: dict)

Save an arhmm checkpoint.

Args:
filename (str): path that specifies the checkpoint.
arhmm (dict): a dictionary containing the arhmm object, training iteration number, log-likelihoods of each training step, and labels for each step.
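A minimal sketch of a checkpoint round trip, illustrating the dictionary layout described above (model, iteration, per-step log-likelihoods, labels). It uses pickle and hypothetical key names; the serialization format and keys used by save_arhmm_checkpoint and load_arhmm_checkpoint are assumptions here, and a plain dict stands in for the ARHMM object.

```python
import os
import pickle
import tempfile


def save_checkpoint_sketch(filename, checkpoint):
    """Serialize the checkpoint dict to disk (pickle is an assumed format)."""
    with open(filename, "wb") as f:
        pickle.dump(checkpoint, f)


def load_checkpoint_sketch(filename, train_data):
    """Reload a checkpoint and re-attach the training data (hypothetical)."""
    with open(filename, "rb") as f:
        checkpoint = pickle.load(f)
    # The real function adds the data back into the model; here it is stored alongside
    checkpoint["train_data"] = train_data
    return checkpoint


if __name__ == "__main__":
    ckpt = {"model": {"kappa": 100000}, "iter": 50,
            "log_likelihoods": [-8.1, -7.9], "labels": [[0, 0, 1]]}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.p")
        save_checkpoint_sketch(path, ckpt)
        restored = load_checkpoint_sketch(path, {"session-a": [[0.1, 0.2]]})
        print(restored["iter"])  # → 50
```

Storing the iteration number with the model is what lets a resumed run continue counting from where the checkpoint was taken.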

moseq2_model.util.save_dict(filename, obj_to_save=None)

Save dictionary to file.

Args:
filename (str): path to file where dict is being saved.
obj_to_save (dict): dict to save.

Subpackages