Training Arguments

Learn about the training arguments available in PyTorch Image Models (timm).

There are over 100 arguments available in the training script of PyTorch Image Models (timm).

These parameters can be organized into the following categories:

  • Dataset
  • Model
  • Optimizer
  • Learning rate
  • Augmentation and regularization
  • Batch normalization
  • Model exponential moving average
  • Miscellaneous

Dataset

The training script accepts the following dataset-related arguments; a sample invocation follows the list:

  • data_dir: This is the path to the dataset.
  • dataset: This is the dataset type. If it’s not specified, it defaults to ImageFolder/ImageTar.
  • train-split: This is the name of the dataset’s training split (default is train).
  • val-split: This is the name of the dataset’s validation split (default is validation).
  • dataset-download: This enables downloading for torch/ and TFDS datasets that support it.
  • class-map: This is the path to the class-to-index mapping file.
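
For illustration, a run that exercises these flags might look like the following minimal sketch. Here, ./data and torch/cifar10 are placeholder values, the dataset path is passed positionally as data_dir, and exact flag spellings can vary across timm versions:

    # hypothetical paths/values; flags as listed above
    python train.py ./data \
        --dataset torch/cifar10 \
        --dataset-download \
        --train-split train \
        --val-split validation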

Model

We can specify the following arguments to configure our model; a sample invocation follows the list:

  • model: This is the name of the model to train (default is resnet50).
  • pretrained: This specifies whether to start with a pretrained version of the specified network if available.
  • initial-checkpoint: We use this checkpoint to initialize the model.
  • resume: This specifies whether to resume the full model and optimizer state from a checkpoint.
  • no-resume-opt: This prevents the resumption of the optimizer state when resuming model.
  • num-classes: This is the total number of label classes.
  • gp: This is the type of global pool. It accepts fast, avg, max, avgmax, or avgmaxc.
  • img-size: This is the input image size (if unset, the model’s default is used).
  • input-size: This is the dimensions of the input image as channels, height, and width (D H W). For example, we can use --input-size 3 224 224 for 224 x 224 RGB images.
  • crop-pct: This is the center crop percentage of the input image (used only for validation).
  • mean: This is the mean pixel value of datasets (it will override the default mean).
  • std: This is the standard deviation of datasets (it will override the default standard deviation).
  • interpolation: This is the type of the resize interpolation.
  • b: This is the training input batch size (default is 128).
  • vb: This is the validation input batch size (default is None).
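
A sketch of a model configuration built from these flags, again with placeholder values (the defaults shown in the list make most of these flags optional):

    # illustrative values; resnet50 and the sizes are examples only
    python train.py ./data \
        --model resnet50 \
        --pretrained \
        --num-classes 10 \
        --input-size 3 224 224 \
        -b 128 -vb 256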

Optimizer

The optimizer arguments are as follows; a sample invocation follows the list:

  • opt: This is the optimizer (default is sgd).
  • opt-eps: This is the epsilon of the optimizer.
  • opt-betas: These are the betas of the optimizer.
  • momentum: This is the momentum of the optimizer (default is 0.9).
  • weight-decay: This is the weight decay (default is 2e-5).
  • clip-grad: This is the clip gradient norm (default is None, which indicates that no clipping will occur).
  • clip-mode: This is the gradient clipping mode. It accepts norm, value, or agc.
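
A hypothetical optimizer setup using these flags might look like this (adamw and the values shown are illustrative choices, not recommendations):

    # illustrative optimizer settings
    python train.py ./data \
        --opt adamw \
        --opt-eps 1e-8 \
        --opt-betas 0.9 0.999 \
        --weight-decay 0.05 \
        --clip-grad 1.0 --clip-mode norm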

Learning rate

The following arguments configure the learning rate of our model; a sample invocation follows the list:

  • sched: This is the learning rate scheduler (default is cosine).
  • lr: This is the learning rate (default is 0.05).
  • lr-noise: This is the learning rate noise on/off epoch percentages.
  • lr-noise-pct: This is the learning rate noise limit percent (default is 0.67).
  • lr-noise-std: This is the learning rate’s noise standard deviation (default is 1.0).
  • lr-cycle-mul: This is the learning rate cycle length multiplier (default is 1.0).
  • lr-cycle-decay: This is the amount to decay each learning rate cycle by (default is 0.5).
  • lr-cycle-limit: This is the learning rate cycle limit. The default value is 1.
  • lr-k-decay: This is the learning rate k-decay for cosine and poly (default is 1.0).
  • warmup-lr: This is the warmup learning rate (default is 0.0001).
  • min-lr: This is the lower learning rate bound for cyclic schedulers that hit 0 (default is 1e-6).
  • epochs: This is the number of epochs to train (default is 300).
  • epoch-repeats: This is the epoch repeat multiplier (number of times to repeat datasets epoch per trained epoch).
  • start-epoch: This configures the epoch number manually. It’s useful on restarts.
  • decay-epochs: This is the epoch interval to decay the learning rate.
  • warmup-epochs: This is the number of epochs to warm up the learning rate (applicable only if the scheduler supports it).
  • cooldown-epochs: This is the number of epochs to cool down the learning rate at min_lr (after a cyclic schedule has ended).
  • patience-epochs: This is the patience epochs for the Plateau learning rate scheduler (default is 10).
  • decay-rate: This is the decay rate of the learning rate (default is 0.1).
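
A sketch that combines several of the scheduler flags above; the values are illustrative (several simply restate the defaults mentioned in the list):

    # illustrative schedule; warmup length is an arbitrary example
    python train.py ./data \
        --sched cosine \
        --lr 0.05 \
        --warmup-lr 0.0001 --warmup-epochs 5 \
        --min-lr 1e-6 \
        --epochs 300 \
        --decay-rate 0.1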

Augmentation and regularization

The training script also accepts the following augmentation and regularization arguments; a sample invocation follows the list:

  • no-aug: This specifies whether to disable all training augmentations.
  • scale: This is the random resize scale (default is from 0.08 to 1.0).
  • ratio: This is the random resize aspect ratio (default is [0.75, 1.33]).
  • hflip: This is the horizontal flip training augmentation probability (default is 0.5).
  • vflip: This is the vertical flip training augmentation probability.
  • color-jitter: This is the color jitter factor (default is 0.4).
  • aa: This enables the AutoAugment policy. It accepts v0 or original.
  • aug-repeats: This is the number of augmentation repetitions (default is 0). This is only for distributed training.
  • aug-splits: This is the number of augmentation splits (default is 0; the value must be 0 or at least 2).
  • jsd-loss: This enables the Jensen-Shannon Divergence and cross-entropy loss. We can use it with --aug-splits.
  • bce-loss: This enables the BCE loss. We can complement it with mixup or cutmix augmentations.
  • bce-target-thresh: This is the binarization threshold for softened BCE targets.
  • reprob: This is the Random Erase probability (default is 0).
  • remode: This is the Random Erase mode (default is pixel).
  • recount: This is the Random Erase count (default is 1).
  • resplit: This specifies whether to erase the first (clean) augmentation split at random.
  • mixup: This is the Mixup alpha (default is 0, and Mixup will be enabled if greater than 0).
  • cutmix: This is the CutMix alpha (default is 0, and CutMix will be enabled if greater than 0).
  • cutmix-minmax: This is the CutMix minimum and maximum ratio (default is None; if set, it overrides alpha and enables CutMix).
  • mixup-prob: This is the probability of performing Mixup or CutMix augmentation when either or both are enabled.
  • mixup-switch-prob: This is the probability of switching to CutMix when both Mixup and CutMix are enabled.
  • mixup-mode: This is the Mixup or CutMix method. It accepts batch, pair, or elem.
  • mixup-off-epoch: This disables Mixup after the given number of epochs (default is 0, which means it’s never disabled).
  • smoothing: This is the label smoothing (default is 0.1).
  • train-interpolation: This is the training interpolation mode. It accepts random (the default), bilinear, or bicubic.
  • drop: This is the dropout rate (default is 0).
  • drop-path: This is the drop path rate.
  • drop-block: This is the drop block rate.
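
A sketch of an augmentation and regularization recipe built from these flags (the specific values are illustrative, not a tuned recipe):

    # illustrative augmentation settings
    python train.py ./data \
        --aa v0 \
        --reprob 0.25 --remode pixel --recount 1 \
        --mixup 0.2 --cutmix 1.0 --mixup-switch-prob 0.5 \
        --smoothing 0.1 \
        --drop 0.1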

Batch normalization

Currently, the following batch normalization arguments only work with gen_efficientnet-based models; a sample invocation follows the list:

  • bn-momentum: This is the batch normalization momentum.
  • bn-eps: This is the batch normalization epsilon.
  • sync-bn: This enables synchronized batch normalization with NVIDIA Apex or Torch.
  • dist-bn: This is the method to distribute batch normalization stats between nodes after each epoch. It accepts broadcast, reduce (the default), or an empty string.
  • split-bn: This enables separate batch normalization layers per augmentation split.
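
Since these flags are only honored by gen_efficientnet-based models, a sketch would pair them with such a model; efficientnet_b0 and the values are illustrative:

    # illustrative; sync-bn/dist-bn only take effect in distributed runs
    python train.py ./data \
        --model efficientnet_b0 \
        --bn-momentum 0.01 --bn-eps 1e-3 \
        --sync-bn --dist-bn reduce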

Model exponential moving average

The arguments for the model exponential moving average (EMA) are as follows; a sample invocation follows the list:

  • model-ema: This specifies whether to track the moving average of model weights.
  • model-ema-force-cpu: This forces the exponential moving average to be tracked on the CPU (rank 0 node only). Note that this disables EMA validation.
  • model-ema-decay: This is the decay factor for the moving average of model weights (default is 0.9998).
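
Enabling EMA tracking is a matter of one or two flags; a minimal sketch (the decay value restates the default):

    # track an EMA of the weights alongside training
    python train.py ./data \
        --model-ema \
        --model-ema-decay 0.9998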

Miscellaneous

There are several miscellaneous arguments available; the most useful ones are as follows, with a sample invocation after the list:

  • seed: This is the random seed (default is 42).
  • checkpoint-hist: This is the number of checkpoints to keep (default is 10).
  • amp: This specifies whether to use NVIDIA Apex AMP or Native AMP for mixed-precision training.
  • apex-amp: This uses NVIDIA Apex AMP mixed-precision.
  • native-amp: This uses Native Torch AMP mixed-precision.
  • output: This is the path to the output folder.
  • torchscript: This specifies whether to convert the model to TorchScript for inference.
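
Finally, a sketch combining the miscellaneous flags (the output path is a placeholder, and seed and checkpoint-hist restate the defaults):

    # illustrative housekeeping flags
    python train.py ./data \
        --seed 42 \
        --checkpoint-hist 10 \
        --amp \
        --output ./output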
