Skip to content

Sweep config (hyperherd.yaml)

The configuration file is YAML. Every field is documented below with its type, whether it's required, and its default. The config is validated with Pydantic at parse time — invalid configs produce clear error messages immediately.

Top-level fields

Field Type Required Default Description
name string yes Unique experiment name. Used in the SLURM job name (hyperherd_<name>). Must be a valid filename component.
grid string or list no omitted Controls which parameters are swept via Cartesian product. See grid.
launcher string yes Path to the launcher bash script. Relative paths are resolved relative to the config file's directory. See Launcher Script.
slurm object no see below SLURM resource requests. See slurm.
hydra object no see below Static Hydra overrides. See hydra.
discord object no unset Discord channel for the herd monitor daemon. See discord.
parameters object yes Hyperparameter definitions. At least one parameter is required. See parameters.
conditions list no [] Conditional rules that filter or modify parameter combinations. See Conditions.

The workspace is the directory containing hyperherd.yaml. HyperHerd stores its state in a .hyperherd/ subdirectory within the workspace.

grid

Controls how parameter combinations are generated. Three forms:

Value Meaning
grid: all Full Cartesian product of all parameter values. No defaults required.
grid: [lr, weight_decay] Cartesian product of the listed parameters only. All other parameters are held at their default.
(omitted) One-at-a-time: start from defaults, vary each parameter independently. All parameters must have a default.

Full grid (grid: all) generates every combination. With 3 parameters of 4, 3, and 2 levels, you get 4 × 3 × 2 = 24 trials.

Partial grid (grid: [lr, wd]) generates a Cartesian product of only the listed parameters while holding everything else at its default. Use it to explore interactions between specific parameters without exploding the trial count.

One-at-a-time (no grid field) starts from a default combination, then varies each parameter independently. This produces 1 + Σ(levels_i − 1) trials. For the 4/3/2 example: 1 + 3 + 2 + 1 = 7 trials.

# Full grid:
grid: all

# Partial grid:
grid: [learning_rate, weight_decay]

# One-at-a-time (omit grid entirely):
# (no grid field)

When grid is not "all", every parameter that is not in the grid list must have a default. When grid is omitted, all parameters must have a default.

slurm

SLURM resource requests. These map directly to #SBATCH directives in the generated batch script.

Field Type Required Default Description
partition string no "default" SLURM partition name.
time string no "01:00:00" Wall-clock time limit in HH:MM:SS format.
mem string no "8G" Memory per node (e.g. "16G", "512M").
cpus_per_task integer no 1 Number of CPU cores per task.
gres string no omitted Generic resources (e.g. "gpu:1", "gpu:a100:2"). If omitted, the --gres line is not included.
max_concurrent integer no omitted Cap on simultaneously running array tasks. Appended as %N to the SLURM array spec (e.g. --array=0-49%5). Override per-run with herd run --max-concurrent N.
extra_args list[string] no [] Additional raw #SBATCH flags. Each string is placed after #SBATCH verbatim.
slurm:
  partition: gpu
  time: "04:00:00"
  mem: "32G"
  cpus_per_task: 4
  gres: "gpu:1"
  max_concurrent: 8
  extra_args:
    - "--export=ALL"
    - "--exclusive"

discord

Settings for the herd monitor daemon's Discord channel. Required if you want the monitor to be able to talk to you. The token comes from the DISCORD_BOT_TOKEN environment variable — don't put secrets in YAML. Walk through Discord setup once per server to mint the bot and copy the guild ID.

Field Type Default Description
guild_id string unset Discord server ID. Required to enable the channel — without it the daemon runs without notifications.
channel_id string unset Pin to a specific existing channel (skips auto-create).
channel_name string unset Override the sweep-derived channel name. Discord lowercases it automatically; non [a-z0-9-] chars are stripped.
discord:
  guild_id: "1234567890123456789"

static_overrides

A list of extra name=value tokens appended to every trial's override string. Use them for values that are constant across the sweep but differ from your trainer's defaults (dataset paths, fixed seeds, debug flags). The values are passed through verbatim — your launcher is free to parse them however it wants.

static_overrides:
  - "data.path=/scratch/datasets"
  - "trainer.seed=42"

The legacy hydra: { static_overrides: [...] } form still parses for back-compat, but new configs should use the top-level field.

How overrides reach the training command

HyperHerd builds an override string for each trial by combining:

  1. experiment_name=<name> (built from parameter abbreviations, e.g. lr-0.001_opt-adam_bs-64)
  2. Each parameter's name=value
  3. Any static_overrides
  4. Any condition set extras (last → these win)

This string is passed as $1 to your launcher script. For example:

experiment_name=lr-0.001_opt-adam_bs-64 learning_rate=0.001 optimizer=adam batch_size=64 data.path=/scratch/datasets

If you're using Hydra this passes through unmodified. If not, your launch.sh parses these name=value pairs into whatever flags your trainer accepts.

Four environment variables are also exported in the SLURM script:

  • HYPERHERD_WORKSPACE — absolute path to the workspace directory
  • HYPERHERD_SWEEP_NAME — the sweep's name: from yaml (shared across all trials)
  • HYPERHERD_TRIAL_ID — the array task index (same as $SLURM_ARRAY_TASK_ID)
  • HYPERHERD_TRIAL_NAME — the auto-generated per-trial identifier (e.g. lr-0.001_opt-adam_bs-64)

Use these for output directories, wandb run names, logging paths, and the log_result() API.

parameters

A YAML mapping where each key is the parameter name and the value is a specification object. Parameter names should match the Hydra config keys you want to override (e.g. model.learning_rate, training.batch_size). Dotted names work for nested Hydra paths.

Every parameter requires:

  • type — either "discrete" or "continuous"
  • abbrev(optional) a short name used to build experiment_name (e.g. "lr", "opt", "bs"). Defaults to the parameter name itself. Required when the parameter name contains characters outside [A-Za-z0-9._-] (e.g. spaces, /), since the token ends up as a file-path component.
  • default(optional) required when the parameter is not in the grid

Allowed characters in abbrev

Whether implicit (from the parameter name) or explicit, the abbrev token must match [A-Za-z0-9._-]+. Anything else would corrupt the experiment_name directory layout or the space-separated key=value override string.

Discrete parameters

Field Type Required Description
type string yes Must be "discrete".
abbrev string no Short name for experiment naming. Defaults to the parameter name; required when the parameter name contains characters outside [A-Za-z0-9._-].
values list[any] yes List of values to sweep over. Strings, integers, floats, or booleans.
labels list[string] no Per-value short display tokens used in the experiment name. Must be the same length as values, contain unique non-empty strings without /. Required when any value contains / (e.g. file paths).
default any no* Default value. Must be one of values. Required when not in grid.
parameters:
  optimizer:
    abbrev: opt
    type: discrete
    values: [adam, sgd, adamw]
    default: adam
  num_layers:
    abbrev: nl
    type: discrete
    values: [2, 4, 8]
    default: 4
  # Path-like values must declare labels so experiment names stay short.
  pretrained:
    abbrev: pre
    type: discrete
    values: ["/scratch/ckpts/resnet50.ckpt", "/scratch/ckpts/vit_base.ckpt"]
    labels: [resnet50, vit_base]
    default: "/scratch/ckpts/resnet50.ckpt"

The full value is still passed to Hydra as the override; labels only affect the auto-generated experiment_name.

Continuous parameters

Field Type Required Default Description
type string yes Must be "continuous".
abbrev string no param name Short name for experiment naming. Defaults to the parameter name; required when the parameter name contains characters outside [A-Za-z0-9._-].
low float yes Lower bound (inclusive).
high float yes Upper bound (inclusive).
scale string no "linear" "linear" for uniform spacing, "log" for log-uniform. Log scale requires low > 0.
steps integer no 5 Number of evenly-spaced points to discretize into.
default float no* Default value. Must be within [low, high]. Required when not in grid.
parameters:
  learning_rate:
    abbrev: lr
    type: continuous
    low: 1e-5
    high: 1e-2
    scale: log
    steps: 5
    default: 0.001

Log scale discretization. With low: 1e-4, high: 1e-2, scale: log, steps: 3, values are spaced evenly in log10 space: [0.0001, 0.001, 0.01].

Default validation

Discrete defaults must be in values. Continuous defaults must be within [low, high]. Both are validated at parse time.

Complete example

name: resnet_sweep

grid: [learning_rate, weight_decay]

slurm:
  partition: gpu
  time: "08:00:00"
  mem: "32G"
  cpus_per_task: 8
  gres: "gpu:a100:1"
  extra_args:
    - "--export=ALL"

launcher: ./launch.sh

static_overrides:
  - "data.root=/scratch/imagenet"
  - "trainer.max_epochs=90"

parameters:
  learning_rate:
    abbrev: lr
    type: continuous
    low: 1e-4
    high: 1e-1
    scale: log
    steps: 4
    default: 0.001
  optimizer:
    abbrev: opt
    type: discrete
    values: [sgd, adamw]
    default: adamw
  weight_decay:
    abbrev: wd
    type: continuous
    low: 0.0
    high: 0.01
    scale: linear
    steps: 3
    default: 0.001
  batch_size:
    abbrev: bs
    type: discrete
    values: [64, 128, 256]
    default: 128

conditions:
  - name: sgd_no_high_lr
    when:
      optimizer: sgd
    exclude:
      learning_rate: [0.1]

grid: [learning_rate, weight_decay] creates a 4 × 3 = 12 trial grid over those two parameters; optimizer and batch_size stay at their defaults (adamw and 128).