Tuning & Troubleshooting

VRAM-smart defaults, practical knobs, and quick fixes for LoRA & DreamBooth.

Quick checklist

  • Clean data: one subject/style per folder; remove blurs/near-dupes.
  • Family match: the SD 1.5 vs SDXL choice must match both your base checkpoint and your generation plan.
  • System headroom: check Launcher → System for VRAM & disk before training.
  • A1111 only for auto-caption: run it for captioning, then stop to free VRAM.
System panel showing VRAM and disk usage
System panel — confirm VRAM/disk headroom before you hit Start.
Family mismatch warning example
Family mismatch symptoms — muddy results or model not appearing.

VRAM strategy ladder

  1. Lower resolution (long side): first & biggest lever.
  2. Batch = 1; increase grad-accum to simulate larger batch.
  3. Keep the FP16 precision from the presets; avoid exotic half-precision modes unless you know them well.
  4. Close other GPU apps; only keep A1111 on for captioning.
  5. If still OOM: shorten steps for a test pass, then resume longer.
VRAM ladder diagram
Lower resolution → batch 1 → grad-accum — safest path.
Batch vs gradient accumulation panel
Batch stays small; use accumulation to approximate larger effective batch.
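The batch/accumulation trade can be sketched in a few lines of framework-agnostic Python. This is illustrative only, not the trainer's actual code: gradients from several micro-batches are summed (pre-scaled) and the optimizer steps once per window, so the update matches a larger batch at the VRAM cost of one micro-batch.

```python
MICRO_BATCH = 1      # what fits in VRAM
ACCUM_STEPS = 8      # optimizer steps once per 8 micro-batches
                     # effective batch = MICRO_BATCH * ACCUM_STEPS = 8

def train_loop(micro_batch_grads, lr=1e-4):
    """Accumulate per-micro-batch gradients; step when the window closes.
    `micro_batch_grads` stands in for one scalar gradient per micro-batch."""
    weight = 0.0
    grad_accum = 0.0
    for i, g in enumerate(micro_batch_grads, start=1):
        grad_accum += g / ACCUM_STEPS   # pre-scale so the step matches a big batch
        if i % ACCUM_STEPS == 0:
            weight -= lr * grad_accum   # one optimizer step per window
            grad_accum = 0.0
    return weight
```

Eight micro-batches of gradient 1.0 produce exactly one step of size `lr`, the same as a single batch of 8 would.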

Resolution & aspect

  • Portrait LoRA: start long side ~640–768 px.
  • Full-body DB: start with a taller canvas (e.g., the 832×1216 class), but keep resolution moderate at first.
  • Normalize sizes to reduce gradient noise from extreme aspect shifts.
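Normalizing to a fixed long side can be sketched like this. The 64-px snapping policy is an assumption (SD latents work in multiples of the VAE stride); check what your trainer's bucketing actually expects:

```python
def fit_long_side(width, height, target=768):
    """Scale an image so its long side equals `target`, then snap both
    dimensions to multiples of 64 (rounding policy is an assumption)."""
    scale = target / max(width, height)
    snap = lambda v: max(64, round(v * scale / 64) * 64)
    return snap(width), snap(height)
```

For example, a 3000×4000 portrait photo maps to 576×768, which keeps the aspect ratio close while avoiding extreme shifts across the dataset.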

Optimizer & learning rate

The presets ship stable optimizer/LR combinations. If you must tweak:

  • If loss explodes/NaN: lower LR slightly; keep optimizer as preset.
  • If underfitting: extend steps after verifying data quality.
  • Avoid stacking many “advanced” toggles at once — change one thing and re-test.
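The first fix above can be automated with a tiny guard. This is a sketch; the function names are hypothetical, not part of the trainer:

```python
import math

def lr_after_nan(current_lr, factor=0.5, floor=1e-6):
    """Halve the learning rate after a non-finite loss; never go below a floor."""
    return max(floor, current_lr * factor)

def check_loss(loss, lr):
    """Return an adjusted LR when loss explodes to NaN/inf; otherwise leave it."""
    if not math.isfinite(loss):
        return lr_after_nan(lr)   # lower LR slightly, keep optimizer as preset
    return lr
```

The optimizer itself stays untouched, matching the advice to change one thing at a time.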

Steps, epochs & resume

  • Start modest; evaluate mid-run samples if enabled.
  • Use Resume from last good step instead of restarting from scratch.
  • Overtraining shows up as oversaturated colors, waxy faces, or collapsed output diversity — stop earlier next time.
Resume training dialog
Resume picks up from the last saved step.
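If you ever need to locate the last good step yourself, a sketch like the following works, assuming checkpoints embed the step in their filename (the `step-NNNNNN` pattern here is hypothetical; match it to your trainer's naming):

```python
import re

def last_good_step(filenames):
    """Return the highest saved step found in checkpoint filenames like
    'mascot-step-001500.safetensors'; None if no checkpoint matches."""
    steps = [int(m.group(1)) for f in filenames
             if (m := re.search(r"step-(\d+)", f))]
    return max(steps, default=None)
```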

Dataset pitfalls

  • Mixed subjects in one folder → identity drift.
  • Heavy compression → artifacts the model will learn.
  • Caption noise → conflicting tokens; keep short & consistent.
  • Missing limbs in full-body sets → add clear, upright poses.
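The caption-noise pitfall can be caught before training with a quick audit. This is a sketch; the comma-separated tag format is an assumption based on common captioning output:

```python
from collections import Counter

def caption_audit(captions, min_share=0.8):
    """Flag tags that appear in fewer than `min_share` of captions;
    inconsistent tagging is a common source of conflicting tokens."""
    counts = Counter(tok for c in captions for tok in set(c.lower().split(", ")))
    n = len(captions)
    return {tok: cnt / n for tok, cnt in counts.items() if cnt / n < min_share}
```

A trigger word that only appears in a third of your captions will show up here before it shows up as identity drift.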

Reproducibility & versioning

Take screenshots of your job config and keep them with your outputs.

  • Use semantic names: mascot_sdxl_fb_v1.safetensors, ..._v2 for later runs.
  • Keep a tiny “sanity” dataset to quickly validate new settings before full runs.
params.json next to outputs
Keep config screenshots with the artifact for exact repro.
Versioned model naming examples
Version your models — it saves you from “which one was good?”.
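Bumping the version is easy to script. A sketch matching the `_vN` convention above:

```python
import re

def next_version(name):
    """Bump the trailing _vN in a model filename: ..._v1.safetensors -> _v2."""
    return re.sub(r"_v(\d+)(?=\.)", lambda m: f"_v{int(m.group(1)) + 1}", name)
```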

Common issues & fixes

OOM (out of memory)

  • Lower resolution → set batch = 1 → use grad-accum.
  • Close A1111 and other GPU apps; keep FP16.

Exploding loss / NaN

  • Lower learning rate; revert any recent “advanced” toggles.
  • Audit data for corrupted images or extreme outliers.

No improvement / muddy results

  • Confirm family and checkpoint match (1.5 vs XL).
  • Improve dataset: sharper images, consistent framing, better captions.
  • Train a bit longer, but watch for overbake symptoms.

Overbake / “waxy” look

  • Stop earlier; reduce steps or add varied but on-style samples.
  • For LoRA inference, try weight 0.6–0.8 first.

Throughput too slow

  • Lower resolution slightly; keep batch at 1 with accumulation.
  • Ensure the dataset is on a fast local path with enough free disk.

After training — A1111 usage

  • LoRA: refresh the LoRA list; insert <lora:NAME:0.7> (try weights 0.6–0.8) and iterate.
  • DreamBooth: refresh the Stable Diffusion checkpoint list; keep family consistent.
  • ControlNet: layer it after selecting your checkpoint if needed.
A1111 refresh model lists
Refresh model lists in A1111 to pick up new outputs.
Side-by-side comparison of weights/steps
Compare weights/steps side-by-side to find the sweet spot.
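A small helper makes the side-by-side weight comparison systematic. `<lora:NAME:W>` is A1111's standard LoRA prompt syntax; the rest is a sketch:

```python
def lora_sweep(prompt, name, weights=(0.6, 0.7, 0.8)):
    """Build one prompt per LoRA weight for a side-by-side comparison grid."""
    return [f"{prompt}, <lora:{name}:{w}>" for w in weights]
```

Run the resulting prompts with a fixed seed so weight is the only variable between images.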

What’s next?

Change one knob. Test. Repeat.