CloudDock Universal Usagi 4.1 Beta 2

4.1 Beta 1 lets you sleep. 4.1 Beta 2 lets you sleep better: one true queue, faster captioning, and AdamW8bit done right.

Overview

Usagi 4.1 Beta 2 is a comfort-and-correctness upgrade focused on how the SD Training Center behaves in real-world use. bitsandbytes is back, AdamW8bit is now officially supported (in both the Training Center and the A1111 page), queue scheduling accounts for the fact that VRAM is finite, and Smart Caption Beta is up to 10× faster.

If 4.1 Beta 1 was “queue it and sleep,” then 4.1 Beta 2 is “queue it and sleep comfortably”: fewer surprises, more stability, and less waiting.

Figure: Training Center 4.1 Beta 2 overview. Smoother queue behavior, faster captioning, and optimizer support that actually matches the docs.

Highlights

  • bitsandbytes is back: optimizer support returns and is stable again.
  • AdamW8bit officially supported: Training Center supports adamw8bit properly, and the A1111 page also supports it.
  • Smarter queue scheduling: the scheduler no longer dispatches a job the moment any engine is free; it takes real VRAM constraints into account.
  • One true job at a time: queue execution is now strict — it runs a single job end-to-end for correctness and predictability.
  • Smart Caption Beta up to 10× faster: from ~5 minutes down to as fast as ~30 seconds in best cases.
  • Important note: to avoid data mismatch, the queue system requires the web page to remain open while running.

bitsandbytes returns — AdamW8bit is officially supported

bitsandbytes support is restored in 4.1 Beta 2, and with it comes a real quality-of-life feature: AdamW8bit is now officially supported in the Training Center.

This also applies to the A1111 page: if you prefer configuring optimizers from the UI side, adamw8bit is supported there too. Bottom line: you can pick adamw8bit without “it exists but doesn’t really work” surprises.

Figure: Training Center AdamW8bit support. Official support, no more half-supported state.
Figure: A1111 page AdamW8bit support. AdamW8bit is supported here as well.

Smarter queue scheduling — real VRAM behavior

Earlier versions optimized for keeping GPUs busy, which could be too optimistic: as soon as any engine was free, the system dispatched a matching job, even though many real-world GPUs lack the VRAM headroom to run jobs in parallel.

In 4.1 Beta 2, queue scheduling is more realistic and safer:

  • Dispatch logic is smarter: it avoids the “any engine free → send job” trap when real memory conditions don’t match the theoretical plan.
  • Strict execution: the scheduler now behaves like a true queue — one job runs at a time, end-to-end.
  • Fewer false starts: fewer “picked up → fails immediately → weird status” edge cases.
Figure: The queue runs one job at a time. One running job, the rest queued. Predictable and VRAM-safe.
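
The strict behavior described above can be pictured with a small sketch. This is not CloudDock's scheduler: the `Job` and `StrictQueue` names, the per-job VRAM estimate, and the `dispatch` signature are all hypothetical, chosen only to illustrate "one job at a time, and only when memory actually allows it."

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Job:
    name: str
    vram_needed_mib: int  # hypothetical per-job VRAM estimate


@dataclass
class StrictQueue:
    """Illustrative one-job-at-a-time FIFO queue (an assumption, not the real scheduler)."""
    pending: Deque[Job] = field(default_factory=deque)
    running: Optional[Job] = None
    history: List[str] = field(default_factory=list)

    def submit(self, job: Job) -> None:
        self.pending.append(job)

    def dispatch(self, free_vram_mib: int) -> Optional[Job]:
        """Start the head job only if nothing is running and it fits in free VRAM."""
        if self.running is not None or not self.pending:
            return None
        if self.pending[0].vram_needed_mib > free_vram_mib:
            return None  # wait instead of starting a job that would fail immediately
        self.running = self.pending.popleft()
        return self.running

    def finish(self) -> None:
        """Mark the running job as done so the next one can be dispatched."""
        if self.running is not None:
            self.history.append(self.running.name)
            self.running = None


queue = StrictQueue()
queue.submit(Job("lora-a", vram_needed_mib=8000))
queue.submit(Job("lora-b", vram_needed_mib=4000))
```

In this sketch a free engine alone is never enough to start work: `dispatch` refuses while another job is running, and it also refuses when the head job does not fit in the reported free VRAM. That is the difference between the old "any engine free, send job" pattern and a strict queue.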

Smart Caption Beta — up to 10× faster

Smart Caption Beta is significantly faster in 4.1 Beta 2. In best-case scenarios, captioning completes in about 30 seconds, compared to about 5 minutes previously: a speedup of up to 10×.

This is especially noticeable when you iterate repeatedly on datasets: faster captioning means faster “train → review → tweak → train again” loops.

Figure: Smart Caption speed improvement. Faster end-to-end captioning, less waiting.
Figure: Smart Caption preview. Iterate quickly, validate quickly.

Important note — keep the page open while using Queue

To avoid data mismatch or state errors, the Queue system requires the web page to remain open while it runs.

  • If you close the page, the consistency of the UI-side state can no longer be guaranteed.
  • If you must step away: keep the tab open, dim your screen, and let it run.
  • When you return, you’ll still have your job history and context — but only if the page stayed alive.
Yes, you can sleep. Just don’t close the tab.

4.1 Beta 1 vs 4.1 Beta 2

  • 4.1 Beta 1: “I can finally queue jobs and wake up to results.”
  • 4.1 Beta 2: “The queue behaves like reality, captioning is fast, and optimizers are properly supported.”

Quick Checks

Check GPU activity
watch -n 1 nvidia-smi
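
If you would rather read VRAM headroom as numbers than watch the full `nvidia-smi` screen, a minimal sketch along these lines may help. It relies on `nvidia-smi`'s `--query-gpu` and `--format=csv,noheader,nounits` options; the function names are illustrative, and the parser assumes every field is a plain integer in MiB (some devices may report `[N/A]`, which this sketch does not handle).

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines of 'index, used, total' (MiB) into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(part.strip()) for part in line.split(","))
        rows.append({
            "index": index,
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,
        })
    return rows


def query_gpu_memory():
    """Run nvidia-smi once and return per-GPU memory figures."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` on a machine with an NVIDIA driver returns one entry per GPU; `free_mib` is simply total minus used at the moment of the query.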
Confirm training processes
ps aux | grep -E "train|kohya|sd-scripts|accelerate" | grep -v grep
If the queue feels “slow”
# 4.1 Beta 2 intentionally runs one job at a time.
# This is for correctness + VRAM safety.
# Use job history + loss curve to verify progress.
Verify bitsandbytes is available
python3 -c "import bitsandbytes as bnb; print('bnb ok:', bnb.__version__)"
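
For a slightly more informative check than the one-liner, the following sketch also looks for the `AdamW8bit` class under `bitsandbytes.optim`. The function name `check_adamw8bit` is illustrative, and the script only reports what the current Python environment can import. It says nothing about how the Training Center itself loads the optimizer.

```python
import importlib
import importlib.util


def check_adamw8bit():
    """Return a short status string for bitsandbytes / AdamW8bit availability."""
    if importlib.util.find_spec("bitsandbytes") is None:
        return "bitsandbytes is not installed"
    try:
        bnb = importlib.import_module("bitsandbytes")
    except Exception as exc:
        # The import itself can fail, e.g. when a dependency is missing.
        return f"bitsandbytes failed to import: {exc}"
    if hasattr(getattr(bnb, "optim", None), "AdamW8bit"):
        version = getattr(bnb, "__version__", "unknown")
        return f"AdamW8bit available (bitsandbytes {version})"
    return "bitsandbytes imported, but optim.AdamW8bit was not found"


print(check_adamw8bit())
```

Run it with the same Python interpreter that your training environment uses; a different interpreter may see a different set of installed packages.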

Troubleshooting

  • “Why is only one job running?” That’s the new behavior in 4.1 Beta 2 — it’s a strict queue for stability.
  • “Smart Caption still slow sometimes.” Speed depends on dataset size and environment state. Best-case runs take about 30 seconds; larger datasets take longer.
  • “Queue state looks risky if I close the tab.” Don’t close it. Keep the page open to prevent state mismatch.
  • “AdamW8bit not visible?” Confirm you’re on 4.1 Beta 2 and that your UI is refreshed; then re-open optimizer options.