
CloudDock DeepSpeed 1.0.1

Train real PyTorch/DeepSpeed workloads on CloudDock, with both full CLI freedom and a clean GUI console. Switching from CloudDock Universal is seamless (the SD entry has been removed: A1111 and the SD Training Center now live in Universal).

CLI + GUI in one image · Live logs & loss curve · Bring your own script + data
Who is this for? If you want “a real training environment” (not only click-to-generate images), DeepSpeed is your home. Beginners can start from the GUI; power users can go full CLI anytime.
Overview: CLI training + GUI console with real-time logs and a live loss curve.

What you get

  • DeepSpeed CLI: run training exactly the way you do on bare metal or a server.
  • CloudDock DeepSpeed Console: upload script + dataset, tweak common knobs, start/stop/resume, watch logs, and view a live loss curve.
  • Launcher integration: switch between Universal and DeepSpeed with a consistent CloudDock workflow.

Two ways to run training

1) CLI mode (terminal)

Use CLI when you want maximum control (custom args, configs, distributed settings, debugging). The container ships with DeepSpeed ready — you focus on your code and data.

Figure 1 — DeepSpeed running in terminal: step logs + loss streaming.
# Example (single GPU)
deepspeed --num_gpus=1 train.py \
  --data_dir /workspace/data \
  --output_dir /workspace/output \
  --lr 3e-5 --epochs 1 --batch_size 8


# Tip: keep your dataset & outputs in a mounted drive / your own storage when possible.
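For distributed or mixed-precision runs you will usually also hand DeepSpeed a JSON config file. A minimal sketch of generating one in Python (the field names follow DeepSpeed's config schema, but the values here are illustrative examples, not recommendations):

```python
import json

# Illustrative minimal DeepSpeed config -- tune these values for your run.
ds_config = {
    "train_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},          # mixed precision
    "zero_optimization": {"stage": 1},  # ZeRO stage 1
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

Pass the file to your script (e.g. `--deepspeed_config ds_config.json`, if your script registers DeepSpeed's config arguments).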

2) GUI mode (CloudDock DeepSpeed Console)

Use the console when you want a fast “from blank instance to first run” workflow: upload your script + data, adjust common parameters, and monitor progress with real-time logs and a loss curve.

Figure 2 — Console: state, step, loss, log tail, and a live loss curve.
Figure 3 — Console: upload script/data, tweak parameters, then Start run.

From blank instance to first run

  1. Launch DeepSpeed from CloudDock Launcher.
  2. Open the DeepSpeed Console (GUI) or Terminal (CLI).
  3. Upload your training script (e.g. train.py).
  4. Upload your dataset to a known folder (e.g. /workspace/data).
  5. Set basic knobs (batch size / lr / epochs / precision / output dir).
  6. Start run and monitor logs + loss curve.
Tip: If you’re new, start with the console first. Once you know your script is correct, switch to CLI for deeper tuning and automation.
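The knobs in step 5 typically map to command-line flags in your script. A minimal sketch of how a train.py might expose them via argparse (the flag names here are assumptions; match them to your own script):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flag names mirror the console knobs; adapt them to your own script.
    p = argparse.ArgumentParser(description="Minimal training-knob interface")
    p.add_argument("--data_dir", default="/workspace/data")
    p.add_argument("--output_dir", default="/workspace/output")
    p.add_argument("--lr", type=float, default=3e-5)
    p.add_argument("--epochs", type=int, default=1)
    p.add_argument("--batch_size", type=int, default=8)
    p.add_argument("--precision", choices=["fp32", "fp16", "bf16"], default="fp16")
    return p

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Keeping the defaults aligned with the CLI example above means the console and terminal workflows stay interchangeable.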

What you can monitor

  • State: running / finished / stopped
  • Step & loss: current training step and loss value
  • Log tail: live log stream (you can copy and paste for debugging)
  • Loss curve: recent points streamed live for quick sanity check
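The log tail is plain text, so you can also post-process it yourself. A small sketch of pulling (step, loss) points out of a log (the log-line format shown is an assumption; adjust the regex to your script's actual output):

```python
import re

# Assumed log-line shape: "step 120 loss 0.4321" -- adjust to your script.
LOG_LINE = re.compile(r"step\s+(\d+)\s+loss\s+([0-9.eE+-]+)")

def parse_loss_points(log_text: str) -> list[tuple[int, float]]:
    """Extract (step, loss) pairs from a raw log tail."""
    return [(int(s), float(l)) for s, l in LOG_LINE.findall(log_text)]

sample = "step 10 loss 2.31\nsaving checkpoint\nstep 20 loss 1.98\n"
print(parse_loss_points(sample))  # [(10, 2.31), (20, 1.98)]
```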
Reminder: If your loss is nan or explodes quickly, common causes include a wrong dtype/mixed-precision config, a learning rate that is too high, an unstable batch size, or bad data. Start small, verify one run, then scale up.
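One cheap safeguard against the failure modes above is to abort a run as soon as the loss goes non-finite or keeps climbing. A minimal sketch (the patience threshold is an arbitrary example):

```python
import math

def should_abort(losses: list[float], patience: int = 5) -> bool:
    """Stop if the latest loss is NaN/inf, or if loss has risen for
    `patience` consecutive steps (a crude divergence check)."""
    if not losses:
        return False
    if not math.isfinite(losses[-1]):
        return True
    if len(losses) > patience:
        recent = losses[-(patience + 1):]
        if all(b > a for a, b in zip(recent, recent[1:])):
            return True
    return False
```

Call it inside your training loop after appending each step's loss; it is cheaper to kill a diverging run early than to discover it in the morning.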

Basic folder convention (recommended)

/workspace/
  train.py
  data/
  output/
  configs/
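If you want to scaffold this layout in one go, a tiny sketch (the default `base` here is a relative path for illustration; point it at /workspace, or wherever your drive is mounted):

```python
from pathlib import Path

def scaffold(base: str = "./workspace") -> None:
    """Create the recommended data/output/configs folders (idempotent)."""
    root = Path(base)
    for sub in ("data", "output", "configs"):
        (root / sub).mkdir(parents=True, exist_ok=True)
```

Because `exist_ok=True` is set, re-running it on an existing instance is harmless.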
Ship it. Then measure it. Then ship it again.