CloudDock DeepSpeed 1.0.3

DeepSpeed 1.0.3 is a visibility + reliability upgrade. The Console gets clearer “what is running right now?” feedback, Launcher gains a DS Console status indicator, and the base is rebuilt on Universal Usagi 4.1.5 for stronger safety defaults (high-level only). Plus: JupyterLab environment separation so TensorFlow and PyTorch stop fighting each other.

Console tabs show running-job steps · Launcher: DS Console status indicator · Safer baseline (Universal Usagi 4.1.5) · JupyterLab: PyTorch venv + TensorFlow kernel split
Who is this for? If you run DeepSpeed jobs and want a clean “glance → know” experience (without babysitting logs), 1.0.3 makes status and progress more obvious. If you use both PyTorch and TensorFlow notebooks, the new kernel split avoids dependency collisions.
CloudDock DeepSpeed 1.0.3 overview
Figure 1 — Overview: steps shown directly in Console tabs, Launcher status indicator, safer baseline via Usagi 4.1.5, and cleaner JupyterLab environments.

What’s new in 1.0.3 (vs 1.0.2)

  • Console tabs now display running-job steps: the active job’s step counter is surfaced directly in the tab UI (so you can track progress without switching panels or scrolling logs).
  • Launcher: DS Console status indicator: the Launcher DeepSpeed Console page now shows a clear status light / state indicator (busy/idle + quick health signal at a glance).
  • Universal Usagi 4.1.5 base (security uplift): 1.0.3 inherits stronger safety defaults from the latest Universal Usagi baseline. This is intentionally documented at a high level.
  • JupyterLab environment split: TensorFlow (DS/ML/Kaggle) is separated into its own kernel, while PyTorch stays in its own venv. Result: fewer “it worked yesterday” dependency conflicts.
Note on security details: 1.0.3 includes a significant safety uplift, but specific mechanisms are not listed here by design. Your workflow does not change — you just get a safer baseline.

DeepSpeed Console upgrade: Steps shown in tabs

1.0.3 adds a small but high-impact UI improvement: the Console’s job tabs can now display the current running job’s step count. This is meant to answer one question instantly: “Is it actually moving?”

  • When a job is running: the active tab shows live step updates.
  • When no job is running: tabs remain clean (no noisy placeholders).
  • Fallback behavior: if step is temporarily unavailable, the UI stays stable and does not “blink” aggressively.
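The fallback behavior above can be sketched as a small label function (the names here are illustrative assumptions; the actual Console implementation is not published in these notes):

```python
def tab_label(job_name, running, step=None, last_label=None):
    """Return the text shown on a Console job tab.

    - Running job with a known step: show live progress.
    - Running job with step temporarily unavailable: keep the last
      stable label instead of "blinking" back and forth.
    - No job running: keep the tab clean (name only, no placeholder).
    """
    if not running:
        return job_name
    if step is not None:
        return f"{job_name} · step {step}"
    # Fallback: step unknown right now -> stay on the last stable label.
    return last_label or job_name

print(tab_label("train-bert", running=True, step=120))   # train-bert · step 120
print(tab_label("train-bert", running=True, step=None,
                last_label="train-bert · step 120"))     # train-bert · step 120
print(tab_label("train-bert", running=False))            # train-bert
```

The key design choice is the last line: when the step counter momentarily drops out, the tab keeps showing the previous value rather than flickering to a placeholder.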
DeepSpeed Console 1.0.3: steps shown in tabs
Figure 2 — DeepSpeed Console 1.0.3: job tabs show the running job’s steps for quick progress checks.

Launcher integration: DS Console status indicator

The Launcher DeepSpeed Console page now includes a dedicated status indicator, so you can see whether the Console is busy or idle (and healthy) without opening the full Console UI. This makes instance navigation easier, especially when you’re juggling multiple tools.

Launcher: DS Console status indicator
Figure 3 — Launcher: DS Console status indicator (quick glance health + busy/idle signal).
Workflow tip: Use Launcher as your “control tower.” If the status shows busy, jump into Console to view logs/steps. If it’s idle, you can start a new run with confidence (and avoid accidentally starting a duplicate run).
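The “control tower” habit boils down to a simple state mapping. A hedged sketch (the state names are assumptions for illustration, not the Launcher’s actual API):

```python
def indicator(busy: bool, healthy: bool) -> str:
    """Map the two signals the Launcher surfaces into one glanceable state."""
    if not healthy:
        return "attention"   # check Console logs before starting anything
    return "busy" if busy else "idle"

state = indicator(busy=False, healthy=True)
if state == "idle":
    print("safe to start a new run")           # no duplicate-run risk
elif state == "busy":
    print("open Console to watch steps/logs")  # don't start a second job
```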

JupyterLab: TensorFlow kernel split (no more dependency fights)

DeepSpeed users often overlap with DS/ML/Kaggle workflows. In previous setups, installing or upgrading one stack could break the other. In 1.0.3, JupyterLab is structured so environments remain predictable:

  • PyTorch: stays in a dedicated venv (your PyTorch “daily driver”).
  • TensorFlow (DS/ML/Kaggle): moved into a separate Jupyter kernel.
Why this matters: You can now use both stacks on the same instance without the classic “pip install X → torch/tf breaks” cascade. This also reduces support time because the baseline is more deterministic.
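You can verify the split yourself from any notebook cell. This stdlib-only check reports which stacks the current kernel can import (it assumes nothing about CloudDock internals):

```python
from importlib.util import find_spec

def available_stacks():
    """Report which ML stacks are importable in the *current* kernel."""
    return {name: find_spec(name) is not None
            for name in ("torch", "deepspeed", "tensorflow")}

print(available_stacks())
```

In the PyTorch venv you would expect torch/deepspeed to be True and tensorflow False; in the TensorFlow (DS/ML/Kaggle) kernel, the reverse.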

How to select the right kernel

  1. Open JupyterLab.
  2. Create a new notebook.
  3. In the kernel picker, choose:
    • PyTorch kernel/venv for torch + deepspeed workflows
    • TensorFlow (DS/ML/Kaggle) kernel for tf workflows
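If you’re ever unsure which environment a kernel actually runs in, print its interpreter paths from a notebook cell (stdlib only):

```python
import sys

# The interpreter path reveals the environment backing this kernel,
# e.g. a path under the PyTorch venv vs the TensorFlow kernel's prefix.
print(sys.executable)
print(sys.prefix)
```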
JupyterLab kernel selection: PyTorch venv vs TensorFlow kernel
Figure 4 — JupyterLab: clean environment split (PyTorch venv vs TensorFlow kernel).

Two ways to run training

1) CLI mode (terminal)

CLI remains the highest-control path. Your scripts, your flags, your configs. 1.0.3 does not remove power-user freedom — it improves visibility around what is running.

# Example (single GPU)
deepspeed --num_gpus=1 train.py \
  --dataset /workspace/data \
  --output_dir /workspace/output \
  --lr 3e-5 --epochs 1 --batch_size 8


# Recommended: keep datasets + outputs in /workspace for clean transfers.
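For completeness, here is a minimal sketch of a train.py entrypoint that accepts the flags used above. This is an illustration, not the actual script: a real DeepSpeed script would also build a model and call deepspeed.initialize(...).

```python
import argparse

def parse_args(argv=None):
    p = argparse.ArgumentParser(description="DeepSpeed training entrypoint (sketch)")
    p.add_argument("--dataset", required=True)
    p.add_argument("--output_dir", required=True)
    p.add_argument("--lr", type=float, default=3e-5)
    p.add_argument("--epochs", type=int, default=1)
    p.add_argument("--batch_size", type=int, default=8)
    # The deepspeed launcher passes --local_rank to each worker process.
    p.add_argument("--local_rank", type=int, default=-1)
    return p.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(f"training: lr={args.lr} epochs={args.epochs} bs={args.batch_size}")
```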

2) GUI mode (CloudDock DeepSpeed Console)

GUI remains the fastest “blank instance → first run” path. In 1.0.3, job tracking is easier because you can see step progress directly in tabs.

DeepSpeed Console 1.0.3: job view with steps and logs
Figure 5 — Console: steps surfaced for the running job, plus the usual logs and health signals.

Recommended folder convention

/workspace/
  train.py
  data/
  output/
  configs/
  notebooks/
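You can set this layout up in one shot. A small sketch (the base path is a parameter so you can try it anywhere; on CloudDock it would be /workspace):

```python
from pathlib import Path

def make_workspace(base="/workspace"):
    """Create the recommended folder convention under `base`."""
    base = Path(base)
    for sub in ("data", "output", "configs", "notebooks"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    (base / "train.py").touch(exist_ok=True)  # placeholder entrypoint
    return sorted(p.name for p in base.iterdir())
```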

Upgrade notes (from 1.0.2)

  • Console tabs show steps automatically: no action needed — just start a run and you’ll see step updates where it matters.
  • Launcher status indicator: if you rely on Launcher as your entry point, you’ll notice DS Console state immediately.
  • Jupyter changes: if you previously installed TensorFlow into your PyTorch environment manually, stop doing that — use the dedicated TensorFlow (DS/ML/Kaggle) kernel instead.
  • Compatibility expectation: existing DeepSpeed scripts should run the same — 1.0.3 focuses on safer baseline + better visibility, not changing how your jobs are launched.
Less guessing. More training. (And fewer broken venvs.)