Why this guide exists
The two most common “anime SD pain points” are: (1) anatomy drift (pose/hands/face change every roll), and (2) mushy upscales (hires fix turns crisp lines into blurry soup).
This page gives you a stable workflow: ControlNet for control + a hi-res ladder for detail. You’ll get cleaner lines, fewer broken hands, and sharp results at higher resolution — without endless retries.
TL;DR recipe (copy this first)
If you just want the “works for most anime” setup, start here:
Base pass
- Resolution: start around 768×512 (portrait: 832×1216), not 2K/4K.
- Sampler / Steps: a modern sampler + 22–30 steps.
- CFG: 5–7 (too high can worsen “crispy noise” and hires mush).
- Batch: keep it small while tuning (1–2).
ControlNet stack (anime-friendly)
- CN #1 Pose: OpenPose / DWpose, weight 0.75–0.95, end 0.75–0.85
- CN #2 Depth (optional): MiDaS/Depth, weight 0.45–0.70, end 0.60–0.80
- CN #3 Line/Edge (optional): SoftEdge/Lineart, weight 0.35–0.55, end 0.50–0.70
Hi-res ladder (the “non-mushy” way)
- Stage A: Hires fix 1.25×–1.35×, denoise 0.25–0.35
- Stage B (optional): tiny polish pass (img2img), denoise 0.15–0.25
- Never: jump directly to 2× with denoise 0.6+ unless you want a redraw.
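The recipe above can be captured as a small config sketch with a sanity check. Note that the dict keys and the `check_recipe` function are illustrative, not a real WebUI or diffusers API:

```python
# Hypothetical config structure for the TL;DR recipe -- keys are illustrative.
TLDR_RECIPE = {
    "base": {
        "width": 768, "height": 512,  # portrait: 832x1216
        "steps": 26,                  # within the 22-30 range
        "cfg": 6.0,                   # within 5-7
        "batch": 1,
    },
    "controlnet": [
        {"type": "pose",  "weight": 0.85, "end": 0.80},
        {"type": "depth", "weight": 0.55, "end": 0.70},  # optional
    ],
    "hires": {"scale": 1.3, "denoise": 0.30},            # Stage A
    "polish": {"denoise": 0.20},                         # Stage B (optional)
}

def check_recipe(cfg):
    """Flag settings that violate the guide's 'non-mushy' limits."""
    problems = []
    if cfg["hires"]["scale"] > 1.35:
        problems.append("hires scale above 1.35x risks a redraw")
    if cfg["hires"]["denoise"] > 0.35:
        problems.append("hires denoise above 0.35 rewrites the image")
    if not 5 <= cfg["base"]["cfg"] <= 7:
        problems.append("CFG outside the 5-7 sweet spot")
    return problems

print(check_recipe(TLDR_RECIPE))  # -> []
```

Run the check whenever you change a knob; an empty list means you are still inside the safe ranges from this guide.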
What is ControlNet?
ControlNet lets you give Stable Diffusion a “map” to follow: pose skeleton, depth, edges/lineart, or other guidance. Instead of hoping the model respects your prompt, you provide a concrete structure it can’t easily ignore.
- Pose → locks body position (and can help hands if you feed hand keypoints)
- Depth → stabilizes 3D structure (face angle, body volume, background perspective)
- Edges/Lineart → keeps clean line boundaries and composition
The workflow: control first, detail later
The biggest mistake is trying to get everything perfect in one shot. A clean pipeline is:
- Base image: generate at a sane resolution with your prompt (fast iterations).
- ControlNet lock: add pose/depth/edge maps to stabilize anatomy and composition.
- Hi-res ladder: upscale gently to add detail without rewriting the whole image.
- Optional polish: tiny img2img or inpaint only where needed (face/hands).
Layered ControlNet presets tuned for anime
You can run multiple ControlNets at once, but think like a chef: one main ingredient (pose), then light seasoning (depth/edge). If you max everything, the image becomes stiff or fails (or eats VRAM).
Preset 1 — Pose lock (the backbone)
Use this when you care about body position, gesture, camera angle, or consistent composition. It also reduces “random limb invention.”
- Preprocessor: OpenPose / DWpose (full body)
- ControlNet model: Pose-compatible model
- Weight: 0.75–0.95
- Start / End: 0.00 → 0.75–0.85
- Control mode: Balanced (start here)
- Resize mode: “Crop and Resize” if framing matters; “Just Resize” if you must preserve full map
- Pixel Perfect: ON (usually safer)
Preset 2 — Hands (when it’s a close-up)
Hands are hard because they’re small and high-detail. The trick is: don’t over-constrain the whole image just to fix fingers. Use hand guidance only when hands are prominent, or do a targeted inpaint pass later.
- Option A (global): add a second ControlNet focused on hands (if available), weight 0.55–0.75, end around 0.50–0.65
- Option B (recommended): base + pose ControlNet, then inpaint only the hands at the end
Preset 3 — Face stability (without turning it into a mask)
For anime, “face stability” usually means: eyes stay aligned, nose/mouth don’t drift, and the face doesn’t become waxy after hires.
ControlNet can help by stabilizing structure (depth/edge), but the real win is a gentle hi-res ladder + optional micro inpaint.
- If face angle keeps drifting: add a light Depth ControlNet (below).
- If lines get mushy: reduce hires denoise and keep scale smaller.
- If one eye is wrong: inpaint only the eye region (don’t redraw the whole image).
Preset 4 — Depth (the “anti-weird 3D” stabilizer)
Depth guidance is underrated for anime: it keeps camera perspective and facial plane consistent, especially on 3/4 view faces.
- Preprocessor: Depth (MiDaS / similar)
- Weight: 0.45–0.70
- Start / End: 0.00 → 0.60–0.80
- When to use: dynamic poses, strong foreshortening, tricky camera angles
Preset 5 — SoftEdge / Lineart (light seasoning)
Use edge/line guidance lightly to keep composition and major boundaries consistent. If you crank this too high, the image can look traced or rigid.
- Preprocessor: SoftEdge / Lineart (choose based on your map source)
- Weight: 0.35–0.55
- Start / End: 0.00 → 0.50–0.70
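The “one main ingredient, light seasoning” idea can be expressed in code: each preset contributes a weight/end range, and only one unit should dominate. This is an illustrative sketch; the names and fields are assumptions, not real extension settings:

```python
# Midpoints of the preset ranges from this guide (values are starting points).
PRESETS = {
    "pose":     {"weight": (0.75, 0.95), "end": (0.75, 0.85)},
    "depth":    {"weight": (0.45, 0.70), "end": (0.60, 0.80)},
    "softedge": {"weight": (0.35, 0.55), "end": (0.50, 0.70)},
}

def build_stack(*names):
    """Build a ControlNet stack, starting each unit at its range midpoint."""
    stack = []
    for name in names:
        w_lo, w_hi = PRESETS[name]["weight"]
        e_lo, e_hi = PRESETS[name]["end"]
        stack.append({
            "type": name,
            "weight": round((w_lo + w_hi) / 2, 2),
            "end": round((e_lo + e_hi) / 2, 2),
        })
    # "One main ingredient": at most one unit should carry a weight >= 0.7.
    heavy = [u for u in stack if u["weight"] >= 0.7]
    assert len(heavy) <= 1, "more than one dominant ControlNet -- image will stiffen"
    return stack

stack = build_stack("pose", "depth")
```

If the assertion fires, you have stacked two “main ingredients” and should lower one of them before generating.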
Making control maps (fast and painless)
You have two common ways to get ControlNet input maps:
- From an existing image: upload an image, let the preprocessor generate pose/depth/edges.
- From a sketch / reference: use your own rough lineart or a pose reference image.
In A1111’s ControlNet panel, use the preprocessor and click Preview (or equivalent). If the preview looks wrong, the output will look wrong — fix the map first.
Hires Fix: why it becomes mush (and how to stop it)
“Mushy hires” happens when your upscale step is secretly doing a redraw. The usual causes:
- Denoise too high (0.5–0.8 = you’re rewriting the image)
- Scale too big (jumping straight to 2× or more)
- Too few steps in the hires stage (detail never resolves cleanly)
- Over-aggressive upscaler + high denoise (over-smoothed lines)
- Trying to fix anatomy in hires (use ControlNet / inpaint instead)
The non-mushy hi-res ladder (1× → 1.3× → optional polish)
Stage 1 — Base (1×)
Generate a clean base image at a moderate resolution. This is where you solve composition and anatomy.
- Suggested sizes (anime): 768×512, 832×512, 768×768, 832×1216 (full-body portrait)
- Steps: 22–30
- CFG: 5–7
- ControlNet: pose (and optionally depth/edge) tuned as above
Stage 2 — Detail (1.25×–1.35×)
This stage adds texture, hair strands, fabric detail, and sharper edges — while preserving identity and linework.
- Upscale factor: 1.25×–1.35× (start at 1.3×)
- Denoising: 0.25–0.35 (lower = safer)
- Hires steps: 10–16 (don’t starve it)
- Upscaler: choose a detail-friendly option (avoid “super smooth” unless you like that style)
Stage 3 — Optional polish (tiny, targeted)
If you still see one problem area (usually face or hands), do a tiny polish pass: img2img with low denoise, or inpaint only the broken part. This is where you “spend” extra compute, not by brute-forcing hires.
- img2img denoise: 0.15–0.25
- Inpaint: small masks, low denoise, repeat 2–4 times if needed
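To see what the ladder means in pixels, here is a small sketch that computes each stage’s output size from the base resolution. The function names are made up; the snap-to-multiple-of-8 step reflects that SD-family models expect dimensions divisible by 8:

```python
# Snap a dimension to the nearest multiple of 8 (SD latent-space constraint).
def snap8(x):
    return int(round(x / 8) * 8)

def ladder(width, height, factors=(1.3,)):
    """Return [(w, h), ...] for the base pass plus each upscale stage."""
    sizes = [(width, height)]
    w, h = width, height
    for f in factors:
        w, h = snap8(w * f), snap8(h * f)
        sizes.append((w, h))
    return sizes

print(ladder(768, 512))  # -> [(768, 512), (1000, 664)]
```

A 1.3× stage on 768×512 lands near 1000×664, which is why it adds texture without triggering a full redraw; a straight 2× jump would put you at roughly four times the pixel count in one step.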
VRAM sanity: read the danger signs before you OOM
VRAM issues don’t only cause crashes — they also cause “silent quality loss”: slower generation, unstable results, and settings that behave differently than expected.
- Keep ControlNet count reasonable: start with 1 (pose), add others only if needed.
- Keep hires scale modest: 1.3× is the sweet spot for “detail without redraw.”
- Batch low while tuning: batch 1 is your friend when testing ControlNet weights.
- If you must go bigger: do it in stages, not one giant leap, keeping denoise at 0.25–0.35 and scale at 1.25×–1.35× per stage.
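The “stages, not one giant leap” advice can be sketched as a scale splitter: given a large target factor, break it into gentle steps that each stay at or below ~1.35×. This is a hypothetical helper, not a built-in WebUI feature:

```python
import math

def split_scale(target, max_step=1.35):
    """Split a target upscale factor into equal stages of at most max_step."""
    if target <= max_step:
        return [round(target, 3)]
    n = math.ceil(math.log(target) / math.log(max_step))
    step = target ** (1 / n)          # equal per-stage factor
    return [round(step, 3)] * n

print(split_scale(2.0))  # -> three steps of ~1.26x instead of one 2x jump
```

Each intermediate stage gets its own low-denoise pass, so detail accumulates instead of being repainted.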
Troubleshooting
“ControlNet does nothing.”
- Check Enable is on (sounds silly, happens daily).
- Preview the control map — if the map is wrong, the output won’t follow.
- Increase weight slightly (e.g., 0.7 → 0.85) and ensure end isn’t too low.
- Make sure you selected a compatible ControlNet model for that preprocessor.
“It follows the map but looks stiff / traced.”
Lower weight, or end earlier. For edge/line guidance, keep it light — it’s seasoning, not the whole meal.
“Hands are still broken.”
Don’t brute force hires to fix hands. Use a targeted inpaint pass on hands with low denoise and a hand-friendly negative prompt.
“Hires fix makes it mush.”
- Lower hires denoise to 0.25–0.35.
- Lower hires scale to 1.25×–1.35× (don’t jump to 2×).
- Give hires enough steps (10–16), and avoid extreme CFG.
- Fix structure in the base pass (with ControlNet), not in hires.
“Out of memory (OOM).”
Reduce in this order: batch → hires scale → number of ControlNets → base resolution. If you still need higher final resolution, do it in multiple smaller steps.
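The reduction order above can be sketched as a tiny helper that suggests the next knob to lower. The settings dict and its keys are hypothetical, not a real WebUI API:

```python
def next_reduction(s):
    """Suggest what to lower after an OOM, in the guide's order:
    batch -> hires scale -> number of ControlNets -> base resolution."""
    if s["batch"] > 1:
        return "lower batch to 1"
    if s["hires_scale"] > 1.3:
        return "lower hires scale toward 1.3x"
    if s["controlnets"] > 1:
        return "drop an optional ControlNet (keep pose)"
    return "lower base resolution, then upscale in stages"

settings = {"batch": 2, "hires_scale": 1.5, "controlnets": 3}
print(next_reduction(settings))  # -> "lower batch to 1"
```

Apply one reduction at a time and re-test; cutting everything at once makes it impossible to tell which setting was actually over budget.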
What’s next?
- img2img Basics: use low-denoise img2img for gentle polish and safe iterations. Go to “img2img Basics →”
- Model Library: pick ControlNet models/checkpoints that match your anime style. Go to “Model Library Overview →”