CloudDock Header

In-house ops & security

24×7 on-call, spare-parts pool, isolated management network. Built to stay up.

Operations & Security overview

What this means

We run our own ops and security stack end-to-end. No “best effort”, no black boxes. The result is predictable capacity, fast recovery, and tight control of blast radius.

Core pillars

  • Always-on: 24×7 on-call with clear SLOs and runbooks.
  • Fast MTTR: cold/hot spares on-site, validated swap procedures.
  • Isolation-first: management plane on a separate, locked-down network.

Operations

24×7 on-call

Rotating primary/secondary, live dashboards, paging policies tuned to severity. Post-incident reviews drive concrete fixes.
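As a rough illustration of what "paging policies tuned to severity" can look like, here is a minimal sketch. The severity names, timeouts, and responder roles are hypothetical placeholders, not our production values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PagingPolicy:
    notify: Tuple[str, ...]       # roles paged immediately
    ack_timeout_min: int          # minutes to wait for an ack before escalating
    escalate_to: Optional[str]    # next role to page, or None for no escalation


# Hypothetical severities and timings, for illustration only.
POLICIES = {
    "sev1": PagingPolicy(("primary", "secondary"), 5, "incident-manager"),
    "sev2": PagingPolicy(("primary",), 15, "secondary"),
    "sev3": PagingPolicy((), 0, None),  # ticket only, nobody is paged
}


def route_alert(severity: str) -> PagingPolicy:
    # Unknown severities fall back to the strictest policy rather than being dropped.
    return POLICIES.get(severity, POLICIES["sev1"])
```

Falling back to the strictest policy is a deliberate choice in this sketch: a mislabeled alert pages too many people instead of none.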

Spare-parts pool

Critical SKUs stocked on-site (PSUs, fans, NICs, SSDs, GPUs). Swaps are timed, rehearsed, and documented.

Change management

Maintenance windows, staged rollout, canaries, rollback-by-default. Infra as Code for reproducible state.
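The staged rollout with rollback-by-default can be sketched as follows. This is a simplified model under stated assumptions: `deploy`, `healthy`, and `rollback` are hypothetical callbacks supplied by the caller, and the first stage plays the role of the canary.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    stages: Sequence[List[str]],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Deploy stage by stage; on the first failed health check, roll back every host touched."""
    touched: List[str] = []
    for stage in stages:
        for host in stage:
            deploy(host)
            touched.append(host)
        if not all(healthy(host) for host in stage):
            # Rollback is the default reaction: undo in reverse order of deployment.
            for host in reversed(touched):
                rollback(host)
            return False
    return True
```

A call such as `staged_rollout([["canary-1"], ["node-a", "node-b"]], ...)` only reaches the second stage if the canary passes its health check.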

Backups & recovery

Scheduled snapshots, encrypted at rest; periodic restore drills validate RTO/RPO assumptions.
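To make "validate RTO/RPO assumptions" concrete, here is a small sketch of how a restore drill could be scored. The function name and the targets in the usage note are illustrative assumptions, not published figures.

```python
from datetime import datetime, timedelta


def evaluate_drill(
    last_snapshot: datetime,
    failure_at: datetime,
    restored_at: datetime,
    rpo_target: timedelta,
    rto_target: timedelta,
) -> dict:
    rpo_actual = failure_at - last_snapshot  # window of data that would be lost
    rto_actual = restored_at - failure_at    # time taken to restore service
    return {
        "rpo_actual": rpo_actual,
        "rto_actual": rto_actual,
        "rpo_met": rpo_actual <= rpo_target,
        "rto_met": rto_actual <= rto_target,
    }
```

For example, a snapshot taken at 02:00, a simulated failure at 02:45, and a completed restore at 03:30 would meet a one-hour RPO target (45 minutes of data at risk) and a one-hour RTO target (45 minutes to restore).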

Security

Isolated management network

Segregated from tenant traffic. Admin access is gated behind MFA, short-lived credentials, and bastion policies.
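The idea behind short-lived credentials can be shown with a toy example: a token that carries its own expiry and is signed with an HMAC. This is only a sketch of the concept using the Python standard library. The token format and function names are hypothetical, and a real deployment would rely on an established credential system rather than hand-rolled tokens.

```python
import hashlib
import hmac
import time
from typing import Optional


def issue_token(secret: bytes, user: str, ttl_s: int, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    payload = f"{user}:{int(now + ttl_s)}"  # user plus absolute expiry (Unix seconds)
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"


def verify_token(secret: bytes, token: str, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    try:
        user, expiry_s, sig = token.rsplit(":", 2)
        expiry = int(expiry_s)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(secret, f"{user}:{expiry_s}".encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison, then the expiry check.
    return hmac.compare_digest(sig.encode(), expected.encode()) and now < expiry
```

A token issued with `ttl_s=300` stops verifying five minutes later, so a leaked credential has a bounded useful lifetime.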

Segmentation & least privilege

Per-tenant isolation, scoped service accounts, audited privilege elevation.

Defense-in-depth

WAF, DDoS protection, egress controls, image signing, and runtime checks. Supply-chain pinning for critical components.
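One simple form of supply-chain pinning is to record the expected SHA-256 digest of an artifact and refuse anything that does not match. The sketch below illustrates digest pinning only. It is not signature verification, and the function name is a hypothetical helper.

```python
import hashlib
import hmac


def matches_pin(artifact: bytes, pinned_sha256: str) -> bool:
    """Return True only if the artifact's SHA-256 digest equals the pinned hex digest."""
    actual = hashlib.sha256(artifact).hexdigest()
    return hmac.compare_digest(actual, pinned_sha256.lower())
```

Any change to the artifact, even a single byte, produces a different digest and fails the check.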

Monitoring & audit

Metrics, logs, and traces with defined retention; tamper-evident audit trails; alerting on anomalies and policy violations.
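A common way to make an audit trail tamper-evident is a hash chain, where each entry commits to the hash of the one before it. The following is a minimal sketch of that general technique, not a description of our internal log format; the field names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event: str, prev: str) -> str:
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: list, event: str) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(event, prev)})


def verify_chain(log: list) -> bool:
    prev = GENESIS
    for entry in log:
        # Each entry must point at the previous hash and match its own recomputed hash.
        if entry["prev"] != prev or entry["hash"] != _digest(entry["event"], prev):
            return False
        prev = entry["hash"]
    return True
```

Editing or removing an entry in the middle of the chain breaks the link to its successors, so `verify_chain` returns False. Detecting truncation at the tail would additionally require storing the latest hash somewhere the log writer cannot modify.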

Service posture

  • Designed for uptime: capacity headroom plus spares beats wishful thinking.
  • Small blast radius: failure domains are contained by design.
  • No guesswork: documented runbooks, tested procedures, measured outcomes.

FAQ

Do you rely on third parties for ops?

We operate in-house. We work with vendors, but control, paging, and recovery stay with us.

How do you handle hardware failures?

Detect, isolate, swap. Spares are stocked; swaps are rehearsed; tenants are migrated with minimal disruption.

Is the management network exposed?

No. It’s segregated from tenant traffic and gated behind MFA and short-lived credentials.
