ISO 22301:2019 — Business Continuity
Status: not pursuing in 2026 · reassessed quarterly · last updated 27 May 2026
Glassbreak is not currently ISO 22301 certified and is not at this time pursuing certification. This page sets out our reasoning, our current posture against the BCMS clauses (4–10), the recovery objectives we operate to, and the trigger conditions that would cause us to start.
Our position
ISO 22301:2019 is the international standard for a Business Continuity Management System (BCMS). A certified BCMS demonstrates that an organisation has identified the products and services it must protect, analysed the impact of disruption, set recovery objectives, designed continuity strategies and procedures, and that it operates the management cycle (plan-do-check-act) to keep those arrangements effective.
For an emergency-communications platform, business continuity is an unusually material concern: customers reach for Glassbreak precisely when other things have gone wrong. The technical foundations are accordingly more mature than the management-system documentation. We have built a tested multi-cloud topology, twenty-two scripted DR scenarios, and an independently verifiable status surface. What we do not yet have is a third-party-audited BCMS conforming to clauses 4–10.
For an organisation of our current size, ISO 22301 is a significant investment (typically £30,000–£90,000 in the first year including BCMS implementation, internal audit, and the Stage 1 and Stage 2 certification audits) and a multi-quarter time commitment from leadership. Much of the underlying continuity work overlaps with ISO 27001 Annex A.5.29–5.30 (information security during disruption and ICT readiness for business continuity) and SOC 2 Type II Common Criteria CC9.1, so we expect the marginal cost to be materially lower once those are in place.
When we will start
We will begin ISO 22301 certification work when any one of the following becomes true:
- a regulated customer (financial services, healthcare, government, critical national infrastructure) requires it in writing as a procurement gate, or
- ARR from EU-headquartered customers exceeds £750,000 AND at least one EU customer requests it, or
- our first ISO 27001 certificate has been issued (the marginal cost to add ISO 22301 once an ISMS is operating is typically one quarter and one auditor selection rather than a full cycle), or
- a UK CNI customer requires evidence under the EU NIS2 Directive or the UK NIS Regulations.
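The rule above is a logical OR over four conditions, one of which is itself an AND. The following minimal Python sketch makes that evaluation explicit. The field and function names are hypothetical and chosen for illustration; only the £750,000 threshold and the shape of the rule come from the list above.

```python
from dataclasses import dataclass

EU_ARR_THRESHOLD_GBP = 750_000  # threshold stated in the trigger list


@dataclass
class TriggerInputs:
    """Hypothetical inputs to the quarterly reassessment."""
    regulated_customer_requires_in_writing: bool
    eu_arr_gbp: float
    eu_customer_has_requested: bool
    iso27001_certificate_issued: bool
    cni_customer_requires_nis_evidence: bool


def should_start_iso22301(t: TriggerInputs) -> bool:
    """Return True if any one of the four trigger conditions holds."""
    # The EU condition needs both the ARR threshold and a customer request.
    eu_trigger = (
        t.eu_arr_gbp > EU_ARR_THRESHOLD_GBP and t.eu_customer_has_requested
    )
    return (
        t.regulated_customer_requires_in_writing
        or eu_trigger
        or t.iso27001_certificate_issued
        or t.cni_customer_requires_nis_evidence
    )
```

Under this reading, EU ARR above the threshold does not trigger certification work on its own; an EU customer request is also needed.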
Recovery objectives we operate to today
These are our stated recovery-time and recovery-point objectives per product surface. They are commitments, not certifications. Each is exercised at least quarterly via the corresponding DR scenario (see the table at the foot of this page).
| Surface | Scope | Status | Objectives and how we meet them / gap |
|---|---|---|---|
| API — primary | Customer-facing API on the primary domain | Met | RTO ≤ 60 seconds (automatic Fastly origin failover to a healthy compute vertical). RPO = 0 (the request that triggers failover may need to retry; no data loss). Tested in DR scenarios 1–5 (vertical-down) and 9 (region partition). |
| API — direct | Direct-access API hostnames (.dev, .com) | Met | RTO ≤ 5 minutes (DNS TTL on the failover record). RPO = 0. The .io primary remains the fast path; direct hostnames are a fallback for clients that pin to a specific cloud. |
| Web — static | Marketing + auth pages (statically exported) | Met | RTO ≤ 60 seconds; objects served from two independent origins (AWS S3 + Scaleway Object Storage) behind Fastly. RPO = 0 — the bundle is reproducible from the commit SHA. |
| Web — authenticated | In-product application | Met | RTO ≤ 5 minutes. Static-export bundle plus API failover; the client recovers as soon as either vertical returns 200 on /api/public/status. |
| Database — primary | Postgres reads + writes | Partial | RTO ≤ 30 minutes (managed-Postgres standby promotion). RPO ≤ 1 minute (continuous WAL replication). Cross-vertical async sync mesh keeps secondary instances within ~60s of primary; full cutover documented but not yet drilled to a hard SLA. Tested in DR scenarios 8, 11, 17. |
| Database — backup restore | Point-in-time recovery (PITR) from backup | Met | RTO ≤ 4 hours (full restore of last good backup). RPO ≤ 5 minutes (PITR granularity). Backup integrity verified nightly via DR scenario 22: a sample row is decrypted from the restored backup and compared against the live row. |
| Emergency message — public ack | /ack/[token] endpoint (the path recipients hit during an incident) | Met | RTO ≤ 60 seconds; ack endpoint is on the same Fastly failover path as the primary API. Token validation + ack write are short, so a single retry crosses the failover window. Tested in DR scenario 7. |
| Secrets manager (org admin) | Vault secret create / approve / decrypt | Partial | RTO ≤ 5 minutes for read paths. The Shamir-share approval flow has a single-vertical dependency on the primary API today; cross-vertical share retrieval is on the roadmap. Workaround: any approver with an in-flight share can complete the flow from a different surface. |
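To illustrate how a drill result could be checked against these objectives, the sketch below encodes the RTO and RPO figures from the table in seconds and compares them with measured values. The surface keys, the `DrillResult` type, and the function name are hypothetical; only the numeric objectives are taken from the table, and surfaces with no stated RPO are recorded as `None`.

```python
from dataclasses import dataclass

# (RTO seconds, RPO seconds) per surface, copied from the table.
# None means the table states no RPO for that surface.
OBJECTIVES = {
    "api-primary": (60, 0),
    "api-direct": (5 * 60, 0),
    "web-static": (60, 0),
    "web-authenticated": (5 * 60, None),
    "database-primary": (30 * 60, 60),
    "database-backup-restore": (4 * 60 * 60, 5 * 60),
    "emergency-ack": (60, None),
    "secrets-manager-read": (5 * 60, None),
}


@dataclass
class DrillResult:
    """Hypothetical record of one DR exercise."""
    surface: str
    recovery_seconds: float   # time until the surface served traffic again
    data_loss_seconds: float  # window of lost writes, 0 if none


def meets_objectives(result: DrillResult) -> bool:
    """Return True if the drill stayed within the stated RTO and RPO."""
    rto, rpo = OBJECTIVES[result.surface]
    if result.recovery_seconds > rto:
        return False
    if rpo is not None and result.data_loss_seconds > rpo:
        return False
    return True
```

For example, a database promotion that completes in 20 minutes but loses 90 seconds of writes would fail the check, because the stated RPO for that surface is one minute.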
Status by BCMS clause (4–10)
The main body of the standard requires the BCMS itself — the management spine that connects technical capability to a documented, audited operating cadence. The technical capability exists; the documented management system is the gap.
| Ref | Requirement | Status | How we meet it / gap |
|---|---|---|---|
| Clause 4 | Context of the organisation, interested parties, BCMS scope | Partial | Scope documented at /policies/bcp (covers customer-facing surfaces); interested parties identified informally; controlled scope-document with formal review cycle not yet in place. |
| Clause 5 | Leadership, policy, roles and responsibilities | Partial | Business Continuity Policy published at /policies/bcp with a named accountable owner. Top-management commitment + resourcing documented; formal RACI for crisis roles not yet finalised. |
| Clause 6 | Planning — risks, opportunities, business-continuity objectives | Partial | Continuity objectives stated as RTO/RPO per surface (table above); residual risks captured in /policies/risk-treatment-plan; quantitative likelihood/impact scoring not yet recurring. |
| Clause 7 | Support — resources, competence, awareness, communication, documented information | Partial | On-call rota staffed; runbook training is on-the-job; crisis-comms templates exist for customer notifications but the cadence of awareness sessions is not yet annual + recorded. |
| Clause 8.2 | Business Impact Analysis (BIA) | Partial | Surface-level BIA implicit in the RTO/RPO table above and in docs/architecture.md §4 (Failure matrix). Formal BIA workbook with prioritised activities, MTPD/MBCO, dependencies — not yet a controlled document. |
| Clause 8.3 | Risk assessment for continuity | Partial | Threat scenarios identified in docs/architecture.md §4 + tested in twenty-two DR scenarios. Formal risk register tying scenarios to BIA priorities and treatment plans is in progress. |
| Clause 8.4 | Business continuity strategies and solutions | Met | Multi-cloud, multi-region active-active topology documented at docs/architecture.md §§1–4. Cross-vertical sync mesh, dual DNS authority, dual TLS termination, dual web origin. Each leg has a documented strategy: failover, restoration, alternate-comms, supplier-fallback. |
| Clause 8.5 | Business continuity plans and procedures | Met | Per-scenario runbooks under docs/runbooks/. Crisis comms template at /policies/procedures/customer-notification. Roles, decision criteria, escalation paths documented for SEV-1/2/3/4 in /policies/incident-response. |
| Clause 8.6 | Exercise programme | Met | Twenty-two DR scenarios scripted in .github/workflows/dr-tests.yml; nightly cron run at 00:30 UTC; full exercise log + remediation tracker. Tabletop exercises with the founding team are documented quarterly. |
| Clause 9 | Performance evaluation, internal audit, management review | Partial | Daily security-snapshot covers continuity controls (audit-log freshness, backup integrity, DR-scenario pass-rate). Internal-audit programme exists at /policies/cadences/internal-audit-programme; full BCMS audit not yet a recurring scope. |
| Clause 10 | Improvement — nonconformity, corrective action, continual improvement | Met | Post-incident review template at /policies/incident-response §11. DR-scenario failures auto-open corrective-action tickets with time-bounded SLAs at /policies/cadences/remediation-slas. Disclosure-programme findings feed the same workflow. |
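The statuses in the table can be tallied to show where the gap sits. The short sketch below does so; the dictionary simply transcribes the Status column, and the function names are illustrative.

```python
from collections import Counter

# Status per clause, transcribed from the table.
CLAUSE_STATUS = {
    "4": "Partial", "5": "Partial", "6": "Partial", "7": "Partial",
    "8.2": "Partial", "8.3": "Partial", "8.4": "Met", "8.5": "Met",
    "8.6": "Met", "9": "Partial", "10": "Met",
}


def summarise(statuses: dict) -> Counter:
    """Count how many clauses fall under each status."""
    return Counter(statuses.values())


def open_gaps(statuses: dict) -> list:
    """List the clauses that are not yet fully met, in table order."""
    return [clause for clause, status in statuses.items() if status != "Met"]
```

The clauses marked Met (8.4, 8.5, 8.6, and 10) are the ones that rest on technical capability and operating practice; the Partial clauses are the management-system documentation.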
Technical continuity evidence — what we can already show
Even without certification, the BCMS clauses that depend on technical evidence are well-covered. The gap is documentation and audit cadence, not capability.
| Ref | Requirement | Status | How we meet it / gap |
|---|---|---|---|
| Multi-cloud | No single-cloud failure can take production down | Met | AWS Lambda (Function URL) + Scaleway Functions, both serving the same API behind Fastly with health-checked failover. Independent compute, networking, identity, and TLS termination per vertical. Architecture: docs/architecture.md §§1a, 2. |
| Multi-region | No single-region failure can take production down | Met | Verticals run in distinct regions on distinct continents (us-east-1 + fr-par). Synthetic monitoring from three probe regions (EU-Ireland, US-East, AU-Sydney). |
| Multi-DNS | No single DNS provider failure can prevent resolution | Met | Dual-DNS-authority per domain: glassbreak.io uses DNSimple + deSEC; glass-break.com uses Route 53 + Porkbun; glassbreak.dev uses Scaleway DNS + deSEC. Registrar diversity policy at docs/architecture.md §3a. |
| Multi-edge / CDN | No single CDN failure can prevent customer access | Partial | Fastly is the primary edge; direct-access hostnames (api.glassbreak.dev, api.glass-break.com) bypass the CDN entirely. A Fastly outage degrades the .io path; direct domains continue serving. Web has dual-origin (S3 + Scaleway Object Storage) behind Fastly. |
| Backup integrity | Backups are recoverable and verified | Met | Nightly DR scenario 22: restore last good backup to an isolated environment, decrypt a sample row, compare against live. Failure opens a SEV-2 ticket automatically. |
| Crypto-key continuity | A vertical loss does not lock customers out of their data | Met | Per-vertical EdDSA JWT signing keys + shared verify-keys map: tokens minted by one vertical verify on any other (architecture §6a). Cross-vertical share retrieval planned for secrets (currently single-vertical dependency). |
| DR exercise log | Exercises are run, dated, and their outcomes recorded | Met | Twenty-two scenarios in .github/workflows/dr-tests.yml run nightly; results land in the daily security-snapshot. Each scenario's latest passing run (within the last 24 hours) is surfaced on this page once the scenario has been green for 30 consecutive days. |
| Public status surface | Customers can see degradation in real time | Met | Public status page at /status reflects live per-vertical health from /api/public/status. Synthetic-monitoring probes from three regions; alerts wired to email + (optional) Slack via Grafana Cloud. |
| Customer-notification template | Crisis communications can go out within the documented SLA | Met | Templates at /policies/procedures/customer-notification with SEV-1 (≤ 1h), SEV-2 (≤ 4h), SEV-3 (≤ 24h) notification SLAs. Glassbreak itself is the notification platform — customers receive crisis comms via the same product. |
| Supplier continuity | Critical sub-processor failure does not stop core function | Partial | Multi-cloud removes the largest single-supplier risk. Stripe (billing), SendGrid (transactional email), and the IdP for SSO are single-supplier today; alternative routes documented but not wired. See /policies/supply-chain-risk for the per-supplier continuity stance. |
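A client that wants to fall back to the direct-access hostnames could probe the public status endpoint and use the first vertical that answers 200. The sketch below is one way to do that with the Python standard library. It is not Glassbreak's client code: the `https://` scheme, the 3-second timeout, and the function names are assumptions, while the hostnames and the `/api/public/status` path are the ones named in the tables on this page.

```python
import urllib.request
from typing import Callable, Iterable, Optional

STATUS_PATH = "/api/public/status"  # status endpoint named on this page

# Direct-access hostnames from the Multi-edge / CDN row (scheme assumed).
DIRECT_BASE_URLS = (
    "https://api.glassbreak.dev",
    "https://api.glass-break.com",
)


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def first_healthy(
    base_urls: Iterable[str],
    fetch_status: Callable[[str], int] = http_status,
) -> Optional[str]:
    """Return the first base URL whose status endpoint answers 200."""
    for base in base_urls:
        try:
            if fetch_status(base + STATUS_PATH) == 200:
                return base
        except OSError:
            # URLError, HTTPError, and socket timeouts are OSError subclasses.
            continue
    return None
```

Passing `fetch_status` as a parameter keeps the selection logic testable without network access: a test can supply a stub that raises for one hostname and returns 200 for the other.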
What ISO 22301 certification would add
We already do most of what ISO 22301 audits. Certification would add:
- An auditor-validated Business Impact Analysis (we have the inputs; the document is not yet controlled).
- A formal exercise programme calendar with documented objectives, scenarios, participants, and corrective-action closure — moving the DR scenarios from a nightly CI artefact into a controlled programme document.
- A management-review minute trail (currently informal).
- A surveillance audit each year confirming the BCMS continues to operate effectively — independent assurance for procurement teams who cannot evaluate technical evidence directly.
Estimated effort to certification (if triggered)
- Months 0–2 — BCMS implementation. Hire or contract an ISO 22301 Lead Implementer; draft policies, formal BIA, risk-treatment plan, exercise programme. Pull existing technical evidence into the controlled-document register. Estimated effort: 0.3 FTE for two months.
- Months 2–4 — Operate the BCMS for at least one full internal-audit cycle; run two exercises (one discussion-based, one operational); run a management review; remediate nonconformities.
- Month 4 — Stage 1 certification audit (documentation review).
- Month 5 — Stage 2 certification audit (operating effectiveness).
- Month 6 — Certificate issued (three-year certification cycle with annual surveillance audits).
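To turn the month offsets above into calendar dates for a given kickoff, a small helper is enough. The sketch below is illustrative only: the kickoff date is whatever the caller supplies, the milestone labels paraphrase the list, and `add_months` is a hypothetical helper that clamps the day to the length of the target month.

```python
import calendar
from datetime import date

# (label, month offset from kickoff), taken from the list above.
MILESTONES = [
    ("BCMS implementation complete", 2),
    ("Stage 1 certification audit", 4),
    ("Stage 2 certification audit", 5),
    ("Certificate issued", 6),
]


def add_months(start: date, months: int) -> date:
    """Add whole months, clamping the day to the target month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def schedule(kickoff: date) -> dict:
    """Map each milestone label to its projected calendar date."""
    return {label: add_months(kickoff, offset) for label, offset in MILESTONES}
```

With a kickoff of 1 September 2026, for instance, this places the Stage 1 audit on 1 January 2027 and certificate issue on 1 March 2027.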
If ISO 27001 is already in place when 22301 work starts, the gap is materially smaller: the ISMS and the BCMS substantively share clauses 4–10, and Annex A.5.29–5.30 in ISO 27001 already require most of the BCMS technical evidence. Realistic incremental effort: one quarter and one auditor selection rather than a full six-month cycle.
If ISO 22301 is a hard procurement requirement for your organisation, please write to compliance@glassbreak.io with your expected ACV, contract term, and the regulatory regime driving the requirement so we can prioritise accordingly.