
Disaster Recovery Plan

Last Updated: YYYY-MM-DD
Purpose: High-level recovery planning and scenario overview


Recovery Principles

  1. Stay calm - Panic leads to mistakes
  2. Assess first - Understand what's actually broken
  3. Check backups - Verify backups exist before proceeding
  4. Document everything - Take notes as you recover
  5. One thing at a time - Don't change multiple things simultaneously

Disaster Scenarios

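| Scenario                      | Impact                 | Recovery Doc                                  |
|-------------------------------|------------------------|-----------------------------------------------|
| Infrastructure server failure | All VMs down           | proxmox-backup-restore.md                     |
| Storage server failure        | All data unavailable   | truenas-backup-restore.md                     |
| Both servers lost             | Complete homelab loss  | See "Catastrophic Loss (Both Servers)" below  |
| Accidental VM deletion        | Single service down    | proxmox-backup-restore.md                     |
| Service misconfiguration      | Single service broken  | Restore from app backup                       |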

Pre-Disaster Checklist

Do these NOW, before you need them:

Critical (Must Have)

  • [ ] This documentation accessible from multiple locations
  • [ ] Backup verification performed in last 30 days
  • [ ] Recovery procedures tested at least once
  • [ ] Emergency credentials stored securely (password manager)
  • [ ] Restore scripts saved outside the homelab (USB drive, cloud storage)

Important (Should Have)

  • [ ] VPN account recovery methods set up
  • [ ] Cloud provider account recovery methods set up
  • [ ] Offsite backup tested
  • [ ] OS install media on bootable USB

Catastrophic Loss (Both Servers)

Scenario: Fire, flood, theft - both servers destroyed

This will take days. Accept it. Don't rush.

Phase 1: Get Infrastructure Running (Day 1)

  1. Obtain replacement hardware
  2. Install hypervisor -> proxmox-backup-restore.md
  3. Install storage OS -> truenas-backup-restore.md
  4. Basic network configuration

Phase 2: Restore Critical Services (Day 1-2)

Priority order:

  1. Storage - Need this for everything
  2. DNS - Network functionality
  3. Remote access - Work capability
  4. Monitoring - Visibility

Phase 3: Restore Data (Day 2+)

  1. Pull data from offsite backups (may take days for large datasets)
  2. Prioritize: Photos > Documents > Media
  3. Let it run in the background

Phase 4: Applications (Day 3+)

  1. Reinstall apps once data is restored
  2. Restore configurations
  3. Test functionality

Recovery Service Priorities

When recovering multiple services, restore in this order:

  1. DNS - Network needs name resolution
  2. Home Automation - Safety and daily routines
  3. Monitoring - Need visibility into health
  4. Remote access - Work remotely
  5. Everything else - Docker hosts, media, etc.
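When several services are down at once, the order above can be applied mechanically. The following is a minimal Python sketch. It assumes each service is identified by a short label. The labels `dns`, `home-automation`, `monitoring` and `remote-access` and the function `restore_order` are hypothetical and are not part of any tool.

```python
from __future__ import annotations

# Hypothetical labels mirroring "Recovery Service Priorities"; lower index = restore first.
PRIORITY = ["dns", "home-automation", "monitoring", "remote-access"]


def restore_order(failed: list[str]) -> list[str]:
    """Return the failed services sorted by the documented priority.

    Services not in PRIORITY count as "Everything else" and go last,
    keeping the relative order they were given in.
    """
    rank = {name: i for i, name in enumerate(PRIORITY)}
    # sorted() is stable, so unlisted services keep their input order.
    return sorted(failed, key=lambda s: rank.get(s, len(PRIORITY)))


if __name__ == "__main__":
    down = ["media", "monitoring", "docker-host", "dns"]
    print(restore_order(down))  # ['dns', 'monitoring', 'media', 'docker-host']
```

Any service missing from the list falls into "Everything else". Because the sort is stable, those services keep the order they were passed in.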

Testing Schedule

Monthly

  • [ ] Verify backups running
  • [ ] Verify offsite sync completing
  • [ ] Check storage space usage
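The monthly backup and storage checks can be scripted. The following is a minimal sketch. It assumes backups are written as files under a single directory. The `/mnt/backups` path, the 2-day age limit and the 80% usage threshold are placeholders, not values from this plan. The script only looks at file timestamps and free space. It does not prove a backup can be restored; the quarterly restore tests cover that.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path


def newest_backup_age_days(backup_dir: Path) -> float | None:
    """Age in days of the most recently modified file under backup_dir, or None if there are none."""
    mtimes = [p.stat().st_mtime for p in backup_dir.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return (time.time() - max(mtimes)) / 86400


def disk_used_percent(path: Path) -> float:
    """Percentage of the filesystem holding path that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def monthly_check(backup_dir: Path, max_age_days: float = 2.0,
                  max_used_percent: float = 80.0) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    if not backup_dir.is_dir():
        return [f"backup directory {backup_dir} not found"]
    problems = []
    age = newest_backup_age_days(backup_dir)
    if age is None:
        problems.append(f"no backup files found in {backup_dir}")
    elif age > max_age_days:
        problems.append(f"newest backup is {age:.1f} days old")
    used = disk_used_percent(backup_dir)
    if used > max_used_percent:
        problems.append(f"storage is {used:.0f}% full")
    return problems


if __name__ == "__main__":
    # Placeholder path: point this at wherever your backups are written.
    for problem in monthly_check(Path("/mnt/backups")):
        print("WARN:", problem)
```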

Quarterly

  • [ ] Test VM restore from backup
  • [ ] Test file restore from offsite
  • [ ] Verify documentation is current
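For the quarterly file restore test, one way to confirm the restored copy is intact is to compare SHA-256 checksums against the originals. The following is a sketch. It assumes you restore into a scratch directory while the originals are still available. `compare_trees` and `sha256_file` are hypothetical helper names.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(original: Path, restored: Path) -> list[str]:
    """List relative paths that are missing from `restored` or whose contents differ."""
    problems = []
    for src in sorted(p for p in original.rglob("*") if p.is_file()):
        rel = src.relative_to(original)
        dst = restored / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_file(src) != sha256_file(dst):
            problems.append(f"differs: {rel}")
    return problems
```

An empty result means every original file came back byte-for-byte. Extra files in the restore directory are not reported.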

Annually

  • [ ] Full disaster recovery simulation
  • [ ] Update all documentation

Recovery Log Template

Use this when performing actual recovery:

Date: ___________
Scenario: ___________
Cause: ___________

Timeline:
- Event discovered: ___________
- Recovery started: ___________
- Services restored: ___________
- Full recovery: ___________

What Worked:
-

What Didn't Work:
-

Lessons Learned:
-

Documentation Updates Needed:
-
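If the Timeline entries are recorded as ISO 8601 timestamps, the elapsed time to each milestone can be computed directly. The following is a sketch that assumes that format. `timeline_durations` is a hypothetical helper. The field names are copied from the template above, and the dates in the example are illustrative only.

```python
from __future__ import annotations

from datetime import datetime

# Field names copied from the Timeline section of the recovery log template.
TIMELINE = ["Event discovered", "Recovery started", "Services restored", "Full recovery"]


def timeline_durations(times: dict[str, str]) -> dict[str, float]:
    """Hours from 'Event discovered' to each later milestone that has been filled in.

    Timestamps are assumed to be ISO 8601 strings such as "2024-05-01T09:30".
    """
    start = datetime.fromisoformat(times["Event discovered"])
    return {
        name: (datetime.fromisoformat(times[name]) - start).total_seconds() / 3600
        for name in TIMELINE[1:]
        if name in times
    }


if __name__ == "__main__":
    log = {  # illustrative timestamps, not from a real incident
        "Event discovered": "2024-05-01T09:00",
        "Recovery started": "2024-05-01T09:30",
        "Full recovery": "2024-05-03T09:00",
    }
    print(timeline_durations(log))  # {'Recovery started': 0.5, 'Full recovery': 48.0}
```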

Recovery Complete Checklist

Recovery is NOT done until:

  • [ ] All critical services operational
  • [ ] Data integrity verified
  • [ ] Backups resuming automatically
  • [ ] Monitoring operational
  • [ ] Documentation updated with lessons learned
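Part of this checklist, whether services are reachable again, can be smoke-tested. The following is a minimal sketch. It assumes each service is reachable at a hostname and TCP port. The `.home.lan` hostnames are placeholders. Port 8006 is the Proxmox VE web UI default; port 443 assumes the TrueNAS UI is served over HTTPS. A pass only means the name resolves and the port accepts a connection. Data integrity and backup resumption still need checking by hand.

```python
from __future__ import annotations

import socket

# Placeholder inventory: replace hostnames and ports with your own.
CHECKS = {
    "dns": ("dns.home.lan", 53),
    "proxmox": ("pve.home.lan", 8006),
    "truenas": ("truenas.home.lan", 443),
}


def resolves(hostname: str) -> bool:
    """True if the system resolver returns at least one address for hostname."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def smoke_test(checks: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Map each service name to True if its host resolves and its port accepts a connection."""
    return {name: resolves(host) and port_open(host, port)
            for name, (host, port) in checks.items()}
```

Run `smoke_test(CHECKS)` once recovery looks complete, and treat any `False` as an open item.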



Emergency Resources

  • Proxmox Community: https://forum.proxmox.com
  • TrueNAS Community: https://forums.truenas.com
  • r/homelab: https://reddit.com/r/homelab