A backup that has never been restored is a theory. You have file storage and a backup software report. You do not have a recovery capability until you have proven the restore works — on an isolated server, with the full chain, verified by DBCC CHECKDB, confirmed by an application connection.
If you last ran that test on your critical databases more than 30 days ago, you do not know whether your backup will restore today. Databases change. Backup jobs change. Certificates get rotated. Log chains break. The gap between “backup completed” and “backup restores” is where disasters live.
Why Backups Fail When You Need Them
Backup software reports success at the file level. It confirmed the backup file was written to the destination. It does not verify the data pages inside are consistent. It does not verify the log chain is intact. It does not verify the TDE certificate that unlocks the database is still available and matches the backup.
Backup jobs drift. A database is added to an Always On Availability Group and silently excluded from the backup job because the secondary replica was taking the backups, but that secondary replica was decommissioned last quarter. A retention policy change causes the full backup that anchors the differential chain to age out. A disk fills, the job keeps running, and the final log backup is written to a second location that nobody monitors.
On SQL Server 2022 and earlier, full database backups taken on a secondary replica must be COPY_ONLY. A COPY_ONLY full backup does not set the differential base: SQL Server does not reset the differential bitmap when it runs. Differential backups cannot be chained off a COPY_ONLY full, and differential backups on a secondary replica are not supported in SQL Server 2022 and earlier. Your restore chain becomes the COPY_ONLY full plus every transaction log backup taken since it ran. For a large, active production database, that can mean replaying days or weeks of logs during recovery, significantly extending your RTO at exactly the moment you cannot afford it. Note: SQL Server 2025 introduces full and differential backup support on secondary replicas. If your environment is on 2022 or earlier, this limitation applies today.
None of these failures announce themselves. They wait for the moment you most need a clean restore.
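To make the RTO cost of a COPY_ONLY chain concrete, here is a minimal Python sketch that picks the files a restore would need from a list of backup records. The Backup record, its fields, and the one-week schedule are hypothetical stand-ins for what you would read from your own backup history. The sketch orders backups by completion time, whereas a real chain is defined by LSNs. It relies only on the rule stated above: a COPY_ONLY full cannot serve as a differential base.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str              # "full", "diff", or "log"
    finished: datetime     # completion time of the backup
    copy_only: bool = False


def restore_chain(backups: List[Backup]) -> List[Backup]:
    """Files needed to restore up to the newest log backup (time-based sketch)."""
    ordered = sorted(backups, key=lambda b: b.finished)
    fulls = [b for b in ordered if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available")
    base = fulls[-1]
    chain = [base]
    replay_from = base.finished
    # Only a regular (non-COPY_ONLY) full can anchor a differential.
    if not base.copy_only:
        diffs = [b for b in ordered
                 if b.kind == "diff" and b.finished > base.finished]
        if diffs:
            chain.append(diffs[-1])
            replay_from = diffs[-1].finished
    # Every log backup completed after the last restored full or differential.
    chain += [b for b in ordered
              if b.kind == "log" and b.finished > replay_from]
    return chain


# Hypothetical week: one full, a daily differential, hourly log backups.
start = datetime(2025, 1, 1)
logs = [Backup("log", start + timedelta(hours=h)) for h in range(1, 7 * 24 + 1)]
diffs = [Backup("diff", start + timedelta(days=d, minutes=30)) for d in range(1, 7)]

primary_chain = restore_chain([Backup("full", start)] + diffs + logs)
secondary_chain = restore_chain([Backup("full", start, copy_only=True)] + logs)

print(len(primary_chain), "files with a differential base")
print(len(secondary_chain), "files from a COPY_ONLY full")
```

With these made-up numbers, the differential-based chain restores a few dozen files while the COPY_ONLY chain replays every log backup from the whole week. The widening gap between those two counts is the point of the example.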
RESTORE VERIFYONLY Is Not a Restore Test
Many organizations run RESTORE VERIFYONLY as their backup verification process. It reads the backup header, checks the media structure, and confirms the file format is valid.
It does not apply data pages. It does not verify logical consistency. A database with page-level corruption can produce a backup file that passes VERIFYONLY, restores, and then raises an 824 error the first time the damaged page is read or DBCC CHECKDB runs. A broken log chain passes VERIFYONLY on each individual file and fails when you attempt to apply the chain in sequence.
RESTORE VERIFYONLY tells you the file is formatted correctly. It does not tell you the file contains a working database.
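The log chain point can be illustrated with a short Python sketch. It assumes you have already pulled each log backup's first and last LSN from backup history (for example, the first_lsn and last_lsn columns of msdb.dbo.backupset). The LogBackup tuple, the file names, and the LSN values are hypothetical. Each file here could be perfectly well formed on its own, and the gap only appears when the files are compared in sequence.

```python
from typing import List, NamedTuple, Tuple


class LogBackup(NamedTuple):
    file: str
    first_lsn: int
    last_lsn: int


def find_chain_gaps(logs: List[LogBackup]) -> List[Tuple[str, str]]:
    """Return (previous_file, next_file) pairs where log records are missing."""
    ordered = sorted(logs, key=lambda b: b.first_lsn)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        # In an unbroken chain the next backup starts where the previous ended.
        if nxt.first_lsn > prev.last_lsn:
            gaps.append((prev.file, nxt.file))
    return gaps


# Hypothetical history: the backup covering LSNs 300 to 350 is missing.
history = [
    LogBackup("log_01.trn", 100, 200),
    LogBackup("log_02.trn", 200, 300),
    LogBackup("log_03.trn", 350, 400),
]
print(find_chain_gaps(history))
```

A check like this is a cheap early warning, not a substitute for the restore itself. It only tells you where to look before you spend time on a full test.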
What a Real Restore Test Looks Like
Step 1. Identify the database to test. Start with your most critical production database.
Step 2. Stand up an isolated server — your Isolated Recovery Environment (IRE). It must not be connected to production storage, must not share domain credentials with your production environment, and must not have access to your primary backup storage location. If the server can reach your production network, the test environment is not isolated enough for ransomware recovery validation. The IRE concept scales from a single test restore to a full enterprise recovery operation — the principles are the same regardless of scope.
Step 3. If the database uses TDE, restore the certificate first. Restore the Service Master Key if needed, then the database master key in the master database, then the certificate together with its private key. Without the matching certificate on the server, the restore of a TDE-encrypted backup fails.
Step 4. Restore the backup chain. Full backup with NORECOVERY, then differential if applicable with NORECOVERY, then each log backup in sequence with NORECOVERY, then the final log backup with RECOVERY. If any step fails, you have found a gap before a ransomware event did.
Step 5. Run DBCC CHECKDB with no repair options. You are not here to hide problems. You are here to find them. Allocation errors, consistency errors, and page corruption surface here. If CHECKDB returns errors on a production database, that is a critical finding independent of any ransomware concern.
Step 6. Confirm an application connection. Your application should be able to connect to the restored database and read data. This validates not just the restore but the database’s logical integrity from the application’s perspective.
Step 7. Document what you tested. Date, person, databases, any findings, and resolution steps. If you cannot produce this record, the test did not happen for compliance purposes.
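The ordering in Steps 4 and 5 can be sketched as a small Python helper that emits the T-SQL statements in sequence. This is an illustration under assumptions, not a complete restore script. The database name and file paths are hypothetical, and a real restore onto a different server will usually also need MOVE clauses for the data and log files, which are left out here.

```python
from typing import List, Optional


def _ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier."""
    return "[" + name.replace("]", "]]") + "]"


def _literal(path: str) -> str:
    """Quote a path as an N'...' string literal."""
    return "N'" + path.replace("'", "''") + "'"


def build_restore_script(db: str, full: str, diff: Optional[str],
                         logs: List[str]) -> List[str]:
    """Full, optional differential, then logs; only the last step recovers."""
    steps = [("DATABASE", full)]
    if diff:
        steps.append(("DATABASE", diff))
    steps += [("LOG", path) for path in logs]

    script = []
    for i, (kind, path) in enumerate(steps):
        mode = "RECOVERY" if i == len(steps) - 1 else "NORECOVERY"
        script.append(
            f"RESTORE {kind} {_ident(db)} FROM DISK = {_literal(path)} WITH {mode};"
        )
    # Step 5: consistency check with no repair option.
    script.append(f"DBCC CHECKDB ({_ident(db)}) WITH NO_INFOMSGS, ALL_ERRORMSGS;")
    return script


# Hypothetical database and backup files.
for statement in build_restore_script(
    "Sales",
    r"D:\ire\Sales_full.bak",
    r"D:\ire\Sales_diff.bak",
    [r"D:\ire\Sales_log_01.trn", r"D:\ire\Sales_log_02.trn"],
):
    print(statement)
```

Generating the statements from a list keeps the NORECOVERY and RECOVERY placement mechanical, which is where hand-written restore scripts tend to go wrong under pressure.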
To run a full simulated IRE — not just a database restore but a complete server rebuild from scratch — add one more check: confirm you have the SQL Server unattended install files (ConfigurationFile.ini) and current scripts for instance configuration, logins, and Agent jobs, all stored in immutable storage. Your database backup restores the database. It does not restore SQL Server itself, sp_configure settings, logins in master, or Agent jobs in msdb. If those artifacts do not exist or have not been updated since the last major environment change, your restore test passed and your recovery plan still has a gap. Post 4 in this series covers what those artifacts are and why each one is required.
The 30-Day Standard and Why It Exists
Ransomware groups that target enterprise environments dwell an average of 21 to 30 days before detonating. A restore test that is 60 days old may not reflect the current backup state: jobs may have changed, databases may have been added, and certificates may have been rotated in the gap between your last test and today.
The 30-day standard for critical databases means your most recent verified restore is never older than one typical attacker dwell window. For standard production databases, quarterly is the minimum acceptable frequency. For non-critical systems, semi-annual — but those definitions need to be explicit and documented, not assumed.
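Once the tiers are written down, checking them is simple arithmetic. The Python sketch below flags databases whose last verified restore is older than their tier allows. It treats quarterly as 90 days and semi-annual as 180 days, which is an assumption. The database names and dates are hypothetical, and a database with no recorded test is always flagged.

```python
from datetime import date
from typing import Dict, List

# Maximum age of the last verified restore, per tier (90 and 180 are assumed).
MAX_AGE_DAYS = {"critical": 30, "standard": 90, "non-critical": 180}


def overdue_restore_tests(tiers: Dict[str, str],
                          last_tested: Dict[str, date],
                          today: date) -> List[str]:
    """Databases never tested, or tested longer ago than their tier permits."""
    late = []
    for db, tier in tiers.items():
        tested = last_tested.get(db)
        if tested is None or (today - tested).days > MAX_AGE_DAYS[tier]:
            late.append(db)
    return sorted(late)


# Hypothetical inventory.
tiers = {
    "Orders": "critical",
    "Billing": "critical",
    "Reporting": "standard",
    "Archive": "non-critical",
}
last_tested = {
    "Orders": date(2025, 6, 10),
    "Reporting": date(2025, 3, 1),
    "Archive": date(2025, 2, 1),
}
print(overdue_restore_tests(tiers, last_tested, today=date(2025, 6, 30)))
```

Keeping the tier assignment in an explicit mapping is the code equivalent of the point above: the definitions are documented, not assumed.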
When the Test Fails
It will fail at some point. That is the entire purpose of testing before a ransomware event forces you to find out.
Broken differential chain. The differential references a full backup that aged out of retention. The backup job completed successfully every night. The restore fails because the anchor is gone. Fix: adjust retention so the full backup outlives the differentials that depend on it.
Missing TDE certificate. The certificate was rotated six months ago. The certificate backup was not updated. The backup file is intact, but the restore fails because the server has no certificate matching the one that encrypted it. Fix: back up the certificate and its private key every time the certificate changes. Store those backups in a separate, offline location.
Database excluded from the job. A new database was added to the instance and never added to the backup job. The job has been running and excluding it silently. Fix: audit your backup jobs against the list of production databases monthly, not annually.
Incomplete backup due to storage full. The backup destination filled. The job continued writing until it failed mid-stream. The last backup on disk is incomplete. Fix: monitor backup destination storage as a critical metric, not an afterthought.
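Two of the fixes above lend themselves to a mechanical check. The Python sketch below assumes a simple model: fixed retention periods in days, a fixed interval between full backups, and database lists you supply yourself. All names and numbers are hypothetical. The retention rule comes from the worst case, where a differential taken just before the next full still depends on a full backup that is already one interval old.

```python
from typing import Iterable, List


def unprotected_databases(production: Iterable[str],
                          backed_up: Iterable[str]) -> List[str]:
    """Production databases that no backup job covers."""
    covered = set(backed_up)
    return sorted(db for db in production if db not in covered)


def full_outlives_differentials(full_retention_days: int,
                                diff_retention_days: int,
                                days_between_fulls: int) -> bool:
    """True if every retained differential still has its base full on disk.

    The oldest dependency is a differential taken just before the next full:
    its base is already days_between_fulls old and must survive for the
    differential's whole retention period on top of that.
    """
    return full_retention_days >= diff_retention_days + days_between_fulls


# Hypothetical instance: NewApp was created but never added to a job.
print(unprotected_databases(["Orders", "Billing", "NewApp"], ["Orders", "Billing"]))

# Weekly fulls, differentials kept 7 days: 7-day full retention is too short.
print(full_outlives_differentials(7, 7, 7))
print(full_outlives_differentials(14, 7, 7))
```

Neither check replaces the restore test. They narrow down which of these failures you are most likely to hit when you run it.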
Making the Test Repeatable
A restore test that only one person knows how to run is a liability, not an asset. Write a procedure — not a checklist. A procedure says what to do when a step fails, who to notify, and what constitutes a passed test versus a conditional pass.
Schedule it at the cadence that matches the tier. Critical databases: restore test every 30 days. Standard production databases: quarterly at minimum. Each test needs a primary owner and an alternate owner named in advance.
Two additional exercises belong on the calendar beyond the individual restore test. Quarterly: a tabletop exercise with all stakeholders — systems administrators, network team, and backup application team — to walk through the full IRE process end to end. Every six months: a full IRE build-out simulation that includes the server rebuild from ConfigurationFile.ini, SQL Server configuration restore, and complete database recovery chain from immutable storage.
If any of these exercises is skipped, that skip is a finding. If it is skipped twice, it is a control failure.
Keep a restore test log. Date, tester, databases covered, findings, resolution, sign-off. That log is your evidence during an audit that your backup and recovery controls are operational — not just documented.
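One low-effort way to keep that log is a CSV file with one row per test. The Python sketch below uses the six fields listed above. The record type, the column names, and the sample entry are hypothetical, and any format your auditors accept works just as well.

```python
import csv
import io
from dataclasses import dataclass, fields
from datetime import date
from typing import Iterable, TextIO


@dataclass
class RestoreTestRecord:
    test_date: date
    tester: str
    databases: str      # names of the databases covered
    findings: str
    resolution: str
    sign_off: str


def write_log(records: Iterable[RestoreTestRecord], out: TextIO) -> None:
    """Write a header row, then one CSV row per restore test."""
    writer = csv.writer(out)
    writer.writerow([f.name for f in fields(RestoreTestRecord)])
    for r in records:
        writer.writerow([r.test_date.isoformat(), r.tester, r.databases,
                         r.findings, r.resolution, r.sign_off])


# Hypothetical entry, written to an in-memory buffer for illustration.
entry = RestoreTestRecord(
    test_date=date(2025, 6, 30),
    tester="A. Tester",
    databases="Orders; Billing",
    findings="Differential base missing for Billing",
    resolution="Full backup retention extended",
    sign_off="DBA lead",
)
buffer = io.StringIO()
write_log([entry], buffer)
print(buffer.getvalue())
```

In practice the file would live alongside your other recovery artifacts, so the evidence survives the same event the test is meant to prepare for.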
If your restore process has never been documented or tested, or if the last test found problems you have not resolved, the SQL Server Security Discovery Call is where that conversation starts.
