The Steps Nobody Tells You About: The Recovery Workflow Ransomware Forces You to Follow

It’s 2am on a Saturday morning of a long holiday weekend. Your phone won’t stop. Database servers are down and won’t restart. You reach for the backups. Someone encrypted them three days ago — you just didn’t know it until now. Everything you thought was your safety net is gone! What happens next depends entirely on whether you built and implemented the right process before this night happened. Your job — and possibly your career — may depend on the answer.

Studies report that up to 75% of organizations that believe they’ve recovered from ransomware get hit again. The ones that avoid a second hit did not restore to their old servers. They did not restore into their existing environment. They started clean. And they were ready to do it because they had built the workflow, tested it, and simulated it before the night it mattered. Here is what that workflow looks like.

The Preparation That Makes All of This Possible

Everything in Steps 1 through 5 describes what happens the night ransomware hits. None of it works the way it needs to unless the groundwork was laid long before that night arrived.

The first time your team works through a recovery in an Isolated Recovery Environment (IRE), it will be slow. There will be gaps in the runbook, missing credentials, unclear hand-offs, and things that only break when they are actually tested. That is exactly why you do it before the pressure is real. Document everything that broke, update the runbook, and run through the process again. The second run is faster. The third run starts to look like your actual RTO.

Before any live drill, start with a tabletop exercise — and do not run it with just the data team. IT leadership, security, operations, legal, communications, and the database team all need to be at the table. Walking through the workflow as a group surfaces the dependencies and hand-offs that never show up in a document. You find out which decisions require sign-off from someone who was not in the room. You find out which teams need to be activated before the database team can move. Those conversations cost very little before an incident. During one, they cost everything.

Then, on a quarterly basis, run a live simulation — an actual IRE build and restore drill, not a conversation. Certificates get rolled. Backups get tested against the current certificate chain. Scripts get validated. The runbook gets updated to reflect anything that has changed since the last drill. Quarterly matters because environments change: new databases, new applications, rolled certificates, updated backup configurations. A runbook that was accurate six months ago may have gaps today. Find those gaps in a drill. Not during the incident.

Step 1: Build the Green Field IRE

IRE stands for Isolated Recovery Environment. Nothing in it connects to your existing network. No existing servers. No existing scripts. No configuration management tools — Chef, Ansible, Puppet, all potentially compromised. Everything built from verified, clean ISOs from scratch. This is called a Green Field build. If it touched your production network before the attack, it does not exist in the IRE. Period.

Immutability applies here too. Every script, every database backup used in the IRE must come from a known-good, protected source. That includes SQL Server itself — install it using the unattended installation configuration files you previously pulled from your production servers and stored in immutable storage. Matching the install configuration matters: collation, instance settings, feature selection, and service account configuration all need to align with what your databases expect. You are not just isolating the environment — you are guaranteeing everything inside it is clean and correct before you introduce any recovered data.
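One way to enforce the known-good rule is to record a SHA-256 hash for every script, configuration file, and backup when it is written to immutable storage, then re-check those hashes before anything is used in the IRE. The sketch below assumes such a manifest exists; the names `sha256_of_file` and `find_untrusted` and the manifest layout are illustrative, not part of any product.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large backup files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_untrusted(manifest: dict[str, str], root: Path) -> list[str]:
    """Return relative paths that are missing or whose hash differs from the manifest.

    `manifest` (hypothetical layout) maps a relative file path to the SHA-256
    recorded before the incident.
    """
    problems = []
    for relative_path, expected in sorted(manifest.items()):
        candidate = root / relative_path
        if not candidate.is_file() or sha256_of_file(candidate) != expected.lower():
            problems.append(relative_path)
    return problems
```

An empty result means every listed artifact matches what was recorded. Anything returned should be treated as untrusted and kept out of the IRE. The manifest itself has to live in the same protected storage, or the check proves nothing.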

One more layer that catches teams off guard: if your backups were written to immutable storage through a third-party backup product such as Veeam, Commvault, or Rubrik, that product may have added its own encryption on top of your backup files. That encryption is entirely separate from your SQL Server TDE certificate chain. It is tied to the backup product’s own encryption key or certificate.

To read those backups inside the IRE you need two things: a clean installation of that backup product built from a verified ISO, and the key or certificate it used to encrypt your backups, stored in your immutable storage alongside the backup files themselves. Without both, the files are unreadable regardless of whether your SQL Server certificate chain is perfectly intact. Verify this as part of your restore testing before the incident. Discovering it at 2am is too late.

Step 2: Restore the User Database Into the IRE

Now you restore the user database — but not to your production servers. Into the clean IRE. The backup must come from immutable storage. The TDE certificate chain must be intact if the database is encrypted. Confirm backup integrity before you touch anything. Only then do you attempt the restore.

If the restore fails due to a certificate mismatch, meaning the certificate that protected that backup no longer exists, do not stop. Pull the next-oldest full backup and attempt the restore again. Keep stepping back in time until you find a backup that restores. If an attacker has been rotating your TDE certificates, that cycling has a start point. Somewhere in your backup history is a full backup taken before that window opened. That is your recovery anchor.
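The walk back through backup history can be sketched as a simple loop. This is a minimal illustration, assuming you can list full backups with their completion times and that `try_restore` is a callable you supply that attempts the restore in the IRE and reports success. `FullBackup` and `find_recovery_anchor` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class FullBackup:
    database: str
    finished_at: datetime
    path: str


def find_recovery_anchor(
    backups: Iterable[FullBackup],
    try_restore: Callable[[FullBackup], bool],
) -> Optional[FullBackup]:
    """Try full backups newest first and return the first one that restores.

    Returns None when no backup in the history can be restored.
    """
    for backup in sorted(backups, key=lambda b: b.finished_at, reverse=True):
        if try_restore(backup):
            return backup
    return None
```

The gap between the anchor's `finished_at` and the time of the incident is the data loss window for that database, so it is worth recording for every database you restore.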

And here is the troubling part: if you have not been auditing and alerting on TDE key chain changes, you may not know how far back that window goes. It could be days. It could be weeks. It could be months. The longer an attacker was inside before detection — unaudited and undetected — the further back you may have to go to find a clean backup. That is not a recovery. That is data loss measured in time.
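If you do record certificate state on a schedule, finding the start of the window becomes a lookup rather than a guess. The sketch below assumes a daily snapshot of the thumbprint protecting each database (for example, the `encryptor_thumbprint` column of `sys.dm_database_encryption_keys`) and a set of thumbprints your team approved. The function name and data layout are illustrative assumptions.

```python
from datetime import date
from typing import Iterable, Optional, Sequence, Tuple


def first_unexpected_change(
    snapshots: Sequence[Tuple[date, str]],
    trusted_thumbprints: Iterable[str],
) -> Optional[date]:
    """Return the date of the earliest snapshot whose thumbprint was never approved.

    `snapshots` is a sequence of (snapshot date, certificate thumbprint) pairs.
    Returns None when every recorded thumbprint is in the trusted set.
    """
    trusted = {thumbprint.upper() for thumbprint in trusted_thumbprints}
    for taken_on, thumbprint in sorted(snapshots):
        if thumbprint.upper() not in trusted:
            return taken_on
    return None
```

Full backups taken before the returned date are the first candidates for a restore. Legitimate certificate rotations have to be added to the trusted set when they happen, otherwise the check flags your own changes.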

Then you do it again. This entire process — Steps 2, 3, and 4 — repeats for every user database, one at a time. Each database has its own backup history, its own certificate chain, its own potential compromise window. There are no shortcuts.

Once restored, apply SQL logins, but do not connect the IRE to Active Directory, and do not rebuild AD from its backups. In a ransomware scenario your AD environment is potentially compromised, and an AD restored from backup carries the same risk as the original. Either path re-introduces everything you are trying to leave behind and defeats the entire purpose of isolation. If your environment uses only AD-integrated authentication, create equivalent SQL Server logins with the same permissions in the IRE instead. This requires the IRE instance to run in mixed authentication mode.

Those login scripts — permissions intact, roles mapped, access levels documented — should have been scripted out, stored in immutable storage, and kept current before this night happened. If they weren’t, this step becomes a manual reconstruction under pressure at 2am.

Step 3: Check for Execution Code Inside the Database

This is where most organizations stop short — and where attackers count on them stopping short. Most endpoint protection and antivirus products scan files on disk. They do not see inside the SQL Server engine. A restored database can look completely clean to your security tools and still be carrying a payload.

Before you declare any restored database clean, you must manually audit everything that can execute code from inside the engine:

CLR Assemblies: .NET code loaded into SQL Server with UNSAFE or EXTERNAL_ACCESS permissions
Extended Stored Procedures: legacy xp_ procedures that can execute operating system commands directly
SQL Server Agent Jobs: scheduled jobs that can re-execute a payload after the restore completes
Triggers: DML triggers that fire automatically on data changes, and DDL triggers that fire on schema changes
Linked Servers: connections to other systems that could re-introduce compromise the moment the database goes live
Service Broker: asynchronous execution that can fire code without a visible user session

Your antivirus will not flag any of these. You have to look for them yourself.
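As a starting point for that manual audit, the checklist above can be turned into a repeatable script. The sketch below generates T-SQL that lists the objects in each category using standard SQL Server catalog views and msdb tables. The Python wrapper, the name `build_audit_script`, and the choice of columns are assumptions for illustration. The output still needs a person to read it and decide what belongs.

```python
# Category -> T-SQL queries that list what exists. Listing is not judging:
# every row returned still needs review against what you expect to be there.
AUDIT_QUERIES = {
    "CLR assemblies": [
        "SELECT name, permission_set_desc, create_date, modify_date "
        "FROM sys.assemblies WHERE is_user_defined = 1;",
    ],
    "Extended stored procedures": [
        "SELECT name, dll_name, create_date FROM master.sys.extended_procedures;",
    ],
    "SQL Server Agent jobs": [
        "SELECT j.name AS job_name, s.step_id, s.subsystem, s.command "
        "FROM msdb.dbo.sysjobs AS j "
        "JOIN msdb.dbo.sysjobsteps AS s ON s.job_id = j.job_id;",
    ],
    "Triggers": [
        "SELECT name, parent_class_desc, is_disabled, modify_date FROM sys.triggers;",
        "SELECT name, is_disabled, modify_date FROM sys.server_triggers;",
    ],
    "Linked servers": [
        "SELECT name, product, provider, data_source FROM sys.servers WHERE is_linked = 1;",
    ],
    "Service Broker": [
        "SELECT name, activation_procedure, is_activation_enabled FROM sys.service_queues;",
    ],
}


def build_audit_script(database: str) -> str:
    """Return a T-SQL script that runs every audit query in the given database."""
    quoted = "[" + database.replace("]", "]]") + "]"  # escape ] inside the name
    lines = [f"USE {quoted};"]
    for category, queries in AUDIT_QUERIES.items():
        lines.append(f"-- {category}")
        lines.extend(queries)
    return "\n".join(lines)
```

Running `print(build_audit_script("Orders"))` produces a script you can store in immutable storage and execute with your usual client against each restored database. Agent jobs, linked servers, and extended procedures live at the server level, so on a freshly built IRE instance those queries should return only what you deliberately recreated.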

Step 4: Validate the Data Itself

A successful restore does not mean that the data itself is clean. Ransomware operators may corrupt or manipulate data before the encryption payload fires, sometimes weeks before the visible attack. Run integrity checks such as DBCC CHECKDB. Validate critical tables. Compare to known-good baselines where you have them. Do not assume the backup is clean because the restore completed without errors.
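Comparing against a baseline can be as simple as checking row counts and, for tables that should never change, a stored checksum. The sketch below assumes you captured those figures per table before the incident and can collect the same figures from the restored database. `TableStats`, `compare_to_baseline`, and the 10% drift tolerance are illustrative choices, not recommendations.

```python
from typing import Dict, List, NamedTuple, Optional


class TableStats(NamedTuple):
    row_count: int
    checksum: Optional[int] = None  # only recorded for static reference tables


def compare_to_baseline(
    baseline: Dict[str, TableStats],
    restored: Dict[str, TableStats],
    max_row_drift: float = 0.10,
) -> List[str]:
    """Return human-readable findings where the restored data departs from the baseline."""
    findings = []
    for table, expected in sorted(baseline.items()):
        actual = restored.get(table)
        if actual is None:
            findings.append(f"{table}: missing from restored database")
            continue
        allowed = expected.row_count * max_row_drift
        if abs(actual.row_count - expected.row_count) > allowed:
            findings.append(
                f"{table}: row count {actual.row_count} vs baseline {expected.row_count}"
            )
        if expected.checksum is not None and actual.checksum != expected.checksum:
            findings.append(f"{table}: checksum differs from baseline")
    return findings
```

A finding is a prompt to investigate, not proof of tampering. Busy transactional tables drift naturally between the baseline and the restore point, which is why the checksum comparison is limited to tables where a baseline checksum was recorded.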

One more note on timing: databases that belong to the same application should be restored to the same point in time wherever possible. Restoring an orders database to Tuesday while its customer database lands on Thursday will produce mismatched foreign keys, broken transactions, and corrupted application state. Identify your application database groups before the incident and document which databases must move together.
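Choosing a shared restore point for an application group can be sketched as a set intersection. This example assumes the databases in a group are backed up on a common schedule, so clean restore points carry matching timestamps. Where log backups allow a restore to an arbitrary time, the same idea applies with time ranges instead of exact matches. The function name and data layout are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional, Set


def latest_common_restore_point(
    clean_points: Dict[str, Set[datetime]],
    group: Iterable[str],
) -> Optional[datetime]:
    """Return the newest timestamp every database in the group can be restored to.

    `clean_points` maps a database name to the restore points already verified
    as restorable. Returns None when the group shares no restore point.
    """
    common: Optional[Set[datetime]] = None
    for database in group:
        points = set(clean_points.get(database, set()))
        common = points if common is None else common & points
    return max(common) if common else None
```

If one database in the group had to go further back than the others, the whole group goes back with it. That is the cost of keeping the application state consistent.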

After All User Databases Are Restored: Rebuild the System Databases From Script

Only after every user database has been through Steps 2, 3, and 4 do you turn to the system databases. These are never restored from backup. They are rebuilt from scratch using scripts stored in immutable storage before the attack.

master — Contains server-level logins, linked server definitions, server configuration, and endpoint settings. Do not restore it. An attacker with enough access to deploy ransomware had enough access to add logins, create linked servers pointing back to compromised systems, or alter server configuration for persistence. Rebuilding from script eliminates everything they touched.

msdb — Contains SQL Server Agent Jobs, backup history, maintenance plans, and SSIS packages. Do not restore it. SQL Agent Jobs are a well-documented attacker persistence mechanism. A malicious job carried back through a restore can re-execute a payload the moment the SQL Server Agent service starts. The restore completes cleanly. The engine reports success. The job fires an hour later. Rebuild msdb from scratch and recreate only the jobs you can verify are clean — pulled from scripts in immutable storage. Your backup history will be gone. That is acceptable. Your Agent Jobs will be clean, which is not optional.

model — The template SQL Server uses for every new database created on the instance. If an attacker altered model — adding objects, changing settings, inserting execution code — every new database inherits the contamination automatically. In the IRE, start with the default model that came with the fresh SQL Server installation. Leave it alone unless you have verified, trustworthy scripts stored in immutable storage that you know reflect intentional, approved changes made before the attack. If you cannot confirm those scripts are clean and accurate, the default is safer than any customization you cannot fully trust.

All three of these rebuilds depend on scripts maintained in immutable storage before the attack. If those scripts do not exist or have not been kept current, the rebuild becomes reconstruction from memory, under pressure, in the middle of the night. That is avoidable. The time to build and maintain those scripts is now.

One critical rule on which scripts to use: if the user database restore process forced you back in time (if the earliest clean backup you could restore was from a week ago, or two weeks ago), do not use any scripts generated after that date. Any script created or modified after the attacker’s window opened is potentially compromised. Only scripts that predate your clean restore point can be trusted. This is exactly why user databases are restored first. You must establish that clean date before you know which version of your scripts is safe to use for master, msdb, and model.
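The script cutoff rule can be expressed as a filter over versioned scripts. The sketch below assumes each stored script version carries the time it was written to immutable storage, and it takes the earliest clean restore point across all user databases as the cutoff. That choice of cutoff is a conservative assumption, and `ScriptVersion` and `trusted_script_versions` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class ScriptVersion:
    name: str
    stored_at: datetime


def trusted_script_versions(
    versions: Iterable[ScriptVersion],
    clean_restore_points: Dict[str, datetime],
) -> Dict[str, ScriptVersion]:
    """Pick, per script name, the newest version stored before the cutoff.

    The cutoff is the earliest clean restore point across all user databases.
    Versions stored at or after the cutoff are ignored as potentially compromised.
    """
    if not clean_restore_points:
        raise ValueError("restore the user databases first to establish a clean date")
    cutoff = min(clean_restore_points.values())
    chosen: Dict[str, ScriptVersion] = {}
    for version in versions:
        if version.stored_at >= cutoff:
            continue
        current = chosen.get(version.name)
        if current is None or version.stored_at > current.stored_at:
            chosen[version.name] = version
    return chosen
```

A script name missing from the result has no version old enough to trust. That item has to be rebuilt by hand or left at the installation default.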

Step 5: Run This Without Pulling Sysadmins Off the Incident

The IRE build and database validation run as a parallel workstream. Your systems administrators are managing the active incident: containment, forensics, communications, vendor coordination. The database recovery team works in the IRE independently. These are not the same people doing the same job at the same time. If you do not plan for parallel tracks before the incident, you will be forced to choose between them during it.

The 75% who get hit again had backups. They had restore procedures. What they did not have was a tested, validated recovery workflow built and proven before the attack. That is the difference between a recovery and a second incident.

Data Systems Architecture works with SQL Server teams to build every piece of this before it is needed — the auditing and monitoring configuration that detects the attack early, the alerting that puts the right information in front of the right people in time to act, the immutable storage strategy, the IRE runbook, the pre-incident scripting, the tabletop with your full team at the table, and the quarterly live simulations that prove the workflow still holds as your environment changes. The earlier you detect, the less ground you have to recover. The choice is stark: discover at 2am that everything you thought was your safety net is gone — or face that same call knowing you have been here before, your team knows the workflow, and you are ready. That difference comes down entirely to preparation. Reach out to us today. We want to help you build that preparation before the night it matters.

If you have not run a tabletop, scheduled a quarterly drill, or tested a full IRE restore — that gap is worth a conversation. Request your Discovery Call today.