By Denise Crabtree
If you have never seen it before, those words invoke a cold chill sinking into your chest. You know it could be bad. Until it is resolved, that database is unusable.
The event log number is 9001. It will appear in the Application Event Log and SQL Server Error Log.
It Happened to Me:
The first two times I experienced this error happened two weeks apart for extremely different reasons. Hyper-V was the source of the first one, a VM went down hard. When the System Administrator got the machine up the first time; the log drive for the majority of databases was gone. He had the option to attach that existing LUN presented to the server, but it asked to format the drive. The second option was to attempt a second reboot. What a relief, it came back with the files on reboot — and the active databases came back as suspect. Not so much of a relief.
The second time was self-inflicted. I was returning disk allocations after uninstalling a named instance of SQL. I identified the active log drive for removal. This time I knew I could have the database back online quickly by rebuilding the log files. Self-recriminations had to wait; getting databases on-line was the priority.
Recently, I had the third opportunity to recover from a 9001 error on a database which lost the connection to the Log file after a freeze/thaw snapshot backup. All other databases on the server were OK.
Sadly, gladly? I can rock the repair, now.
Actions:
Delegate the outage notification. Get a System Administrator involved. This is not the time to be a lone cowboy. The SA will need to perform Root Cause Analysis or maybe even some remediation. You may need a SA to present a new drive if the original drive is physically corrupt.
First, if the original drive is available with the original logs visible in Explorer. Take the database OFFLINE then ONLINE. It may pick up the file and be OK. Check with DBCC! The LSN setting must match in the mdf and ldf — if not it will go, SUSPECT.
Next — rebuild the log file with the Remediation SQL script below.
Always run DBCC before declaring the all-clear. Schrodinger’s cat says that it may be bad; until it is not.
Prevention:
Basically, prevention is more mitigation of external factors.
If a server is running on an internal raid disk set, those disks should be periodically checked for sector failures. Potential disk failures ought to be part of the disk management and should be handled proactively.
If a server is running on a virtual layer, VMDK errors where the underlying, thin provisioned drive is running low on free space, proactive management of VMDK alerts is vital to database health.
To prevent self-inflicted wounds; double-double check drive LUN identification!
Monitoring:
It is far better to have monitoring to catch errors before users are calling the support team. For this reason, the Event/Error code 9001 should be set up as a Monitoring System alert. The alert should go to a 24/7 op center to begin immediate troubleshooting and escalation. This is an actionable and high priority. The sooner this can be acted on means less downtime for the database(s).
In lieu of the specific 9001 monitoring alert, a general alert for any level 21 error can also be enabled in the SQL Agent. It is better to know there is a problem and maybe a little noise when this level of error occurs. Use the “Delay between Responses” setting to minimize noise. A drawback to SQL alerts, there is not a recovery message for a ticketing or monitoring system to close any opened tickets.
Testing and Practice Procedure:
Just think that if you have practiced the recovery — that sinking feeling in the chest will not be quite as bad. You too will be able to rock the repairs.
On a desktop instance SQL Express:
- Add a thumb drive to the machine.
- Create a database with a log file on the thumb drive.
- Start a query that inserts a row every second. Let that run through next step
- Eject thumb drive
- Your connection will probably drop immediately
- The database may be SUSPECT — or not be visible — or be in Recovery
- Error 9001 will show up in Event Log and SQL Error Log
- SQL Server error Level 21 in SQL Error Log
- Use Remediation Steps to recover the database
Having familiarity with the type of messages this creates and how to run the script will save time if this is ever needed in your environment.
Remediation:
Run each statement separately, by highlighting and executing highlighted command in Query window with changes needed for a specific database.
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— When a database is SUSPECT or following error 9001 in the event log:
— — Msg 9001, Level 21, State 1, Line 1
— — The log for database ‘WoundedDatabase01′ is not available.
— — Check the event log for related error messages.
— — Resolve any errors and restart the database.
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— First step: Set the database to EMERGENCY Status
— — — — — — — — — — — — — — — — — — — — — — — — — — –
USE master
GO
ALTER DATABASE [WoundedDatabase01]
SET EMERGENCY
WITH ROLLBACK IMMEDIATE
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— Run Checkdb — If the data file is corrupt, repair/restore is needed
— — — — — — — — — — — — — — — — — — — — — — — — — — –
DBCC CHECKDB (WoundedDatabase01)
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— No errors from CHECKDB = good news. This can be recovered with a log rebuild
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— How to rebuild the SQL Database Log File?
— First, Know where the current one is:
— The goal is to create a new log file.
— Has to have a different file name (unique)
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— sp_helpdb tells you the file names.
— — — — — — — — — — — — — — — — — — — — — — — — — — –
Exec master..sp_helpdb [woundeddatabase01]
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— Logical name = WoundedDatabase01_log
— Current physical file name = E:\MSSQL\WoundedDatabase01.ldf
— — — — — — — — — — — — — — — — — — — — — — — — — — –
ALTER DATABASE [WoundedDatabase01]
REBUILD LOG ON
(
NAME=”WoundedDatabase01_log”
,FILENAME=’E:\MSSQL\WoundedDatabase01_log.ldf’
)
/* OUTPUT: Warning: The log for database ‘WoundedDatabase01’ has been rebuilt.
Transactional consistency has been lost.
The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were.
You should run DBCC CHECKDB to validate physical consistency.
The database has been put in dbo-only mode.
When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.
*/
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— Integrity Check — Run CHECKDB again
— — — — — — — — — — — — — — — — — — — — — — — — — — –
DBCC CHECKDB (WoundedDatabase01)
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— No errors from CHECKDB = good news. Get DB online and useable
— — — — — — — — — — — — — — — — — — — — — — — — — — –
ALTER DATABASE [WoundedDatabase01]
SET ONLINE
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— EMERGENCY is also DBO only — so need to set so users can connect
— — — — — — — — — — — — — — — — — — — — — — — — — — –
ALTER DATABASE [WoundedDatabase01]
SET MULTI_USER
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— Final Steps:
— Delete the original log file, from OS, that is no longer in use.
— Run Full Backup (See /* OUTPUT … */ above (backup chain broken)
— Perform Root Cause Analysis
— — — — — — — — — — — — — — — — — — — — — — — — — — –
If you have never seen it before, those words invoke a cold chill sinking into your chest. You know it could be bad. Until it is resolved, that database is unusable.
The event log number is 9001. It will appear in the Application Event Log and SQL Server Error Log.
It Happened to Me:
The first two times I experienced this error happened two weeks apart for extremely different reasons. Hyper-V was the source of the first one, a VM went down hard. When the System Administrator got the machine up the first time; the log drive for the majority of databases was gone. He had the option to attach that existing LUN presented to the server, but it asked to format the drive. The second option was to attempt a second reboot. What a relief, it came back with the files on reboot — and the active databases came back as suspect. Not so much of a relief.
The second time was self-inflicted. I was returning disk allocations after uninstalling a named instance of SQL. I identified the active log drive for removal. This time I knew I could have the database back online quickly by rebuilding the log files. Self-recriminations had to wait; getting databases on-line was the priority.
Recently, I had the third opportunity to recover from a 9001 error on a database which lost the connection to the Log file after a freeze/thaw snapshot backup. All other databases on the server were OK.
Sadly, gladly? I can rock the repair, now.
Actions:
Delegate the outage notification. Get a System Administrator involved. This is not the time to be a lone cowboy. The SA will need to perform Root Cause Analysis or maybe even some remediation. You may need a SA to present a new drive if the original drive is physically corrupt.
First, if the original drive is available with the original logs visible in Explorer. Take the database OFFLINE then ONLINE. It may pick up the file and be OK. Check with DBCC! The LSN setting must match in the mdf and ldf — if not it will go, SUSPECT.
Next — rebuild the log file with the Remediation SQL script below.
Always run DBCC before declaring the all-clear. Schrodinger’s cat says that it may be bad; until it is not.
Prevention:
Basically, prevention is more mitigation of external factors.
If a server is running on an internal raid disk set, those disks should be periodically checked for sector failures. Potential disk failures ought to be part of the disk management and should be handled proactively.
If a server is running on a virtual layer, VMDK errors where the underlying, thin provisioned drive is running low on free space, proactive management of VMDK alerts is vital to database health.
To prevent self-inflicted wounds; double-double check drive LUN identification!
Monitoring:
It is far better to have monitoring to catch errors before users are calling the support team. For this reason, the Event/Error code 9001 should be set up as a Monitoring System alert. The alert should go to a 24/7 op center to begin immediate troubleshooting and escalation. This is an actionable and high priority. The sooner this can be acted on means less downtime for the database(s).
In lieu of the specific 9001 monitoring alert, a general alert for any level 21 error can also be enabled in the SQL Agent. It is better to know there is a problem and maybe a little noise when this level of error occurs. Use the “Delay between Responses” setting to minimize noise. A drawback to SQL alerts, there is not a recovery message for a ticketing or monitoring system to close any opened tickets.
Testing and Practice Procedure:
Just think that if you have practiced the recovery — that sinking feeling in the chest will not be quite as bad. You too will be able to rock the repairs.
On a desktop instance SQL Express:
- Add a thumb drive to the machine.
- Create a database with a log file on the thumb drive.
- Start a query that inserts a row every second. Let that run through next step
- Eject thumb drive
- Your connection will probably drop immediately
- The database may be SUSPECT — or not be visible — or be in Recovery
- Error 9001 will show up in Event Log and SQL Error Log
- SQL Server error Level 21 in SQL Error Log
- Use Remediation Steps to recover the database
Having familiarity with the type of messages this creates and how to run the script will save time if this is ever needed in your environment.
Remediation:
Run each statement separately, by highlighting and executing highlighted command in Query window with changes needed for a specific database.
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— When a database is SUSPECT or following error 9001 in the event log:
— — Msg 9001, Level 21, State 1, Line 1
— — The log for database ‘WoundedDatabase01′ is not available.
— — Check the event log for related error messages.
— — Resolve any errors and restart the database.
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— First step: Set the database to EMERGENCY Status
— — — — — — — — — — — — — — — — — — — — — — — — — — –
USE master
GO
ALTER DATABASE [WoundedDatabase01]
SET EMERGENCY
WITH ROLLBACK IMMEDIATE
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— Run Checkdb — If the data file is corrupt, repair/restore is needed
— — — — — — — — — — — — — — — — — — — — — — — — — — –
DBCC CHECKDB (WoundedDatabase01)
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— No errors from CHECKDB = good news. This can be recovered with a log rebuild
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— How to rebuild the SQL Database Log File?
— First, Know where the current one is:
— The goal is to create a new log file.
— Has to have a different file name (unique)
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— sp_helpdb tells you the file names.
— — — — — — — — — — — — — — — — — — — — — — — — — — –
Exec master..sp_helpdb [woundeddatabase01]
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— Logical name = WoundedDatabase01_log
— Current physical file name = E:\MSSQL\WoundedDatabase01.ldf
— — — — — — — — — — — — — — — — — — — — — — — — — — –
ALTER DATABASE [WoundedDatabase01]
REBUILD LOG ON
(
NAME=”WoundedDatabase01_log”
,FILENAME=’E:\MSSQL\WoundedDatabase01_log.ldf’
)
/* OUTPUT: Warning: The log for database ‘WoundedDatabase01’ has been rebuilt.
Transactional consistency has been lost.
The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were.
You should run DBCC CHECKDB to validate physical consistency.
The database has been put in dbo-only mode.
When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.
*/
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— Integrity Check — Run CHECKDB again
— — — — — — — — — — — — — — — — — — — — — — — — — — –
DBCC CHECKDB (WoundedDatabase01)
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— No errors from CHECKDB = good news. Get DB online and useable
— — — — — — — — — — — — — — — — — — — — — — — — — — –
ALTER DATABASE [WoundedDatabase01]
SET ONLINE
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— EMERGENCY is also DBO only — so need to set so users can connect
— — — — — — — — — — — — — — — — — — — — — — — — — — –
ALTER DATABASE [WoundedDatabase01]
SET MULTI_USER
— — — — — — — — — — — — — — — — — — — — — — — — — — –
— Final Steps:
— Delete the original log file, from OS, that is no longer in use.
— Run Full Backup (See /* OUTPUT … */ above (backup chain broken)
— Perform Root Cause Analysis
— — — — — — — — — — — — — — — — — — — — — — — — — — –