Last week, I received an alert that the percentage of transaction log in use on one of our production databases was increasing more than it should have been. I’ll refer to this database as DB1 from here on in. It had reached 18%, which is above the normal 5-10% we like to see it at. DB1 is in an Always On Availability Group so if the log usage jumps we jump.
The first thing I checked was that the log backups were running. We use Ola Hallengren’s maintenance solution to manage our backups. The backup jobs were running without error.
The next thing to check was the log_reuse_wait_desc column in sys.databases.
The highlighted row is the row for DB1. A log_reuse_wait of 2 means there were no changes to backup since the last log backup. If that were the case then why was my transaction log slowly filling like an unattended bath?
To check everything looked OK with the transaction log I ran DBCC LOGINFO(DB1). Below you can see as much of the output as I was able to fit into a single screenshot.
The VLFs look good. They are still in their pre defined manually grown out size. Nothing looks out of place here.
So when something weird is happening it’s time to check the SQL Server logs.
Not the best screenshot I know but what this shows is that at 12:37:27 on the 16th we had some kind of blip with our Availability Group (highlighted in red). This looks like a communications blip as this AG is spread across two physical sites. The log record highlighted in blue is the last occurance of a log backup for DB1 in the log. The blue line above the AG blip messages is where the next log backup for DB1 should have been. So for some reason DB1 was no longer being backed up by the job that had been happily doing it for the last few months. The weirdest thing was that some of the DBs in that AG were still being backed up.
By default these jobs follow the backup setting for the AG. This setting on our 2 node AG was to perform automated backups on secondaries only. Below is the output file of the log backup job from the secondary node.
DB1 was showing as NOT being on the preferred backup replica while another DB in the same AG was showing the opposite. This is very strange as the preferred replica is set at the AG level and these databases are in the same AG.
I then opened the stored procedure dbo.DatabaseBackup (Part of Ola’s solution) used to generate the backups. I did a quick search for “Is preferred backup replica”.
SELECT @CurrentIsPreferredBackupReplica = sys.fn_hadr_backup_is_preferred_replica(@CurrentDatabaseName)
The code I found was using the result of a system function that returns a 1 if the database is on it’s preferred replica and a 0 if it’s not. I tested this by calling the function directly.
SELECT sys.fn_hadr_backup_is_preferred_replica('DB_That_Was_Backing_Up'); GO SELECT sys.fn_hadr_backup_is_preferred_replica('DB1'); GO
When the logs and testing just bring more weirdness your way it’s time to consult:
- Someone else in your team
- Your favorite search engine
- A Q/A site like dba.stackexchange.com
After boring my colleagues for ten minutes with my description of what I was seeing, I did some research and quickly came across this connect item which talks about situations in which the function in the code above always returns 0. This sounded like a nasty problem without many workarounds.
To me the problem sounded like some meta data was corrupted by the blip experienced by the AG. As an experiment, I altered the AG’s backup setting to Primary Only and then switched it back in an effort to rebuild the meta data. Below is the result of the same query after the ‘change’.
I ran the log backup job and suddenly all databases were included. Problem solved!
Well maybe not. I’m not saying that my theory was right (I don’t think I’ll ever know what really happened) but it did cause me to improve our monitoring to watch for databases not logging successful log backups. Relying on the job outcome is not enough. Transaction log usage of databases is a crucial piece of monitoring for DBAs and this is not the first time it’s saved the day.