Certain DBs in an AG not Backing Up

Last week, I received an alert that the percentage of transaction log in use on one of our production databases was increasing more than it should have been. I’ll refer to this database as DB1 from here on in. It had reached 18%, which is above the normal 5-10% we like to see it at. DB1 is in an Always On Availability Group so if the log usage jumps we jump.

The first thing I checked was that the log backups were running. We use Ola Hallengren’s maintenance solution to manage our backups. The backup jobs were running without error.

The next thing to check was the log_reuse_wait_desc column in sys.databases.

Checking the log reuse wait description

The highlighted row is the row for DB1. A log_reuse_wait of 2 means there were no changes to backup since the last log backup. If that were the case then why was my transaction log slowly filling like an unattended bath?

To check everything looked OK with the transaction log I ran DBCC LOGINFO(DB1). Below you can see as much of the output as I was able to fit into a single screenshot.

Checking the VLFs look ok for the transaction log of a database in an Availability Group

The VLFs look good. They are still in their pre defined manually grown out size. Nothing looks out of place here.

So when something weird is happening it’s time to check the SQL Server logs.

SQL Server log showing an outage of the availability group

Not the best screenshot I know but what this shows is that at 12:37:27 on the 16th we had some kind of blip with our Availability Group (highlighted in red). This looks like a communications blip as this AG is spread across two physical sites. The log record highlighted in blue is the last occurance of a log backup for DB1 in the log. The blue line above the AG blip messages is where the next log backup for DB1 should have been. So for some reason DB1 was no longer being backed up by the job that had been happily doing it for the last few months. The weirdest thing was that some of the DBs in that AG were still being backed up.

By default these jobs follow the backup setting for the AG. This setting on our 2 node AG was to perform automated backups on secondaries only. Below is the output file of the log backup job from the secondary node.

Ola Hallengren backups solution log

DB1 was showing as NOT being on the preferred backup replica while another DB in the same AG was showing the opposite. This is very strange as the preferred replica is set at the AG level and these databases are in the same AG.

I then opened the stored procedure dbo.DatabaseBackup (Part of Ola’s solution) used to generate the backups. I did a quick search for “Is preferred backup replica”.

SELECT @CurrentIsPreferredBackupReplica = sys.fn_hadr_backup_is_preferred_replica(@CurrentDatabaseName)

The code I found was using the result of a system function that returns a 1 if the database is on it’s preferred replica and a 0 if it’s not. I tested this by calling the function directly.

SELECT sys.fn_hadr_backup_is_preferred_replica('DB_That_Was_Backing_Up');

SELECT sys.fn_hadr_backup_is_preferred_replica('DB1');

sys.fn_hadr_backup_is_preferred_replica incorrect results

When the logs and testing just bring more weirdness your way it’s time to consult:

After boring my colleagues for ten minutes with my description of what I was seeing, I did some research and quickly came across this connect item which talks about situations in which the function in the code above always returns 0. This sounded like a nasty problem without many workarounds.

To me the problem sounded like some meta data was corrupted by the blip experienced by the AG. As an experiment, I altered the AG’s backup setting to Primary Only and then switched it back in an effort to rebuild the meta data. Below is the result of the same query after the ‘change’.

sys.fn_hadr_backup_is_preferred_replica now returns the correct results

I ran the log backup job and suddenly all databases were included. Problem solved!

Well maybe not. I’m not saying that my theory was right (I don’t think I’ll ever know what really happened) but it did cause me to improve our monitoring to watch for databases not logging successful log backups. Relying on the job outcome is not enough. Transaction log usage of databases is a crucial piece of monitoring for DBAs and this is not the first time it’s saved the day.

About the author

James Anderson

I'm a freelance SQL Server DBA in Southampton, UK. Performance tuning code and improving development processes are what I love to do for work. Outside of work, I live to travel to as many places as possible with my wife Sarah.

4 thoughts on “Certain DBs in an AG not Backing Up”

  1. I have hit issues with that before too, the issue I had was due to the way that VMs had been built from clones and renamed. When the result of @@servername does not match the actual SQL Server name the function will always return 0.

    1. @@servername is returning the cluster name\instancename which looks correct. Wonder if having a cross site AG on FCIs is the cause. I have ‘fixed’ the issue by forcing Olas solution to ignore that function. This did mean I had to make the agent jobs AG aware.

  2. ********This did mean I had to make the agent jobs AG aware.

    Please will be great if you elaborate more on this.

Leave a Reply

Your email address will not be published. Required fields are marked *

Sign Up

James Anderson SQL Server DBA

Hi, I'm James Anderson. When I'm not blogging about SQL Server, I do this.

The SQL People

DBA Stackexchange Profile