***UPDATE*** Script has been modified so that you don’t have to know the default location of your error log.
Within the first two weeks of coming to my new employer, I was tasked with developing a method of tracking and monitoring the which of the replicas in any given Availability Group was the primary and send a notification when the primary changed.
In order to quickly and simply detect a change, I connected to the Availability Group and query the sys.dm_hadr_availability_group_states DMV. If the primary replica in the DMV is the same as what’s in our CMDB, then we do nothing. If it’s different, make the requisite updates and send a page to the team.
What it doesn’t tell us is when the actual event occurred. We run this process every 15 minutes so that’s as granular as we could get if that was the only tool we used. However, SQL Server 2012 also includes an Extended Event Session called “AlwaysOn_health” that uses an event file target that we can query. We can leverage this session to figure out when a replica state change event occurred.
First, we need to determine the file name to which the Event Session is writing. Instead of searching every drive for a pattern like ‘AlwaysOn*_.xel’, we can use the sys.dm_xe_sessions and sys.dm_xe_session_targets DMVs.
With our file name in hand, we can query the event file:
DECLARE @FileName NVARCHAR(4000)
SELECT @FileName =
CAST(target_data AS XML) target_data
FROM sys.dm_xe_sessions s
JOIN sys.dm_xe_session_targets t
ON s.address = t.event_session_address
WHERE s.name = N'AlwaysOn_health'
XEData.value('(event/@timestamp)','datetime2(3)') AS event_timestamp,
XEData.value('(event/data/text)', 'VARCHAR(255)') AS previous_state,
XEData.value('(event/data/text)', 'VARCHAR(255)') AS current_state,
SELECT CAST(event_data AS XML) XEData, *
FROM sys.fn_xe_file_target_read_file(@FileName, NULL, NULL, NULL)
WHERE object_name = 'availability_replica_state_change'
JOIN sys.availability_replicas ar
ON ar.replica_id = XEData.value('(event/data/value)', 'VARCHAR(255)')
ORDER BY event_timestamp DESC;
If that query is executed against the Availability Group (by logging into the AG and not a node), you’ll get the events that have occurred on the node that is currently serving as the primary replica.
So starting at 5:10:20.794 PM, on August 2nd, 2013, this node (which is now primary. Remember, we know this because we’re logged in to the Availability Group which is running off of the primary replica) began its transition from secondary to primary replica. We see the course of action that it took which led to it ultimately settling in as PRIMARY_NORMAL its current state at 05:10:33.106 PM.
We could run the same query on any of the other replicas and see their histories as well, particularly the replica that transitioned from primary to secondary.