Just Drop It

I decided to try the free SQL Azure trial so that I could be conversant in the environment. I migrated an old personal project to the platform so that I had an actual data set. I chose the 1GB Web database as my testing environment. My understanding is that the current differentiation between Web and Business is database sizing. However, this doesn’t eliminate the probability of some future feature differentiation.

Once I set up the basics, it was time to load the data. I encountered an interesting issue.  Running some variation of the following code:

Continue reading

When did my Availability Group change primary replica states?

***UPDATE*** Script has been modified so that you don’t have to know the default location of your error log.
Within the first two weeks of coming to my new employer, I was tasked with developing a method of tracking and monitoring the which of the replicas in any given Availability Group was the primary and send a notification when the primary changed.

In order to quickly and simply detect a change, I connected to the Availability Group and query the sys.dm_hadr_availability_group_states DMV. If the primary replica in the DMV is the same as what’s in our CMDB, then we do nothing. If it’s different, make the requisite updates and send a page to the team.

What it doesn’t tell us is when the actual event occurred. We run this process every 15 minutes so that’s as granular as we could get if that was the only tool we used. However, SQL Server 2012 also includes an Extended Event Session called “AlwaysOn_health” that uses an event file target that we can query. We can leverage this session to figure out when a replica state change event occurred.

First, we need to determine the file name to which the Event Session is writing. Instead of searching every drive for a pattern like ‘AlwaysOn*_.xel’, we can use the sys.dm_xe_sessions and sys.dm_xe_session_targets DMVs.

With our file name in hand, we can query the event file:

    DECLARE @FileName NVARCHAR(4000)
    SELECT @FileName =
				CAST(target_data AS XML) target_data
				FROM sys.dm_xe_sessions s
				JOIN sys.dm_xe_session_targets t
					ON s.address = t.event_session_address
				WHERE s.name = N'AlwaysOn_health'
		) ft

		XEData.value('(event/@timestamp)[1]','datetime2(3)') AS event_timestamp,
		XEData.value('(event/data/text)[1]', 'VARCHAR(255)') AS previous_state,
		XEData.value('(event/data/text)[2]', 'VARCHAR(255)') AS current_state,
		SELECT CAST(event_data AS XML) XEData, *
			FROM sys.fn_xe_file_target_read_file(@FileName, NULL, NULL, NULL)
			WHERE object_name = 'availability_replica_state_change'
		 ) event_data
	JOIN sys.availability_replicas ar
		ON ar.replica_id = XEData.value('(event/data/value)[5]', 'VARCHAR(255)')
	ORDER BY event_timestamp DESC;

If that query is executed against the Availability Group (by logging into the AG and not a node), you’ll get the events that have occurred on the node that is currently serving as the primary replica.

event_timestamp previous_state current_state replica_server_name
2013-08-02 17:10:33.106 PRIMARY_PENDING PRIMARY_NORMAL ag-server-node
2013-08-02 17:10:32.857 RESOLVING_NORMAL PRIMARY_PENDING ag-server-node
2013-08-02 17:10:32.705 RESOLVING_PENDING_FAILOVER RESOLVING_NORMAL ag-server-node
2013-08-02 17:10:20.794 SECONDARY_NORMAL RESOLVING_PENDING_FAILOVER ag-server-node

So starting at 5:10:20.794 PM, on August 2nd, 2013, this node (which is now primary. Remember, we know this because we’re logged in to the Availability Group which is running off of the primary replica) began its transition from secondary to primary replica. We see the course of action that it took which led to it ultimately settling in as PRIMARY_NORMAL its current state at 05:10:33.106 PM.

We could run the same query on any of the other replicas and see their histories as well, particularly the replica that transitioned from primary to secondary.