Instant File Initialization and Database Growths

I just made a very interesting discovery: instant file initialization is not enabled in our environment. In order to champion its merits to the rest of the team, I needed a way to demonstrate how it could benefit us.

I started with the rather simplistic method of turning on time statistics, creating a database, resizing the file a few times, comparing the Total Elapsed Time from runs with and without IFI enabled, and calling it good. That netted a basic but helpful comparison indicating that instant file initialization could be of some benefit to us.
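For reference, here is a minimal sketch of that first comparison, using a hypothetical IFITest database and its default logical file name (adjust names and sizes to your environment):

-- Rough elapsed-time comparison (hypothetical database and file names)
SET STATISTICS TIME ON;

CREATE DATABASE IFITest;
GO
-- Grow the data file in a couple of large jumps; compare the elapsed times
-- reported on the Messages tab with IFI off, then again with IFI on.
ALTER DATABASE IFITest MODIFY FILE (NAME = IFITest, SIZE = 5GB);
ALTER DATABASE IFITest MODIFY FILE (NAME = IFITest, SIZE = 10GB);
GO
SET STATISTICS TIME OFF;
DROP DATABASE IFITest;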

But then, since I’m in the middle of my fascination with Extended Events, I decided to check what I could do with those. It turned out to be pretty simple. Using Jonathan Kehayias’ event metadata query as my starting point, I went looking for something I could use.

SELECT p.name AS package_name,
       o.name AS event_name,
       o.description
FROM sys.dm_xe_packages AS p
JOIN sys.dm_xe_objects AS o 
     ON p.guid = o.package_guid
WHERE (p.capabilities IS NULL OR p.capabilities & 1 = 0)
  AND (o.capabilities IS NULL OR o.capabilities & 1 = 0)
  AND o.object_type = 'event'
ORDER BY p.name, o.name;
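Since I was specifically after file growths, a filtered variation of that query (my addition, not part of Jonathan’s original) narrows the list down quickly:

-- Narrow the event list to size-change events (one possible filter)
SELECT p.name AS package_name,
       o.name AS event_name,
       o.description
FROM sys.dm_xe_packages AS p
JOIN sys.dm_xe_objects AS o
     ON p.guid = o.package_guid
WHERE o.object_type = 'event'
  AND o.name LIKE '%file_size%'
ORDER BY p.name, o.name;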

That’s where I found the sqlserver.database_file_size_change event. There was also the sqlserver.databases_data_file_size_changed event, but it is flagged as discontinued.

Create the Event Session

CREATE EVENT SESSION TrackGrowths ON SERVER
ADD EVENT sqlserver.database_file_size_change(
    ACTION(sqlserver.database_id,sqlserver.sql_text)),
ADD EVENT sqlserver.databases_data_file_size_changed(
    ACTION(sqlserver.database_id,sqlserver.sql_text))
ADD TARGET package0.event_file (
	SET FILENAME = 'c:\traces\growths.xel'
)
WITH (TRACK_CAUSALITY=ON);
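Before starting it, a quick peek at the catalog view and the DMV confirms the session exists and, later, whether it is running (just a sanity check on my part):

-- Session definition exists (whether or not it has been started)
SELECT name FROM sys.server_event_sessions WHERE name = 'TrackGrowths';

-- Currently running sessions only
SELECT name FROM sys.dm_xe_sessions WHERE name = 'TrackGrowths';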

Then I created my test database

CREATE DATABASE GrowthTest
GO
USE GrowthTest

Just to confirm the size, I did a quick-and-dirty

sp_spaceused

(sp_spaceused output: the database at its initial, default size)
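If you prefer a query over the system procedure, sys.database_files reports the same thing; size is in 8 KB pages, so multiplying by 8 gives KB:

-- Current file sizes for the new database (size is in 8 KB pages)
SELECT name, type_desc, size * 8 AS size_kb
FROM sys.database_files;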

So we’re at the default size inherited from the model database. Next we need to turn the Event Session on.

ALTER EVENT SESSION TrackGrowths ON SERVER
	STATE=START

Now it’s time to expand our database. Rather than issuing ALTER DATABASE statements, I decided to grow it with a query, using the Numbers table script from Kendra Little’s partitioning post to build a 1 million row table, and then stop the session.

;WITH    Pass0 AS ( SELECT   1 AS C UNION ALL SELECT   1), 
	Pass1 AS ( SELECT   1 AS C FROM     Pass0 AS A , Pass0 AS B),
	Pass2 AS ( SELECT   1 AS C FROM     Pass1 AS A , Pass1 AS B),
	Pass3 AS ( SELECT   1 AS C FROM     Pass2 AS A , Pass2 AS B),
	Pass4 AS ( SELECT   1 AS C FROM     Pass3 AS A , Pass3 AS B),
	Pass5 AS ( SELECT   1 AS C FROM     Pass4 AS A , Pass4 AS B),
	NumGen AS ( SELECT   row_number() OVER ( Order BY C ) AS Num FROM Pass5 )
SELECT  Num
INTO    Numbers
FROM    NumGen
WHERE   Num <= 1000000;
GO 
ALTER EVENT SESSION TrackGrowths ON SERVER
	STATE=STOP;
GO
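As an aside, how many growth events the session captures depends on the data file’s autogrowth increment, which you can check beforehand with sys.master_files:

-- Autogrowth settings; growth is in 8 KB pages unless is_percent_growth = 1
SELECT name, growth, is_percent_growth
FROM sys.master_files
WHERE database_id = DB_ID('GrowthTest');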

Then just a quick sanity check with sp_spaceused again to ensure that my database actually grew:
(sp_spaceused output: the database after the growths)
Then comes the ever-present XML-shredding Extended Events file-processing query to dump the results into a table in an analysis database.

SELECT event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
    event_data.value('(event/@package)[1]', 'varchar(50)') AS package_name,
    DATEADD(hh, 
            DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
            event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
	event_data.value('(event/data[@name="database_id"])[1]', 'int') AS database_id,
	event_data.value('(event/data[@name="file_id"])[1]', 'int') AS file_id,
	event_data.value('(event/data[@name="file_type"]/text)[1]', 'varchar(25)') AS file_type,
	event_data.value('(event/data[@name="is_automatic"])[1]', 'varchar(10)') AS is_automatic,
	event_data.value('(event/data[@name="total_size_kb"])[1]', 'int') AS total_size_kb,
	event_data.value('(event/data[@name="size_change_kb"])[1]', 'int') AS size_change_kb,
	event_data.value('(event/data[@name="duration"])[1]', 'int') AS duration,
	event_data.value('(event/data[@name="file_name"])[1]', 'sysname') AS file_name,    
	event_data.value('(event/action[@name="sql_text"])[1]', 'nvarchar(4000)') AS sql_text,
	CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) AS activity_id,
	CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int) AS event_sequence,
	CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id_xfer"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) AS activity_id_xfer
INTO analysis.dbo.growth_xe
FROM (
	SELECT CAST(event_data AS XML) AS event_data
	FROM sys.fn_xe_file_target_read_file('c:\traces\growths*.xel', NULL, NULL, NULL)
) AS ed;
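A quick count per event name confirms the import picked everything up (my own sanity check against the table just created):

-- How many of each event landed in the analysis table
SELECT event_name, COUNT(*) AS captured_events
FROM analysis.dbo.growth_xe
GROUP BY event_name;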

With that data in hand, I enabled Instant File Initialization and stepped through the process again.

My first look at the data was something like this

SELECT 
	ROW_NUMBER() OVER (ORDER BY timestamp) AS growth_number,
	event_name, 
	timestamp, 
	file_type,
	total_size_kb, 
	size_change_kb, 
	duration
	FROM analysis.dbo.growth_xe_ifi
	WHERE file_type = 'Data file'
	ORDER BY [timestamp];
SELECT 
	ROW_NUMBER() OVER (ORDER BY timestamp) AS growth_number,
	event_name, 
	timestamp, 
	file_type,
	total_size_kb, 
	size_change_kb, 
	duration
	FROM analysis.dbo.growth_xe
	WHERE file_type = 'Data file'
	ORDER BY [timestamp];

and when charted as a function of “growth_number”, the raw results (ignoring the major outlier at growth 10 for IFI Disabled) look surprisingly similar.
(Chart: growth duration by growth_number, IFI enabled vs. disabled)
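For a single-number comparison to go alongside the chart, a quick aggregate over each table works too (my addition; same tables as above):

-- Overall growth counts and durations for each run
SELECT 'IFI enabled' AS run, COUNT(*) AS growths,
       AVG(duration * 1.) AS avg_duration, SUM(duration) AS total_duration
FROM analysis.dbo.growth_xe_ifi
WHERE file_type = 'Data file'
UNION ALL
SELECT 'IFI disabled', COUNT(*),
       AVG(duration * 1.), SUM(duration)
FROM analysis.dbo.growth_xe
WHERE file_type = 'Data file';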
To get a sense of the distribution of data points, I ran the following queries:

SELECT v.duration, COUNT(*) occurrences
FROM analysis.dbo.growth_xe_ifi v
JOIN (
	SELECT DISTINCT duration AS dval
		FROM analysis.dbo.growth_xe_ifi
		WHERE file_type = 'Data file'
	) d
ON v.duration = d.dval 
WHERE file_type = 'Data file'
GROUP BY v.duration
ORDER BY v.duration;

SELECT v.duration nonifi_duration, COUNT(*) occurrences
FROM analysis.dbo.growth_xe v
JOIN (
	SELECT DISTINCT duration AS dval
		FROM analysis.dbo.growth_xe
		WHERE file_type = 'Data file'
	) d
ON v.duration = d.dval 
WHERE file_type = 'Data file'
GROUP BY v.duration
ORDER BY v.duration;

with the resultant histograms as below.
(Histogram: growth duration occurrences with IFI enabled)
(Histogram: growth duration occurrences with IFI disabled)
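Incidentally, the self-join in those histogram queries doesn’t actually filter anything; a plain GROUP BY returns the same counts (shown here for the IFI table):

-- Simpler equivalent of the histogram query
SELECT duration, COUNT(*) AS occurrences
FROM analysis.dbo.growth_xe_ifi
WHERE file_type = 'Data file'
GROUP BY duration
ORDER BY duration;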
Finally (for this portion), I decided to see whether there was a correlation between the total_size_kb data point (which I selected as my independent variable) and the duration data point (the dependent variable). To test this I cobbled together the following queries (results are in the comments).

SELECT 
	SUM([x-meanx]*[y-meany]) / ((n-1)*stdev_prod)
	FROM (
SELECT 
	ROW_NUMBER() OVER (ORDER BY timestamp) AS growth_number,
	event_name, 
	timestamp, 
	file_type,
	total_size_kb, 
	size_change_kb, 
	duration,
	COUNT(*) OVER (PARTITION BY file_type) AS n,
	AVG(duration*1.) OVER (PARTITION BY file_type) AS avg_duration,
	duration - AVG(duration*1.) OVER (PARTITION BY file_type) AS [y-meany],
	total_size_kb - AVG(total_size_kb*1.) OVER (PARTITION BY file_type) AS [x-meanx],
	AVG(total_size_kb*1.) OVER (PARTITION BY file_type) AS avg_total_size_kb,
	STDEVP(duration) OVER (PARTITION BY file_type) AS stdev_duration, 
	STDEVP(total_size_kb) OVER (PARTITION BY file_type) AS stdev_total_size_kb,
	STDEVP(duration) OVER (PARTITION BY file_type) * STDEVP(total_size_kb) OVER (PARTITION BY file_type)  as stdev_prod,
	((duration * 1.) - AVG(duration) OVER (PARTITION BY file_type)) / (STDEVP(duration) OVER (PARTITION BY file_type)) AS [z-score]
FROM analysis.dbo.growth_xe
	WHERE file_type = 'Data file'
) basestats
GROUP BY n, stdev_prod;
--0.453139841835377
SELECT 
	SUM([x-meanx]*[y-meany]) / ((n-1)*stdev_prod)
	FROM (
SELECT 
	ROW_NUMBER() OVER (ORDER BY timestamp) AS growth_number,
	event_name, 
	timestamp, 
	file_type,
	total_size_kb, 
	size_change_kb, 
	duration,
	COUNT(*) OVER (PARTITION BY file_type) AS n,
	AVG(duration*1.) OVER (PARTITION BY file_type) AS avg_duration,
	duration - AVG(duration*1.) OVER (PARTITION BY file_type) AS [y-meany],
	total_size_kb - AVG(total_size_kb*1.) OVER (PARTITION BY file_type) AS [x-meanx],
	AVG(total_size_kb*1.) OVER (PARTITION BY file_type) AS avg_total_size_kb,
	STDEVP(duration) OVER (PARTITION BY file_type) AS stdev_duration, 
	STDEVP(total_size_kb) OVER (PARTITION BY file_type) AS stdev_total_size_kb,
	STDEVP(duration) OVER (PARTITION BY file_type) * STDEVP(total_size_kb) OVER (PARTITION BY file_type)  as stdev_prod,
	((duration * 1.) - AVG(duration) OVER (PARTITION BY file_type)) / (STDEVP(duration) OVER (PARTITION BY file_type)) AS [z-score]
FROM analysis.dbo.growth_xe_ifi
	WHERE file_type = 'Data file'
) basestats
GROUP BY n, stdev_prod;
--0.910417519447016
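As a cross-check, the same coefficient can be computed from raw sums with the textbook Pearson formula. This sketch runs against the non-IFI table; the result may differ slightly from the windowed version above, which pairs STDEVP with an n - 1 divisor:

-- Pearson r from raw sums; x = total_size_kb, y = duration
SELECT (n * sum_xy - sum_x * sum_y)
       / SQRT((n * sum_xx - sum_x * sum_x) * (n * sum_yy - sum_y * sum_y)) AS r
FROM (
    SELECT CAST(COUNT(*) AS float)                           AS n,
           SUM(CAST(total_size_kb AS float))                 AS sum_x,
           SUM(CAST(duration AS float))                      AS sum_y,
           SUM(CAST(total_size_kb AS float) * duration)      AS sum_xy,
           SUM(CAST(total_size_kb AS float) * total_size_kb) AS sum_xx,
           SUM(CAST(duration AS float) * duration)           AS sum_yy
    FROM analysis.dbo.growth_xe
    WHERE file_type = 'Data file'
) AS s;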

Interestingly, with Instant File Initialization disabled there was only a weak correlation (r ≈ 0.45), whereas with it enabled there was a fairly strong correlation between file size and growth duration (r ≈ 0.91). That doesn’t mean larger file sizes cause longer growth times, only that the two measurements move together.

Thinking about it, this makes sense. When Instant File Initialization is not enabled there are many other factors at play, not the least of which is the actual performance of the underlying storage, since every new page has to be zero-initialized as the file grows. When it is enabled, the data file doesn’t have to be zeroed, so the influence of storage performance on the growth is minimized and the duration tracks more closely with size. I plan to look at other database settings to see how correlated growth durations are with those settings.
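As a final aside, if you want to confirm whether instant file initialization is actually being used during a growth, trace flags 3004 and 3605 cause file-zeroing activity to be written to the error log; a data file growth that logs “Zeroing …” messages is not benefiting from IFI (log files are always zeroed regardless):

-- Surface file-zeroing messages in the error log during the next growths
DBCC TRACEON (3004, 3605, -1);
-- ...re-run the growth test, then read the log and turn the flags back off
EXEC sys.xp_readerrorlog;
DBCC TRACEOFF (3004, 3605, -1);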
