Index Maintenance Operations: The Beginning

Recently, our index maintenance process was called into question on an Availability Group because it began running significantly longer than expected. The initial theory put forward by other groups was that the Availability Group, a 2+1 cluster (2 HA synchronous replicas, 1 asynchronous DR replica), was experiencing a commit delay. Our monitoring tool, SentryONE, didn’t alert us to any significant delays, but we committed to investigating the theory since it would be a good exercise for all involved. Our priority, however, was to actually investigate the objects and indexes on which maintenance was done. Because we log the process’s details, we were able to identify that our process had progressively skewed toward reorganization and away from rebuild operations. Additionally, the largest of the indexes (by page count and row count) had become 100% reorganization operations.

Since nobody on our team was particularly familiar with the internals of each operation, I did a bit of research and came across Tim Chapman’s 2013 Summit Index Internals Deep Dive presentation. The slide deck provided some clues into what was occurring during our maintenance process: first, that reorganization is always single-threaded (slide 29), and second, that reorganization “[g]enerates a LOT more t-log records than Rebuild.” Given this information, it makes sense why our process had begun taking so much longer.

    1. Our larger tables were only being reorganized and not rebuilt. Since reorganization is single-threaded, moving through high-page-count tables on one thread would naturally take longer.
    2. If what Tim said about t-log records is true, then there could be a greater chance of log commit delay on the Availability Group, though it hadn’t become problematic enough for us to breach our alert threshold.

Certainly this was helpful information and served as a good initial response. However, we needed a better explanation, and we needed to validate what we found on the Internet. To do this, I created a test scenario in which I’d run each operation against the same index and see how the server responds. The process is as follows (a minimal sketch of the harness appears after this list):

    1. Back up the database
    2. Rebuild the candidate index
    3. Restore the database
    4. Reorganize the candidate index
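
For reference, a minimal sketch of that harness might look like the following (database, table, index, and path names are all hypothetical):

use TestDB;
go
backup database TestDB to disk = N'C:\Backups\TestDB.bak' with init;
go

alter index ix_candidate on dbo.candidate rebuild;
go

-- capture rebuild measurements here, then reset to the pre-rebuild state
use master;
go
restore database TestDB from disk = N'C:\Backups\TestDB.bak' with replace;
go

use TestDB;
go
alter index ix_candidate on dbo.candidate reorganize;
go
-- capture reorganize measurements here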

To monitor these operations I created an Extended Events session with many of the conceivable events that could be raised during them (I collected additional actions and filtered on the database in question).

create event session XEIndexRebuild on server
add event sqlos.wait_info,
add event sqlserver.auto_stats,
add event sqlserver.databases_dbcc_logical_scan,
add event sqlserver.databases_log_flush,
add event sqlserver.degree_of_parallelism,
add event sqlserver.file_write_completed,
add event sqlserver.index_build_extents_allocation,
add event sqlserver.lock_released(set collect_resource_description=(1)),
add event sqlserver.locks_lock_waits,
add event sqlserver.object_altered,
add event sqlserver.oiblob_cleanup_end,
add event sqlserver.physical_page_read,
add event sqlserver.physical_page_write,
add event sqlserver.scan_stopped,
add event sqlserver.sql_batch_completed,
add event sqlserver.sql_batch_starting,
add event sqlserver.transaction_log
add target package0.event_file(set filename=N'XEIndexRebuild')
with (
max_memory=4096 kb,
event_retention_mode=allow_single_event_loss,
max_dispatch_latency=30 seconds,
max_event_size=0 kb,
memory_partition_mode=none,
track_causality=on,
startup_state=off
)
go

tlog usage

The test subject is a nonclustered index with 37,108 pages and 1,904,887 rows on a SQL Server 2016 instance. Rebuilding this index generated 3.83 MB of log over 55,826 log records, while reorganizing it generated 215.86 MB over 3,750,552 log records. On the same index with the same level of fragmentation, Reorganize generates many more log records to ship to secondaries (67.18 times more entries and 56.36 times more data). However, since this initial test was not done on an Availability Group, it’s not entirely clear what impact the additional data has on the remote harden process with the synchronous replica.
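
To reproduce the measurement yourself, one option is the undocumented fn_dblog(). The sketch below assumes a quiet test database in the simple recovery model, with a checkpoint issued first so the subsequent read contains (roughly) only the operation under test; index and table names are hypothetical:

checkpoint; -- in simple recovery, lets the log clear so the next read is mostly the operation under test
go
alter index ix_candidate on dbo.candidate rebuild;
go
select count(*) log_records,
sum([log record length])/1024./1024. log_mb
from fn_dblog(null, null);
go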

build sequence

A big reason for the log volume disparity is how each operation performs its task. Where Rebuild “drops and re-creates the index,” Reorganize “defragments the leaf level of clustered and nonclustered indexes on tables and views by physically reordering the leaf-level pages to match the logical, left to right, order of the leaf nodes” and compacts the pages (per the MSDN documentation). Tracking the transaction log and page reads with our event session, we can see the engine read sequentially through the index pages. It periodically opens a system transaction (visible in the transaction_log event) to write out the pages in their left-to-right sequence, confirming the MSDN documentation.
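
As a rough sketch, the captured transaction_log events can be summarized from the session’s file target like so (this assumes the event files landed in the default log directory):

select
x.d.value('(data[@name="operation"]/text)[1]','nvarchar(60)') log_operation,
count(*) record_count
from (
select cast(event_data as xml) event_data
from sys.fn_xe_file_target_read_file('XEIndexRebuild*',null,null,null)
where object_name = 'transaction_log'
) f
cross apply f.event_data.nodes('//event') x(d)
group by x.d.value('(data[@name="operation"]/text)[1]','nvarchar(60)')
order by record_count desc;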

distinct threads

Finally, the other claim to verify is that Reorganize is a single-threaded operation, while Rebuild can be parallel. This can be verified a few different ways. The first way I verified it was by counting the number of distinct system threads observed throughout each process. Additionally, I watched the degree_of_parallelism event: it never fired for the Reorganize operation, while the Rebuild operation did fire it with a DOP of 4.
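
The thread count can be pulled from the same event files. The sketch below assumes the session also collected the sqlos.system_thread_id action on its events:

select count(distinct x.d.value('(action[@name="system_thread_id"]/value)[1]','int')) distinct_threads
from (
select cast(event_data as xml) event_data
from sys.fn_xe_file_target_read_file('XEIndexRebuild*',null,null,null)
) f
cross apply f.event_data.nodes('//event') x(d);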

We’ve shown Tim Chapman’s presentation to be true and accurate and have confirmed at least one bit of information from the MSDN documentation. However, there are many improvements to be made to this experiment. My Adventures in Index Maintenance-Land have begun.

Skewed Data: Analyzing the Windows

Previously, we looked at analyzing a table to see which columns in that table may contain skewed data.

That was a good start, but now it’s time to look at the statistics that exist on that column to see if we can identify potential candidates for filtered statistics, based on the “windows” between histogram steps.

Much of the logic is the same as in that script, except that this one counts every value in the column. Additionally, it looks at all statistics that exist on the column; if there are no statistics on the column, we can’t do a histogram step window analysis anyway. The same general principles apply to this analysis as well: we’re looking at the test statistic (zG1) to determine how skewed the data may actually be.
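
For the curious, zG1 can be computed along these lines (a sketch based on the D’Agostino skewness test, assuming the column’s values have been staged in a #vals(v float) table; this is not the full analysis script):

declare @n float, @mean float, @m2 float, @m3 float;

select @n = count(*), @mean = avg(v) from #vals;

select @m2 = sum(power(v - @mean, 2)) / @n, -- second central moment (variance)
@m3 = sum(power(v - @mean, 3)) / @n -- third central moment
from #vals;

declare @g1 float = @m3 / power(@m2, 1.5); -- sample skewness
declare @g1_adj float = @g1 * sqrt(@n * (@n - 1)) / (@n - 2); -- adjusted Fisher-Pearson skewness (G1)
declare @ses float = sqrt(6. * @n * (@n - 1) / ((@n - 2) * (@n + 1) * (@n + 3))); -- standard error of skewness

select zG1 = @g1_adj / @ses;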

Just like with the table analysis, I worked around data type issues by using dense_rank() over the keys. In the histogram dump table, I created a column called alt_key which I then update based on matching the count table key to the histogram step key.
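
For context, the count table referenced below as @TallyTable might be built along these lines (a hypothetical sketch; table and column names are illustrative):

select
[key] = SomeColumn, -- the column under analysis
f = count_big(*), -- frequency of each value
x = dense_rank() over (order by SomeColumn) -- type-agnostic ordinal used for alt_key
into tempdb.dbo.tally_SomeColumn
from dbo.SomeTable
group by SomeColumn;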

set @sql = 'dbcc show_statistics('''+quotename(@SchemaName)+'.'+quotename(@TableName)+''','''+@StatName+''') with histogram,no_infomsgs;';
print @sql;

insert into tempdb.dbo.histo (
    range_hi_key,
    range_rows,
    eq_rows,
    distinct_range_rows,
    avg_range_rows )
execute sp_executesql @sql;

set @sql =
    'update h
        set h.actual_eq_rows = c.f,
            h.alt_key = c.x
    from tempdb.dbo.histo h
    join '+(@TallyTable)+' c
        on h.range_hi_key = c.[key]';

exec sp_executesql @sql;

What this lets me do is pull the analysis results later, without having to muck with sorting on different data types:

select an.*, confirm_query = 'select * from ' + @TallyTable + ' where ' +
    case when cols.last_alt_key is not null
        then 'x >= ' + cast(cols.last_alt_key as nvarchar(255)) + ' and '
        else '' end
    + 'x <= ' + cast(cols.alt_key as nvarchar(255)) + ' order by [key]'
from tempdb.dbo.histo an
join (
        select 
            stat_name,
            last_range_hi_key = lag(range_hi_key,1,null) over (partition by stat_name order by range_hi_key), 
            range_hi_key,
            last_alt_key = lag(alt_key,1,null) over (partition by stat_name order by range_hi_key), 
            alt_key
        from tempdb.dbo.histo
    ) cols
on an.stat_name = cols.stat_name
    and an.range_hi_key = cols.range_hi_key
where an.actual_distinct_range_rows >= 100
order by abs(zg1) desc;

With the above analysis available to us, we’d run the text in confirm_query to examine the window (each bounding histogram step is included), sanity-check the analysis, and ensure that the window does present with skew and has enough distinct values to make a filtered statistic worthwhile. Please note that all of the normal considerations with regard to filtered statistics apply: they don’t update when their filtered threshold is met, but only when the threshold for the entire table is met, and even then only if the filtered statistic is loaded for plan (re)compilation; they may add time to maintenance tasks; they may never be used; etc.
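
If a window survives that sanity check, acting on it might look like the following (the object, column, and boundary values are purely illustrative):

create statistics st_SomeTable_SomeColumn_window
on dbo.SomeTable (SomeColumn)
where SomeColumn >= 1000 and SomeColumn <= 2500
with fullscan;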

Stats Skew Analysis Script

Skewed Data: Finding The Columns

While looking for the job that led to the position I currently hold, I interviewed at a recognizably-named company that was struggling to keep up with their “statistics.” I have a confession: I had no idea what “statistics” were in the SQL Server world. I wasn’t offered that job, but in the interim I did a lot of reading and research on the SQL Server notion of “statistics.” Luckily, I was offered a position at another company a few months later. This company sent me to my very first PASS Summit (2013) in Charlotte, NC, where I sat in on Kimberly Tripp’s “Skewed Data, Poor Cardinality Estimates, and Plans Gone Bad” session.
Continue reading

Finding Value in Incremental Statistics, Pt. 2

Previously, I did a little investigation into incremental statistics and what benefit they can provide. The goal was to find the intersection of update timings for incremental and non-incremental statistics, with the hope of demonstrating a reclamation of statistics maintenance time – and perhaps higher sample rates. Summary: we can, and auto_stats is a beneficiary. However, is this a solution in search of a problem? If you’ve been bumping into statistics maintenance issues, you’re probably already familiar with the more hands-on interventions available (my personal favorite is Ola Hallengren’s solution).
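
For reference, partition-level maintenance with incremental statistics looks roughly like this (a sketch with hypothetical names and partition numbers):

create statistics st_incr on dbo.PartitionedTable (SomeKey)
with fullscan, incremental = on;
go

-- later, refresh only the partitions whose modification counters warrant it
update statistics dbo.PartitionedTable (st_incr)
with resample on partitions (42);
go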
Continue reading

Finding Value in Incremental Statistics Pt. 1

I originally published this post, then noticed some rather embarrassing sloppiness and miscalculations in it. I decided to simply pull the post and rework it. This is the result of those labors.

Recently, I answered a question on dba.stackexchange related to an interesting occurrence with incremental statistics. This led to a good conversation about the efficacy (and utility) of incremental statistics. The primary focus was on Erin Stellato’s confirmation that partition-level stats are not used by the Query Optimizer, thereby identifying a rather sizable gap in the utility of incremental statistics. The only benefit I could champion for incremental statistics was that they might allow us to sample at consistently higher rates, since the entire table would not need to be sampled – just the partitions that presented a compelling enough reason for an update (the modification counter being a primary signal). Naturally, we’d incur the penalty of the higher sample rate at the initial sampling, but ongoing maintenance would be able to support the higher rates because we would only have to sample at the partition level. In this scenario, there are two measures to consider:
Continue reading

Duplicate Statistics Finder

I’m often asked whether or not we have “duplicate” statistics in our environment. The thought is that, perhaps, duplicate statistics would cause a delay in execution plan compilation or even a different plan. I’m not sure it’d be a big deal to have them (please correct me if I’m wrong), but I still decided to work up a little script that will, at least, help identify them.
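
This is not the published script, but a minimal sketch of the core idea – flagging pairs of statistics on the same table that share a leading column:

select
table_name = object_name(s1.object_id),
column_name = col_name(s1.object_id, sc1.column_id),
stat_1 = s1.name,
stat_2 = s2.name
from sys.stats s1
join sys.stats_columns sc1
on s1.object_id = sc1.object_id
and s1.stats_id = sc1.stats_id
and sc1.stats_column_id = 1 -- leading column only
join sys.stats_columns sc2
on sc1.object_id = sc2.object_id
and sc1.column_id = sc2.column_id
and sc2.stats_column_id = 1
join sys.stats s2
on sc2.object_id = s2.object_id
and sc2.stats_id = s2.stats_id
where s1.stats_id < s2.stats_id -- report each pair once
and objectproperty(s1.object_id, 'IsUserTable') = 1;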
Continue reading

The SQL Server 2014 DDL Bulldozer Part 1.1: Implicit Insert

Previously I posted my experimentation with wait_at_low_priority. Specifically, I investigated what would happen when index rebuild was executed while another session was inserting data.

Next we want to see what happens when the index rebuild (with wait_at_low_priority and abort_after_wait = blockers specified) is running and the insert is subsequently executed.

Here’s the setup (note that I’ve pre-populated tempdb.dbo.numbers with 15 million rows):

use master;
go

alter event session ddl_bulldozer on server
state = start;

go

use tempdb;
go

alter index ix_tmp_numbers_num on dbo.numbers
rebuild with (
data_compression = page,
online = on ( wait_at_low_priority ( max_duration = 0 minutes,
abort_after_wait = blockers ))
);
go

use master;
go

alter event session ddl_bulldozer on server
state = stop;

go

if object_id('tempdb..#ddl_bulldozer') is not null
    drop table #ddl_bulldozer;
go
select object_name, cast(event_data as xml) event_data
into #ddl_bulldozer
from sys.fn_xe_file_target_read_file('ddl_bulldozer*',null,null,null) xf

select
x.d.value('@timestamp','datetime2(3)') event_timestamp,
db.object_name,
x.d.value('(action[@name="session_id"]/value)[1]','int') session_id,
x.d.value('(data[@name="statement"]/value)[1]','nvarchar(max)') sql_statement,
x.d.value('(data[@name="state"]/text)[1]','nvarchar(max)') sql_state,
x.d.value('(data[@name="duration"]/value)[1]','bigint') duration,
x.d.value('(data[@name="type"]/text)[1]','sysname') ddl_type,
x.d.value('(data[@name="abort_after_wait"]/text)[1]','sysname') abort_after_wait,
x.d.value('(data[@name="transaction_state"]/text)[1]','sysname') transaction_state,
x.d.value('(data[@name="killed_process_id"]/value)[1]','int') killed_process_id,
cast(substring(
x.d.value('(action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'),
1, 36) as uniqueidentifier) as activity_id,
cast(substring(
x.d.value('(action[@name="attach_activity_id"]/value)[1]', 'varchar(50) '),
38, 10) as int) as event_sequence,
cast(substring(
x.d.value('(action[@name="attach_activity_id_xfer"]/value)[1]', 'varchar(50)'),
1, 36) as uniqueidentifier) as activity_id_xfer
--,db.event_data
from #ddl_bulldozer db
cross apply event_data.nodes('//event') x(d)
where db.object_name not like 'lock%'
order by event_timestamp,event_sequence

On my test instance, the rebuild takes about 28 seconds to complete, so during that time, I switch to another window and run the insert session:

/*
kick off an implicit transaction
(with code shamelessly lifted from Kendra Little at:
http://www.brentozar.com/archive/2013/01/sql-server-table-partitioning-tutorial-videos-and-scripts/
)
*/
use tempdb;
go

;WITH Pass0 AS ( SELECT 1 AS C UNION ALL SELECT 1),
Pass1 AS ( SELECT 1 AS C FROM Pass0 AS A , Pass0 AS B),
Pass2 AS ( SELECT 1 AS C FROM Pass1 AS A , Pass1 AS B),
Pass3 AS ( SELECT 1 AS C FROM Pass2 AS A , Pass2 AS B),
Pass4 AS ( SELECT 1 AS C FROM Pass3 AS A , Pass3 AS B),
Pass5 AS ( SELECT 1 AS C FROM Pass4 AS A , Pass4 AS B),
tally AS ( SELECT row_number() OVER ( Order BY C ) AS N FROM Pass5 )
insert numbers
select *
FROM tally
WHERE N <= 1000000;
GO

Once again, the insert fails (after about 12 seconds) and sends the error message to the client:
[image: error result returned to the client]
But what happened along the way?
[image: Extended Events output, part 1]
I’ve added a new event (progress_report_online_index_operation) for a bit more visibility into what’s occurring. What we see is the index rebuild starting (line 2, in green) and the ddl_with_wait_at_low_priority event firing because we told the DDL operation to wait at low priority. About four seconds later, I started the insert (in red). Along the way, we see the index rebuild making progress under different activity_ids. Because this is an online index rebuild, the page organization is occurring in tempdb and the second (insert) session is allowed to continue its activity, for now. Also note the quick termination of the statement for the sake of a recompile on the insert (recompile cause: “schema changed”).
[image: Extended Events output, part 2]
At the end of it all, the insert transaction is killed and rolled back, just as in the last test. However, because the index rebuild was online, the first blockage the rebuild encountered was at the metadata operation to switch in the rebuilt index structure. At that point, the insert still held its X lock on the object (as before). The lock’s owning session is terminated and the transaction is rolled back, paving the path for the metadata index structure switch.

The SQL Server 2014 DDL Bulldozer Part 1: Implicit Insert

Of the fascinating new features in SQL Server 2014, the one that provides me the most catharsis has got to be wait_at_low_priority. If your maintenance window SLA is threatened by “accidental” report cycles spilling into the window, perhaps you’ll want to explore this option. While there are many blog posts out there explaining how to use it, I wanted to explore it myself and see how it could apply to a variety of situations.
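
As a taste of the syntax (a minimal sketch; besides blockers, abort_after_wait also accepts none and self):

alter index ix_example on dbo.example
rebuild with (
online = on ( wait_at_low_priority ( max_duration = 5 minutes,
abort_after_wait = self ))
);
go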
Continue reading

Can you repro this bug? Trace Flag 2363 Causes Fatal Exceptions on Windowed Aggregates

While testing SQL Server 2014 Enterprise edition, I wanted to see if I could use trace flag 2363 to gather meaningful information for statistics analysis. Part of this analysis included getting some information on the statistics in question.

With Trace Flag 2363 enabled:


DBCC TRACEON(2363,-1);
GO
select
schema_name = sh.name,
table_name = t.name,
stats_name = s.name,
leading_column = index_col(quotename(sh.name)+'.'+quotename(t.name),s.stats_id,1),
modification_percent = (100.*sp.modification_counter)/sp.unfiltered_rows
,sp.modification_counter
,sp.unfiltered_rows
,sp.rows_sampled
,avg_rows_in_db = avg(1.*sp.unfiltered_rows) over ()
from sys.stats s
join sys.tables t
on s.object_id = t.object_id
join sys.schemas sh
on t.schema_id = sh.schema_id
cross apply sys.dm_db_stats_properties(s.object_id,s.stats_id) sp
where sh.name != 'sys';

This results in a severe error to the client:

Msg 0, Level 11, State 0, Line 77
A severe error occurred on the current command. The results, if any, should be discarded.
Msg 0, Level 20, State 0, Line 77
A severe error occurred on the current command. The results, if any, should be discarded.

Parsing the minidump with WinDbg, I arrived at this as the faulting module:

sqllang!DRgCId::Print+0x68

I’ve attempted the repro with user tables and different aggregates and still get the same error. If you can reproduce it, feel free to vote on the Connect item below.

https://connect.microsoft.com/SQLServer/feedback/details/1327021/trace-flag-2363-causes-fatal-exceptions-on-windowed-aggregates