Gather and Check Statistics in SQL Server

 

How SQL Server Gathers These Statistics

  • The Query Optimizer cannot generate a good execution plan without these statistics
  • Other data relevant to SQL performance includes
    • the structure of tables and views
    • the definition of indexes

 

 

Generating Statistics in SQL Server

In SQL Server, gathering and maintaining statistics is crucial for the Query Optimizer to create efficient execution plans, leading to optimal query performance. Here are the different ways to gather and manage statistics:

 

 

Automatic Statistics Management (Default and Recommended)

  • SQL Server has built-in mechanisms that create and update statistics automatically. In most cases, they are the most suitable choice.
  • AUTO_CREATE_STATISTICS
    • When enabled (the default), the Query Optimizer automatically creates single-column statistics on columns used in query predicates (e.g., WHERE clauses, JOIN conditions) if no index or manually created statistics provide enough information.
    • These automatically created statistics usually have names starting with _WA_Sys_
    • This default behaviour can still be enabled or disabled using T-SQL
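
As a minimal sketch of the T-SQL involved, the statements below turn the option on, confirm it, and list the auto-created statistics in the current database. The database name SalesDb is hypothetical; substitute your own.

```sql
-- SalesDb is a hypothetical database name used for illustration.
ALTER DATABASE SalesDb SET AUTO_CREATE_STATISTICS ON;

-- Confirm the current setting (1 = enabled).
SELECT name, is_auto_create_stats_on
FROM sys.databases
WHERE name = N'SalesDb';

-- List auto-created statistics (typically named _WA_Sys_...) on user tables
-- in the current database.
SELECT OBJECT_SCHEMA_NAME(s.object_id) AS schema_name,
       OBJECT_NAME(s.object_id)        AS table_name,
       s.name                          AS stats_name
FROM sys.stats AS s
WHERE s.auto_created = 1
  AND OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1;
```

Use SET AUTO_CREATE_STATISTICS OFF in the same statement to disable the option.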

 

  • AUTO_UPDATE_STATISTICS:
    • When enabled (the default), SQL Server automatically updates statistics after the table data has changed significantly.
    • The threshold for a "significant change" depends on the table size. For tables over 500 rows, the classic rule is 500 + 20% of the row count; from SQL Server 2016 (compatibility level 130) onward, large tables use a lower, dynamic threshold of roughly SQRT(1000 * rows) whenever that value is smaller.
    • This ensures the Query Optimizer has reasonably up-to-date information for planning queries.
    • This default behaviour can still be enabled or disabled using T-SQL
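
The sketch below enables the option and compares two update thresholds for a given row count: the classic 500 + 20% rule quoted above, and the SQRT(1000 * rows) rule that applies under compatibility level 130 and later when it gives the smaller number. The database name SalesDb and the row count are illustrative assumptions.

```sql
-- SalesDb is a hypothetical database name used for illustration.
ALTER DATABASE SalesDb SET AUTO_UPDATE_STATISTICS ON;

-- Compare the two threshold formulas for an assumed table size.
DECLARE @rows bigint = 1000000;

SELECT 500 + 0.20 * @rows   AS classic_threshold,  -- 500 + 20% of rows
       SQRT(1000.0 * @rows) AS dynamic_threshold;  -- compatibility level 130+
```

For a one-million-row table the classic rule needs 200,500 modifications before an update, while the dynamic rule needs roughly 31,600, which is why large tables refresh more often under newer compatibility levels.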

 

  • AUTO_UPDATE_STATISTICS_ASYNC:
    • This option changes how AUTO_UPDATE_STATISTICS behaves when statistics are found to be outdated at the moment a query needs them; it is disabled by default.
    • Synchronous
      • If AUTO_UPDATE_STATISTICS_ASYNC is OFF, a query that hits stale statistics waits for them to be updated before it compiles and runs. The query uses fresh statistics, but it may be delayed.
    • Asynchronous
      • If AUTO_UPDATE_STATISTICS_ASYNC is ON, a query that hits stale statistics compiles and executes using the old statistics while a background thread updates them. Avoiding the wait can improve immediate response time, but the current query may run with a poor plan. The updated statistics benefit later queries.
      • This setting can be enabled or disabled using T-SQL
    • Important: Most OLTP workloads that need consistent plan quality should leave this option OFF (synchronous). Enable it on systems that can tolerate an occasional inferior plan in exchange for avoiding the latency of a statistics update at compile time.
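
A minimal sketch of switching to asynchronous updates and verifying the result follows. Note that the T-SQL option is spelled AUTO_UPDATE_STATISTICS_ASYNC, and it only has an effect while AUTO_UPDATE_STATISTICS is also ON. The database name SalesDb is hypothetical.

```sql
-- SalesDb is a hypothetical database name used for illustration.
-- Asynchronous updates only apply when AUTO_UPDATE_STATISTICS is ON.
ALTER DATABASE SalesDb SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE SalesDb SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- Check both settings (1 = enabled).
SELECT name,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases
WHERE name = N'SalesDb';
```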

 

 

  • Create/Rebuild Index
    • Implicit Statistics: SQL Server automatically creates statistics on an index's key columns when you create a clustered or non-clustered index, and updates them when you rebuild it.
      • Because indexes are usually created on columns used in queries, this keeps statistics available for the columns that matter most.
      • Since SQL Server 2012, statistics for a partitioned index are generated with the default sampling algorithm rather than a full scan when the index is created or rebuilt.
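
To see the implicit statistics in practice, the sketch below creates and rebuilds an index and then lists the statistics on the table with their last update time. The table dbo.Orders, the column CustomerId, and the index name are hypothetical.

```sql
-- dbo.Orders, CustomerId and the index name are hypothetical.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId);

-- Rebuilding the index also refreshes the statistics tied to it.
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD;

-- The index shows up as a statistics object with the same name.
SELECT s.name,
       s.auto_created,
       s.user_created,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.Orders');
```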

 

 


Manual Statistics Management
There are scenarios where manual intervention helps:

  • UPDATE STATISTICS Statement:
    • This is the main T-SQL command for manually updating the statistics of a table, an indexed view, or a single statistics object.
    • When to use (manual statistics):
      • After bulk inserts or large UPDATE/DELETE operations that change a lot of data, rather than waiting for AUTO_UPDATE_STATISTICS to catch up.
      • When AUTO_UPDATE_STATISTICS_ASYNC is ON and a critical query needs current statistics immediately.
      • For columns with skewed data or particular value ranges, where default sampling or automated updates don't accurately reflect the data distribution.
      • As part of scheduled maintenance, especially for big tables where waiting for the automatic threshold may take too long.
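
The common forms of the statement can be sketched as follows. The table dbo.Orders and the statistics name IX_Orders_CustomerId are hypothetical, and the 25 percent sample is only an example value.

```sql
-- dbo.Orders and IX_Orders_CustomerId are hypothetical names.

-- Update every statistics object on the table using default sampling.
UPDATE STATISTICS dbo.Orders;

-- Update one statistics object by reading every row (most accurate, most costly).
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerId WITH FULLSCAN;

-- Update all statistics on the table from an explicit sample size.
UPDATE STATISTICS dbo.Orders WITH SAMPLE 25 PERCENT;
```

FULLSCAN is the usual choice for skewed columns where sampling misrepresents the distribution; a sampled update is cheaper on very large tables.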

 

 

  • sp_updatestats Stored Procedure:
    • This stored procedure updates statistics on all user-defined and internal tables in the current database. It is a wrapper that runs UPDATE STATISTICS against each table.
    • When to use: As general statistics maintenance, especially for small to medium databases. It checks the modification counter and skips statistics with no changes, but it still updates statistics on tables with only modest modifications, so it may be too resource-intensive to run regularly on big databases.
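
Running the procedure is a one-liner; it acts on whichever database is current. The database name SalesDb is hypothetical.

```sql
-- SalesDb is a hypothetical database name used for illustration.
USE SalesDb;

-- Updates statistics on all tables in the current database,
-- skipping statistics that have no recorded modifications.
EXEC sys.sp_updatestats;
```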

 

 

  • CREATE STATISTICS Statement:
    • Explicitly creates a statistics object on columns of a table or indexed view. Useful for:
      • Multi-column statistics: When a query uses several columns in its predicates and SQL Server's automatic single-column statistics don't capture the correlation between them.
      • Filtered statistics: Create statistics on a subset of rows, which is useful for queries targeting frequently requested data ranges.
      • When to use: For advanced tuning cases where the Query Optimizer routinely produces poor plans owing to missing or insufficient statistics on certain column combinations or data subsets.
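
Both uses can be sketched as below. The table dbo.Orders, its columns (CustomerId, OrderStatus, OrderDate), the statistics names, and the 'Open' filter value are all hypothetical.

```sql
-- Table, column and statistics names are hypothetical.

-- Multi-column statistics: captures the correlation between two predicate columns.
CREATE STATISTICS ST_Orders_Customer_Status
    ON dbo.Orders (CustomerId, OrderStatus);

-- Filtered statistics: a histogram built only over the rows matching the filter.
CREATE STATISTICS ST_Orders_OpenByDate
    ON dbo.Orders (OrderDate)
    WHERE OrderStatus = 'Open'
    WITH FULLSCAN;
```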

 

 

  • Database Maintenance Plans/Third-Party Tools
    • Maintenance Plans: SQL Server Management Studio (SSMS) lets you build Maintenance Plans that schedule regular tasks such as "Update Statistics" through a graphical interface.
    • Third-Party Solutions: Database administration tools and scripts like Ola Hallengren's Maintenance Solution offer more granular control over statistics maintenance, for example checking how much data has been modified before updating, to save resources.
       

Checking Statistics Information

 

Statistics and properties may be examined using:

  • DBCC SHOW_STATISTICS shows a statistics object's header, density vector, and histogram.
  • System catalogue views and dynamic management functions (such as sys.stats and sys.dm_db_stats_properties) provide statistics object metadata (e.g., last updated date, number of rows, modification counter).
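
Both approaches can be sketched as follows, assuming a hypothetical table dbo.Orders with a statistics object named IX_Orders_CustomerId.

```sql
-- dbo.Orders and IX_Orders_CustomerId are hypothetical names.

-- Returns three result sets: header, density vector, histogram.
DBCC SHOW_STATISTICS ('dbo.Orders', IX_Orders_CustomerId);

-- Metadata for every statistics object on the table.
SELECT s.name                 AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Orders');
```

A high modification_counter relative to rows, combined with an old last_updated value, is a typical sign that a manual update is worth considering.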

 

 

Where Statistics Are Stored in SQL Server

Unlike data or log files (MDF/LDF), statistics in SQL Server are not kept as simple files that may be viewed in the file system. Rather, they are kept in system tables as binary large objects (BLOBs) inside the database.

 


Related Question