One day, one of my clients complained about the size of his SQL Server log. He mentioned that it had reached 1 GB for just 1 million inserted rows. His issue was that sometimes he had to insert 1 billion of them! Fortunately, there is SQL Server minimal logging!

SQL Server can be really "verbose" regarding its database log; however, there are several ways to reduce the volume of the SQL Server database log. We will explore several of them in this article. But before diving into that, it’s important to explain some aspects of the SQL Server database log:

  1. What is the purpose of the log in a database?
  2. The recovery model and its impact on the log.
  3. SQL Server minimal logging.

What is the purpose of the database log in SQL Server?

In the world of database management systems, ensuring data integrity, reliability, and recoverability is of paramount importance. SQL Server, like many relational database management systems (RDBMS), has a dedicated mechanism to achieve these goals – the database transaction log. The transaction log, often simply referred to as the "log," plays a crucial role in the operation and maintenance of a SQL Server database. Let’s delve deeper into its purposes:

1. Transaction Durability: At its core, the transaction log provides a record of all changes made to the data and the sequence in which these changes occurred. When a transaction is committed, its changes are first written to the log and then to the data files. This ensures that even if a sudden failure occurs before the changes are written to the data files, the database can be recovered using the log.

2. Point-in-time Recovery: The transaction log holds the history of all changes, which means that it can be used to restore a database to a specific point in time. This is particularly useful in cases where accidental data modifications or deletions occur.

3. Database Recovery: In the event of system failures, such as hardware crashes, power outages, or software issues, the transaction log is the tool SQL Server uses to bring the database back to a consistent state. Upon restarting, SQL Server will perform a recovery process, replaying transactions in the log that might not have been written to the data files.

4. Log-based Replication: SQL Server supports log-based replication. This means changes are propagated to subscriber databases using the transaction log, ensuring that all replicas stay in sync with the primary.

5. Support for High Availability Solutions: Features like Always On Availability Groups and database mirroring rely on the transaction log to replicate data changes to secondary replicas, providing data redundancy and increasing data availability.

6. Rollback Capabilities: If a transaction is interrupted (e.g., a system failure or a manual transaction abort), the changes can be undone using the transaction log. This ensures that databases maintain data consistency, adhering to the ACID (Atomicity, Consistency, Isolation, Durability) properties of transactions.

In summary, the transaction log in SQL Server is not just a passive record of events. It’s an active component, critical for the day-to-day operations, recoverability, and data integrity of a database. Proper management and understanding of the log are essential for anyone responsible for the health and performance of SQL Server databases.
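As a practical aside, you can watch how much of the log a workload consumes with the sys.dm_db_log_space_usage DMV; a minimal sketch for the current database:

-- Report transaction log size and usage for the current database
SELECT total_log_size_in_bytes / 1048576.0 AS total_log_mb,
       used_log_space_in_bytes / 1048576.0 AS used_log_mb,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;

Running this before and after a large load is a quick way to see the log impact of the techniques discussed below.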

The impact of the recovery model on the database log

SQL Server offers three primary recovery models: Simple, Full, and Bulk-Logged. The choice of recovery model directly influences the behavior and growth of the transaction log.

1. Simple Recovery Model:

Log Behavior: In the Simple recovery model, the transaction log records are truncated (i.e., log space is reclaimed) after each checkpoint, provided there are no other dependencies. This means the log does not grow indefinitely and doesn’t require frequent manual intervention to manage its size under normal operations.
Recovery: With the Simple recovery model, point-in-time recovery is not possible. You can only restore to the last full or differential backup.
Use Cases: This model is suitable for databases where data is not changing frequently (data warehouses, data marts), for test and development environments, or when point-in-time recovery is not a concern.

2. Full Recovery Model:

Log Behavior: Transaction log records are preserved until a log backup is taken. The log can grow considerably if log backups are not scheduled regularly. It’s crucial to implement a frequent log backup routine to prevent the log from filling up the allocated disk space.
Recovery: The Full recovery model supports point-in-time recovery, ensuring minimal data loss in the face of a disaster. You can restore the database to any specific point in time, provided you have the necessary sequence of log backups.
Use Cases: This model is ideal for production databases, databases containing critical data, or any environment where the possibility of data loss is to be minimized.

3. Bulk-Logged Recovery Model:

Log Behavior: Much like the Full recovery model, the log records are not truncated until a log backup is taken. However, bulk operations such as `BULK INSERT`, `SELECT INTO`, and `CREATE INDEX` are logged minimally to improve performance.
Recovery: Point-in-time recovery is possible, but there’s a caveat. If the log contains bulk-logged transactions, you can’t perform a point-in-time restore to a time within those transactions.
Use Cases: This model can be temporarily used during periods of bulk operations to improve performance. After the operations are done, it’s common practice to switch back to the Full recovery model and take a log backup immediately.
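A minimal sketch of that common practice (the database name and backup path are placeholders):

-- Switch to Bulk-Logged for the duration of the bulk operations
ALTER DATABASE [MyDb] SET RECOVERY BULK_LOGGED;

-- ... run the bulk operations here (BULK INSERT, SELECT INTO, index rebuilds) ...

-- Switch back to Full and take a log backup immediately,
-- so that point-in-time recovery is available again from this point on
ALTER DATABASE [MyDb] SET RECOVERY FULL;
BACKUP LOG [MyDb] TO DISK = N'X:\Backups\MyDb_log.trn';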

In Brief:

The recovery model chosen plays a pivotal role in transaction log management and the ability to restore data. It’s essential to understand the implications of each model to ensure that you can meet recovery objectives and manage storage efficiently. Regular log backups in Full and Bulk-Logged recovery models are vital to control log growth and enable point-in-time recovery capabilities.

SQL Server minimal logging

When data is modified in SQL Server, the changes are typically fully logged to ensure data integrity and allow for point-in-time recovery. However, there are scenarios where logging every single detail might be overkill and could severely impact performance, especially during large-scale operations. This is where minimal logging comes into play.

What is SQL Server Minimal Logging?

Minimal logging involves logging only the information necessary to roll back the transaction, without supporting point-in-time recovery. Essentially, instead of logging every row change in detail, SQL Server logs just enough information to ensure that the operation can be undone if necessary.

When is SQL Server Minimal Logging Useful?

The primary benefit of SQL Server minimal logging is performance optimization and log size reduction. Bulk operations, like inserting a vast number of rows into an empty table, can be executed faster under minimal logging because less I/O is required to write the log records. This makes operations like ETL/ELT processes more efficient.
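For instance, loading a flat file into an empty staging heap is a classic candidate; a sketch with a hypothetical table and file path:

-- Bulk load a flat file into an empty heap;
-- the TABLOCK option is one of the prerequisites for minimal logging
BULK INSERT dbo.StagingOrders
FROM 'C:\data\orders.csv'
WITH (
    TABLOCK,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2  -- skip the header row
);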

Operations that Support Minimal Logging:

Certain operations in SQL Server can take advantage of SQL Server minimal logging under specific conditions:
– `BULK INSERT`
– `SELECT INTO`
– `INSERT…SELECT`
– Index DDL operations such as `CREATE INDEX` and `ALTER INDEX ... REBUILD`.

However, the degree to which these operations are minimally logged depends on several factors, including the recovery model of the database and the structure of the target table.

Conditions for Minimal Logging:

For minimal logging to be in effect, specific prerequisites must be met:
1. The database must be in the `Bulk-Logged` or `Simple` recovery model. While minimal logging is most commonly associated with the `Bulk-Logged` recovery model, it can also occur in the `Simple` recovery model.
2. The target table must either be empty or be a heap without any non-clustered indexes. For B-tree-structured tables (like those with a clustered index), only the non-leaf pages are minimally logged.
3. The operations should be bulk operations. Regular small DML operations like individual INSERT, UPDATE, or DELETE statements are always fully logged.
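The first prerequisite is easy to verify; for example:

-- Check the recovery model of the current database
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();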

Considerations:

While minimal logging can significantly improve performance, it’s essential to understand its implications:
1. Recovery Limitations: Since not all changes are logged in detail, point-in-time recovery is not possible for the period during which the bulk operations were performed.
2. Log Backup: After a bulk operation in a database using the `Bulk-Logged` recovery model, it’s recommended to take a log backup to ensure the log doesn’t grow uncontrollably and to return to a state where point-in-time recovery is possible.

Factors that influence SQL Server minimal logging

There are several factors that influence SQL Server minimal logging when performing massive insertions into a table:

  • Is the table a heap?
  • Does the table have secondary indexes?
  • Do you use the table hint "TABLOCK" in the INSERT statement?
  • Is your SQL Server version 2016+?
  • Do you have the trace flag T610 activated?

These factors lead to a matrix of test cases.
For the moment, I did not test on pre-2016 versions, so the T610 trace flag is not useful here.

-- Flush dirty pages so the log measurement starts from a clean state
CHECKPOINT
GO
-- Using WITH (TABLOCK) on a heap
DROP TABLE IF EXISTS [dbo].[TEST_71_1M_12m_COPY];

-- Create an empty copy of the source table (WHERE 1=0 copies only the structure)
SELECT * INTO [dbo].[TEST_71_1M_12m_COPY]
FROM [dbo].[TEST_71_1M_12m]
WHERE 1=0;

BEGIN TRAN;

-- TABLOCK takes a table-level lock, a prerequisite for minimal logging
INSERT INTO [dbo].[TEST_71_1M_12m_COPY] WITH (TABLOCK)
SELECT *
FROM [dbo].[TEST_71_1M_12m];

COMMIT TRAN;
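To see how much a statement like this actually wrote to the log, one option is the undocumented fn_dblog function. Since the script begins with a CHECKPOINT, most of the active log records it returns right after the load belong to this test; treat this as a test-system tool only:

-- Count log records and the total bytes they occupy
-- (fn_dblog is undocumented; use it on test systems only)
SELECT COUNT(*) AS log_record_count,
       SUM([Log Record Length]) AS log_bytes
FROM fn_dblog(NULL, NULL);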

How minimal is SQL Server minimal logging?

To better understand how minimal SQL Server minimal logging really is, and in which cases it applies, I developed some stored procedures to test different cases (the source code is here). I also added a very interesting comparison with the same data inserted into a table with a clustered columnstore index.

We can clearly see that the TABLOCK hint reduces the loading time. This is because, with this hint, SQL Server will load the data in parallel (if the compatibility level is >= 130 and there are no secondary indexes). We can also see that when the TABLOCK hint is "on" and the table has a classic clustered index (but no secondary indexes), the log volume is reduced by a factor of 50.

What we also learn from these experiments is that, with a clustered columnstore index on the data, the log volume is reduced by 100x even without the TABLOCK hint (200x with the TABLOCK hint), compared to classic clustered indexes or heap tables for which minimal logging is not activated.
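As a sketch of that columnstore variant (the name of the copy table is hypothetical):

-- Same load, but into a table with a clustered columnstore index
DROP TABLE IF EXISTS [dbo].[TEST_71_1M_12m_CCI];

SELECT * INTO [dbo].[TEST_71_1M_12m_CCI]
FROM [dbo].[TEST_71_1M_12m]
WHERE 1=0;

CREATE CLUSTERED COLUMNSTORE INDEX CCI_TEST_71_1M_12m
ON [dbo].[TEST_71_1M_12m_CCI];

INSERT INTO [dbo].[TEST_71_1M_12m_CCI] -- no TABLOCK hint needed here
SELECT *
FROM [dbo].[TEST_71_1M_12m];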

Trace Flags?

Trace flags in SQL Server are configuration settings that enable specific characteristics of the server or alter certain behaviors. They are primarily used for:

1. Debugging and Troubleshooting: Some trace flags provide detailed information about how SQL Server is processing data, which can be invaluable for resolving performance or functional issues.

2. Changing Behavior: Some flags alter the way SQL Server operates, either globally or at the session level, allowing administrators to tailor the system’s behavior to their specific needs or to work around certain issues.

3. Feature Management: Certain features or behaviors that aren’t set as the default can be enabled using trace flags.

Trace flags can be activated at startup (by adding them to the SQL Server startup parameters) or during runtime using the `DBCC TRACEON` and `DBCC TRACEOFF` commands. It’s important to be cautious when using trace flags, especially in production environments, because they can drastically affect performance or behavior. Always consult official documentation or experts when considering their use.
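For example, enabling a trace flag globally at runtime and checking which flags are active looks like this (T610 is used here purely as an illustration):

-- Enable trace flag 610 globally (-1 applies it instance-wide)
DBCC TRACEON (610, -1);

-- List all trace flags currently enabled
DBCC TRACESTATUS (-1);

-- Disable it again
DBCC TRACEOFF (610, -1);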

Before SQL Server 2016, trace flag T610 was often used in data warehouses to reduce log volume. But the current SQL Server documentation about trace flags (the T610 extract is below) explains that this trace flag is not required for versions above 2016… It seems the documentation is not correct (or I missed something).

To be certain about trace flag 610, I added a DBCC TRACEON(610) as a preload statement. The results show that trace flag 610 has no effect on my SQL Server 2022 instance.

What about trace flag T715, which should enable a "transparent" TABLOCK without using the hint?

Same story as with T610: I decided to check whether T715 has the behavior described in the documentation…
Unfortunately, T715 has no effect in SQL Server 2022 (at least).

Conclusion

What we learn is that minimal logging can greatly reduce the volume of your log file when massive DML is used. This feature is not turned on by default: you will need to use the TABLOCK hint. Don’t count on minimal logging if you have secondary indexes. You will have to find a loading strategy (using temporary tables and maybe partitioned table switches) to benefit from minimal logging if you have a table with data and indexes that you cannot drop before loading.

For analytics platforms, columnstore is another way to reduce log needs, even without the TABLOCK hint, but in this case the load is made serially and is therefore slower.
Columnstore compression will thus reduce not only data storage (by 7 to 40x) but also log storage (by 50 to 200x) compared to other storage types. It leads to slower loads (5x) but faster reads (10x)… not always, of course, but most of the time. Don’t use it if you are in a heavy OLTP environment, but if you’re in an OLAP one, you should consider it.