SQL Server Customer Advisory Team

How bwin is using SQL Server 2016 In-Memory OLTP to achieve unprecedented performance and scale


Written by: Biljana Lazic (bwin – Senior DBA) and Rick Kutschera (bwin – Engineering Manager). Reviewed by: Mike Weiner (SQLCAT)

bwin (part of GVC Holdings PLC) is one of Europe’s leading online betting brands, and is synonymous with sports. Having offices situated in various locations across Europe, India, and the US, bwin is a leader in several markets including Germany, Belgium, France, Italy and Spain.

To be able to achieve our goals in these increasingly competitive markets, bwin’s infrastructure is constantly being pushed to stay on top of today’s – and sometimes even tomorrow’s – technology demands. With around 19 million bets and over 250,000 active users per day, our performance and scale requirements are extraordinary. In this blog, we will discuss how we have adopted In-Memory OLTP with SQL Server 2016 to meet these demands.

Our Caching Systems:

For years we’ve depended on Microsoft AppFabric and other distributed cache systems, such as Cassandra or Memcached, as one of the central pieces in our architecture to meet the demanding requirements on our systems. All major components, including sports betting, poker, and casino gaming, rely on this cache, which makes it a critical component for our current and future business needs. In fact, a failure of this caching system would directly translate to a total blackout of our business, making it one of the most mission-critical systems in the overall architecture.

With this configuration we faced scalability issues and, worse, we saw that with higher transaction volumes the stability of the whole distributed cache system could not keep up with the workload. Even after scaling out the number of caching nodes, we still faced stability issues, leading to degraded availability.

Additionally, there were setup and maintenance pains, which led to a high workload overhead for various departments just to keep the system operational. Even with all this recurring work, the results we achieved were never satisfactory.

Below is a simplified architecture of our distributed caching system:

[Diagram: simplified architecture of the distributed caching system]

For these reasons, and to become more stable while keeping up with the business requirements, we were forced to consider better alternatives for our caching layer.

That was the moment we realized that we already had the solution: the In-Memory OLTP engine built into SQL Server, codenamed “Hekaton”. In fact, we had already been using this technology for a while in a similar solution, our ASP.NET SessionState.

Our History with ASP.NET SessionState:

ASP.NET SessionState itself is a caching system that we have been using for a long time now and thanks to the performance we achieved, it became a building block for our new global caching database. But before we get further into the global caching database, let’s take a look into our history with ASP.NET SessionState.

bwin was the very first customer to use the In-Memory OLTP engine for an ASP.NET SessionState database in production, with SQL Server 2014 (long before it was even called SQL Server 2014). For reference, before SQL Server 2014, our ASP.NET SessionState database could handle around 12,000 batch requests/sec before suffering from latch contention. We were forced to implement 18 SessionState databases and partition our workload across them to handle the required user load. With SQL Server 2014 In-Memory OLTP we could consolidate back to a single SQL Server box and database handling ~300,000 batch requests/sec, with the bottleneck moving to the splitting up of data, as there was no support for LOB data types with memory-optimized tables.

Over the years, we gathered a good deal of experience working with In-Memory OLTP for our ASP.NET SessionState solution, and the performance of the solution received positive feedback both from within and outside of our company.

With In-Memory OLTP in SQL Server 2016 extending its functionality – for example, memory-optimized table support for LOB data types – we were able to move more code into natively compiled stored procedures, and even set a new record for our In-Memory OLTP based ASP.NET SessionState database during our research lab engagement with the Microsoft Enterprise Engineering Center (EEC) in October 2015.

With all the scalability improvements in SQL Server 2016, and the migration of all the Transact-SQL to natively compiled stored procedures and a memory-optimized table, we were able to achieve and sustain over 1.2 million batch requests/second – an improvement of 4x over our previous high watermark!

Below is a summary of the performance improvements we’ve achieved by using In-Memory OLTP for our ASP.NET SessionState. We have also included the batch requests/sec from Performance Monitor and the measurements of waits within SQL Server.

Version | SessionState Performance | Technology | Bottleneck
SQL Server 2012 | 12,000 batch requests/sec | Interpreted T-SQL | Latch contention
SQL Server 2014 | 300,000 batch requests/sec | Memory-optimized table, interpreted T-SQL, handling of split LOBs | CPU
SQL Server 2016 | 1,200,000 batch requests/sec | Memory-optimized table with LOB support, natively compiled stored procedures | CPU

[Chart: ASP.NET SessionState throughput (batch requests/sec) and SQL Server wait statistics]

NOTE: In testing we did hit spinlock contention at around 800,000 batch requests/sec; this is resolved in SQL Server 2016 Cumulative Update 2 (CU2). (Blog: https://blogs.msdn.microsoft.com/sqlcat/2016/09/29/sqlsweet16-episode-8-how-sql-server-2016-cumulative-update-2-cu2-can-improve-performance-of-highly-concurrent-workloads/)

With this level of performance and stability achieved, together with all the experience we gathered during years of working with the ASP.NET SessionState solution, we felt confident we could utilize this as our building block for a new global caching system based on the In-Memory OLTP engine.

Our New Global Caching System

The global caching system, based on our ASP.NET SessionState implementation and In-Memory OLTP, is now the replacement for all our distributed caching systems.

Below is a simplified architecture diagram of our new global caching system with SQL Server 2016 In-Memory OLTP:

[Diagram: simplified architecture of the new global caching system with SQL Server 2016 In-Memory OLTP]

Having changed the architecture, we can now not only match the performance we had with our distributed cache system, but far exceed it, using a single database node compared to the 19 mid-tier cache nodes previously used.

Version | Cache System Performance | Hardware nodes/distribution
Mid-tier cache solution without SQL Server | 150,000 batch requests/sec | 19
Solution using SQL Server 2016 | 1,200,000 batch requests/sec | 1

In addition to reducing the number of servers required to obtain this performance, we also achieved several performance gains, as displayed in the diagram below. The graphic, which comes from our application monitoring solution, shows three things:

Usage: First, the green circular rings represent all the different products (and corresponding server counts which are represented by the numbers in the circles) dependent on the global cache. This includes numerous products from sports betting, casino gaming, bingo, all the way to our portal and sales API, just to name a few. With all these components accessing the cache, it has become even more central in our architecture than the ASP.NET SessionState.

Performance Throughput: Next, you can see our average load, which is around 1.6 million requests per minute. As our user load scales up, we expect roughly twice this number of batch requests/sec, and we expect to be able to handle up to 20 times this load.

Performance Latency: Finally, you can see the consistent round-trip latency measured from the client, at around 1 ms. This number is even more important when directly compared to the previous distributed cache system, where latency varied constantly, with response times ranging from 2 ms up to 200 ms.

[Screenshot: application monitoring view showing cache usage per product, throughput, and latency]

Also of note: as the global cache is such a central piece of our business, it is also part of a high availability solution. If for some reason the server hosting the SQL Server database fails, we can easily move the workload to another SQL Server database.

Implementation of the Global Caching Database

While the performance gains with In-Memory OLTP have been amazing, the implementation at its core is quite simple. From the database perspective, the memory-optimized table is, simply put, just a key/value store, consisting of a key (the primary key), a value (a BLOB), and an expiration date. The application has three possible ways of interacting with this table, in the form of three natively compiled stored procedures:

  • One stored procedure to insert a value with a key into the table
  • One stored procedure to retrieve the value, by providing the key
  • One stored procedure to delete the value from the table.

In the background, there is a scheduled T-SQL job which deletes all expired entries from the table, on a regular basis.
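
For illustration, below is a minimal sketch of what these natively compiled procedures and the cleanup statement could look like, assuming the dbo.CacheItems table shown in the appendix. The procedure names and the simplified expiration handling are hypothetical, not bwin’s exact implementation.

CREATE PROCEDURE dbo.usp_SetCacheItem
    @Key nvarchar(256), @Value varbinary(max), @Expiration datetime2(2),
    @IsSlidingExpiration bit, @SlidingIntervalInSeconds int
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- Upsert: remove any existing entry for the key, then insert the new value
    DELETE FROM dbo.CacheItems WHERE [Key] = @Key;
    INSERT INTO dbo.CacheItems ([Key], [Value], [Expiration], [IsSlidingExpiration], [SlidingIntervalInSeconds])
    VALUES (@Key, @Value, @Expiration, @IsSlidingExpiration, @SlidingIntervalInSeconds);
END;
GO

CREATE PROCEDURE dbo.usp_GetCacheItem
    @Key nvarchar(256), @Value varbinary(max) OUTPUT
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- Return the value through an output parameter, matching how the data access layer reads it
    SELECT @Value = [Value] FROM dbo.CacheItems WHERE [Key] = @Key;
END;
GO

CREATE PROCEDURE dbo.usp_RemoveCacheItem
    @Key nvarchar(256), @Value varbinary(max) OUTPUT
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- Return the value being removed, then delete the row
    SELECT @Value = [Value] FROM dbo.CacheItems WHERE [Key] = @Key;
    DELETE FROM dbo.CacheItems WHERE [Key] = @Key;
END;
GO

-- The scheduled cleanup job could simply run a statement such as:
-- DELETE FROM dbo.CacheItems WHERE [Expiration] < SYSUTCDATETIME();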

From the development perspective, the impact was quite minimal. Previously, all the code necessary to access the distributed cache solutions was contained in an “abstraction layer” DLL. Hence the changes needed to use SQL Server as the caching solution were quite localized to that specific DLL, which meant no impact to the actual applications using the caching tier.

As our senior software engineer on the project noted: “All in all the migration of the code in [our] framework was not so difficult, we spent much more time by testing and tuning it than by the implementation itself.” – Marius Zuzcak

In the appendix below we provide code from the database, of the memory-optimized table using the LOB datatype, as well as code from the data access layer with calls to the In-Memory OLTP Global Cache.

Appendix: Code Examples

Below is the Transact-SQL code for the memory-optimized table, now with LOB datatype support:

CREATE TABLE [dbo].[CacheItems]
(
    [Key] [nvarchar](256) COLLATE Latin1_General_100_BIN2 NOT NULL,
    [Value] [varbinary](max) NOT NULL,
    [Expiration] [datetime2](2) NOT NULL,
    [IsSlidingExpiration] [bit] NOT NULL,
    [SlidingIntervalInSeconds] [int] NULL,
    CONSTRAINT [pk_CacheItems] PRIMARY KEY NONCLUSTERED HASH ([Key]) WITH (BUCKET_COUNT = 10000000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)

Here we provide pseudo-code from the data access layer (Hekaton.Dal).

using System;
using System.Data.SqlClient;
using System.Diagnostics.Contracts;

namespace Hekaton.Dal
{
   internal interface IHekatonDal
   {
       object GetCacheItem(string key);
       void SetCacheItem(string key, object value, CacheItemExpiration cacheItemExpiration);
       object RemoveCacheItem(string key);
   }
   internal sealed class HekatonDal : IHekatonDal
   {
       private readonly IHekatonConfiguration configuration;
       public HekatonDal(IHekatonConfiguration configuration)
       {
           Contract.Requires(configuration != null);
           this.configuration = configuration;
       }
       public object GetCacheItem(string key)
       {
           byte[] buffer;
           long length;
           ExecuteReadOperation(key, out buffer, out length, SqlCommands.GetCacheItemCommand);
           return Serialization.DeserializeBuffer(buffer, length);
       }
       public void SetCacheItem(string key, object value, CacheItemExpiration cacheItemExpiration)
       {
           ExecuteSetCacheItem(key, value, cacheItemExpiration);
       }
       public object RemoveCacheItem(string key)
       {
           byte[] buffer;
           long length;
           ExecuteReadOperation(key, out buffer, out length, SqlCommands.RemoveCacheItemCommand);
           return Serialization.DeserializeBuffer(buffer, length);
       }
       private void ExecuteReadOperation(string key, out byte[] valueBuffer, out long valueLength,
           Func<SqlConnection, SqlCommand> readCommand)
       {
           using (var connection = new SqlConnection(configuration.HekatonConnectionString))
           using (var command = readCommand(connection))
           {
               command.Parameters[0].Value = key;
               connection.Open();
               using (command.ExecuteReader())
               {
                   // Item is returned from the output parameter.
                  var value = command.Parameters[1].Value;
                   if (Convert.IsDBNull(value))
                   {
                       valueBuffer = null;
                       valueLength = 0;
                   }
                   else
                   {
                       valueBuffer = (byte[])value;
                       valueLength = valueBuffer.LongLength;
                   }
               }
           }
       }
       private void ExecuteSetCacheItem(string key, object value, CacheItemExpiration cacheItemExpiration)
       {
           using (var connection = new SqlConnection(configuration.HekatonConnectionString))
           using (var command = SqlCommands.SetCacheItemCommand(connection))
           {
                var serializedValue = Serialization.SerializeObject(value);
                command.Parameters[0].Value = key;
                command.Parameters[1].Value = serializedValue;
                command.Parameters[2].Value = cacheItemExpiration.AbsoluteExpiration;
                command.Parameters[3].Value = cacheItemExpiration.IsSlidingExpiration;
                command.Parameters[4].Value = cacheItemExpiration.SlidingIntervalInSeconds;
                connection.Open();
                command.ExecuteNonQuery();
           }
       }
   }
}

Below are the caching provider methods that get, set, and remove items from the memory-optimized table; these methods call into the data access layer shown above.

//GetCacheItem
public override CacheItem GetCacheItem(string key, string regionName = null)
       {
           var compositeKey = BuildCompositeCacheKey(key);
           try
           {
               var result = hekatonDal.GetCacheItem(compositeKey);
               return result != null ? new CacheItem(key, result, regionName) : null;
           }
           catch (SqlException ex)
           {
               Log(CreateLogMessage(key, compositeKey, ex, "Failed to get"), ex);
               return null;
           }
           catch (SerializationException ex)
           {
               log.Error(CreateLogMessage(key, compositeKey, ex, "Failed to get"));
               return null;
           }
       }
//SetCacheItem
   public override void SetCacheItem(string key, object value, CacheItemPolicy policy, string regionName = null)
       {          
           var compositeKey = BuildCompositeCacheKey(key);
           try
           {
               hekatonDal.SetCacheItem(compositeKey, value, new CacheItemExpiration(policy, configuration));
           }
           catch (SqlException ex)
           {
               Log(CreateLogMessage(key, compositeKey, ex, "Failed to set"), ex);
           }
       }
//RemoveCacheItem
public override object RemoveCacheItem(string key, string regionName = null)
       {
           var compositeKey = BuildCompositeCacheKey(key);
           try
           {
               var result = hekatonDal.RemoveCacheItem(compositeKey);
               return result;
           }
           catch (SqlException ex)
           {
               Log(CreateLogMessage(key, compositeKey, ex, "Failed to remove"), ex);
               return null;
           }
           catch (SerializationException ex)
           {
               log.Error(CreateLogMessage(key, compositeKey, ex, "Failed to deserialize (while removing)"));
               return null;
           }
       }

Oops Recovery with Temporal Tables


Reviewed by: Kun Cheng, John Hoang, Sanjay Mishra, Borko Novakovic, Denzil Ribeiro, Murshed Zaman

Have you ever got that sinking feeling after hitting the Execute button in SSMS, thinking “I should not have done that”? DML statements with the missing WHERE clause, DROP statements accidentally targeting slightly mistyped (but existing) tables or databases, RESTORE statements overwriting databases with new data that haven’t been backed up, are all examples of actions prompting an “Oops…” (or worse) shortly thereafter. “Oops recovery” is the term that became popular to describe the process of fixing the consequences.

For most of these scenarios, the usual, and often the only, recovery mechanism is to restore the database from backup to a point in time just before the “oops”, known as point-in-time recovery (PITR). Even though PITR remains the most general and the most effective recovery mechanism, it does have some drawbacks and limitations: the recovery process requires a full database restore, taking time proportional to the size of the database; a sequence of restores may be needed if multiple “oops” transactions have occurred; in the general case, there will be difficulties reconciling recovered data with data modified after the “oops” point in time, etc. Nevertheless, PITR remains the most widely applicable recovery method for SQL Server databases, both on-premises and in the cloud.

In this article, we would like to discuss another option that became recently available, that can greatly simplify some recovery scenarios. We will discuss recovering from an “oops” using temporal tables, available in SQL Server 2016 and in Azure SQL Database. The linked documentation provides a detailed description of this new feature. As a quick introduction, a temporal table keeps a record of all data changes by saving versions of rows in a separate history table, with a validity period attached to each version. T-SQL language has been extended to simplify querying of current and historical data. In terms of performance overhead, there is none for INSERT and SELECT statements on the current data. The overhead of other statements is similar to that incurred by maintaining an additional index, and is in any case less than the overhead of other solutions for keeping history, such as triggers or CDC.

From the outset, we need to say that this method is applicable to a rather narrow subset of scenarios, considering all the possibilities for an “oops”. It also requires advance preparation, i.e. modifying table schema to add temporal period columns, and explicitly enabling system versioning. But for those specific scenarios, it allows a much simpler recovery process than PITR, and is therefore worth considering in detail.

To explain the recovery process with a temporal table, let’s consider an example (based on a true story).

In this example, an application uses a SQL Server table as a queue. Messages are generated by multiple application processes, and for each message, a row is inserted into this table. A separate process retrieves these messages from the queue, i.e. executes a single row SELECT statement against the table, processes message payload, and then deletes the processed message row from the table. (As an aside, this is not the most optimal way to implement a queue using a table. It would be better to dequeue messages using a DELETE statement with an OUTPUT clause.)
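
As a hedged illustration of that aside (a sketch only, assuming the dbo.WorkQueue table defined below), a single-statement dequeue could look like this:

-- Atomically pick the oldest message, return it, and remove it in one statement.
-- READPAST lets concurrent consumers skip rows that are already locked.
WITH NextMessage AS
(
    SELECT TOP (1) WorkQueueId, MessagePayload
    FROM dbo.WorkQueue WITH (ROWLOCK, READPAST)
    ORDER BY WorkQueueId
)
DELETE FROM NextMessage
OUTPUT deleted.WorkQueueId, deleted.MessagePayload;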

A new code release introduces a bug where rows are selected, and then deleted immediately, without calling the code to process the message. This is not noticed until 7000 messages are deleted without having been processed, while the rest of the workload continues to run and modify data in the database.

To recover from this using the traditional point-in-time recovery, it would have been necessary to perform 7000 precise point-in-time restores, which is not feasible for most applications. Another possible option is to reconstruct the data from the transaction log, however there are no documented or supported ways to do that, and it may be extremely complex or not even possible in the general case.

Now let’s consider what would happen if the queue table in this example were a temporal table.

CREATE TABLE dbo.WorkQueue
(
WorkQueueId int NOT NULL,
MessagePayload nvarchar(max) NOT NULL,
SysStartDateTime datetime2 GENERATED ALWAYS AS ROW START HIDDEN NOT NULL,
SysEndDateTime datetime2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL,
PERIOD FOR SYSTEM_TIME (SysStartDateTime, SysEndDateTime),
CONSTRAINT PK_WorkQueue PRIMARY KEY (WorkQueueId)
)
;

Compared to a regular table, there are two new columns added here, SysStartDateTime and SysEndDateTime. The values for these columns are automatically generated by SQL Server when rows are inserted, updated, and deleted. Note that in this example, the columns are hidden. While making these columns hidden is optional, it may be useful to avoid impacting the application. If the application code does not follow the best practice of always explicitly listing column names, e.g. if it uses SELECT *, or INSERT statements without target column list, then the addition of new columns can break it. Making the period columns hidden avoids this problem. Note that even if the columns are hidden, it is still possible to see column values if they are explicitly included in the SELECT column list.
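
As a quick illustration of the hidden period columns using the table above:

-- SELECT * does not return the hidden period columns, so existing application code is unaffected
SELECT * FROM dbo.WorkQueue;

-- Explicitly listing the period columns still returns their values
SELECT WorkQueueId, MessagePayload, SysStartDateTime, SysEndDateTime
FROM dbo.WorkQueue;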

If you are familiar with temporal tables, you may have noticed that something is missing from the CREATE TABLE statement above. Specifically, this statement only creates the current table, and there is no mention of the history table. The history table is where older row versions are saved when UPDATE, DELETE, and MERGE statements modify rows in the current table.

In this example, we intentionally do not create the history table right away. If we did, it would start immediately accumulating row versions. For a queue table, it means that every message placed on the queue would effectively remain in the database, potentially using a significant amount of storage. Whether that makes sense depends on the specific application context. If the choice is to have the system versioning enabled at all times, then using a Clustered Columnstore Index for the system table would reduce storage overhead. An example is provided in Creating a temporal table with a user-defined history table.

In this example, we assume that the oops protection provided by temporal tables is only needed during some critical phases of application lifecycle, i.e. during a new code release.

Just prior to the release, we go ahead and enable protection by turning on system versioning:

ALTER TABLE dbo.WorkQueue SET
(SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WorkQueueHistory))
;

Note that the dbo.WorkQueueHistory history table referenced in the above statement does not have to be created explicitly. It will be automatically created by SQL Server as a part of the ALTER TABLE statement above, based on the schema of the dbo.WorkQueue table.

From this point on, whenever a row in the dbo.WorkQueue table is updated or deleted, the version of the row as it existed immediately prior to the update or delete will be inserted in the dbo.WorkQueueHistory table.

Next, the application code (with the disastrous bug) is deployed. Before allowing the application to use the database, we note the exact time of the release (as an aside, this is a best practice for any critical change):

SELECT SYSUTCDATETIME() AS ReleaseDateTime;
-- 2016-11-03 17:07:21.5027748

The application is brought online, messages are added to the queue, and, due to the bug, are deleted right away:

INSERT INTO dbo.WorkQueue (WorkQueueId, MessagePayload)
VALUES (1, 'Message1');

DELETE dbo.WorkQueue
WHERE WorkQueueId = 1;

INSERT INTO dbo.WorkQueue (WorkQueueId, MessagePayload)
VALUES (2, 'Message2');

DELETE dbo.WorkQueue
WHERE WorkQueueId = 2;

INSERT INTO dbo.WorkQueue (WorkQueueId, MessagePayload)
VALUES (3, 'Message3');

DELETE dbo.WorkQueue
WHERE WorkQueueId = 3;

A few minutes later, or a few hours later if we are unlucky, the oops moment arrives, and the problem is noticed. The application is taken offline, and developers start working on a fix. At this point, we note the time when the “bad” workload stops:

SELECT SYSUTCDATETIME() AS WorkloadStopDateTime;
-- 2016-11-03 17:07:40.0709518

In the meantime, the queue table is empty:

SELECT WorkQueueId,
       MessagePayload
FROM dbo.WorkQueue
;

(0 row(s) affected)

The immediate question is whether anything can be done to bring back message data that wasn’t processed while the code with the bug was deployed. Luckily, because we enabled system versioning prior to the release, we can indeed do it, using a single TSQL statement:

WITH LatestVersion AS
(
SELECT WorkQueueId,
       MessagePayload,
       ROW_NUMBER() OVER (PARTITION BY WorkQueueId ORDER BY SysEndDateTime DESC) AS VersionNumber
FROM dbo.WorkQueueHistory
WHERE -- restrict to rows created after the release ...
      SysStartDateTime >= '2016-11-03 17:07:21.5027748'
      AND
      -- ... and rows deleted before the fix went in
      SysEndDateTime < '2016-11-03 17:07:40.0709518'
)
INSERT INTO dbo.WorkQueue
(
WorkQueueId,
MessagePayload
)
SELECT WorkQueueId,
       MessagePayload
FROM LatestVersion
WHERE VersionNumber = 1
;

In this statement, we start with a query to retrieve all row versions from the dbo.WorkQueueHistory history table that have their validity period starting on or after the release time, and ending before the application went offline. This is the query within the LatestVersion CTE. In that query, we use the ROW_NUMBER() window function to number row versions for each PK value (i.e. for each WorkQueueId) in chronologically descending order, so that the latest version becomes version number one. In our specific example, there happens to be only one version, because only one DML statement (the erroneous DELETE) affected each row. In a more general case though (e.g. if a row was updated before having been deleted), multiple versions could exist, therefore we need to determine which version is the latest, which is achieved by numbering versions using ROW_NUMBER(). Then, we restrict this result set to filter out all versions but the latest, which is the version of the row just before it was erroneously deleted. Then, we insert these deleted rows back into the dbo.WorkQueue table, effectively recovering from the oops.

We can see that the unprocessed messages are back in the queue table:

SELECT WorkQueueId,
       MessagePayload
FROM dbo.WorkQueue
;

(3 row(s) affected)

There are two important caveats to note here.

If you are familiar with temporal tables, you may be wondering why we used the history table directly, instead of using one of the new FOR SYSTEM_TIME temporal clauses in the FROM clause of the query. The reason is that FOR SYSTEM_TIME filters out row versions with the same validity start and end times, as noted here. In our scenario, the INSERT and DELETE statements can happen so quickly one after the other, that the system clock resolution is insufficient for the timestamps to be different. In that case, had we used FOR SYSTEM_TIME, some of the row versions from the history table that we need for data recovery would be filtered out.

Careful readers may also notice that while the recovery statement above works as expected for the current example, it may be problematic in a more general case, specifically when multiple versions of a given row exist in the history table. That will happen if a row in the current table is updated at least once after having been inserted. In that case, there is no guarantee that the validity periods of multiple versions of the same row will be different. For example, if all changes happen in a transaction, then the validity period for every version created in the transaction will be the same, as documented here. This can also happen if, as mentioned earlier, the system clock does not change between very short DML statements. But in the above statement, we order row versions using the SysEndDateTime period column to determine the latest version! Therefore, if there is more than one row with the same SysEndDateTime, the result of the sort will be non-deterministic, and the row version inserted into the current table may not be the latest.

In most cases, there is a simple solution for this problem. Instead of using a period column to determine the latest row version, we can add a rowversion column to the current table, and use it for ordering versions in the history table. A rowversion column is automatically generated and incremented on each update, therefore the latest version of a row in the history table will have the highest rowversion value for the PK value of the row. We say “in most cases”, because today rowversion columns are not supported in memory-optimized tables.
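
As a hedged sketch of that approach (the RowVer column name is ours for illustration), the column would be added while system versioning is off, and the recovery query would then order by it:

-- Add a rowversion column to the current table; it is automatically updated on every modification,
-- so the history row with the highest RowVer value for a given key is the latest version.
ALTER TABLE dbo.WorkQueue ADD RowVer rowversion;

-- In the recovery query, the ordering inside the CTE then becomes deterministic:
--   ROW_NUMBER() OVER (PARTITION BY WorkQueueId ORDER BY RowVer DESC) AS VersionNumber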

Once the fixed code is deployed, and the application is working again, we turn off system versioning to avoid accumulating queue row versions in the database, until the next release:

ALTER TABLE dbo.WorkQueue SET (SYSTEM_VERSIONING = OFF);

In this simple example, we have shown how using a temporal table provides a simple way to recover from accidental data loss. The approach is also documented more generically in Repairing Row-Level Data Corruption. This is not by any means a universally applicable method; after all, for most databases, it would not be practical to make every table in the database a temporal table, not to mention that many data loss scenarios are much more complex than this example. However, if you have some tables containing critically important data, and/or data that is at high risk of accidental and erroneous changes, then using a temporal table provides a simple way to recover from an oops, keeping the database online, and avoiding a much heavier PITR process.

Improve query performance on memory optimized tables with Temporal using new index creation enhancement in SP1


Reviewed by: Dimitri Furman, Sanjay Mishra, Mike Weiner

With the introduction of the Temporal feature in SQL Server 2016 and Azure SQL Database, you have the ability to travel through the state of your data as it was at any given point in time. Combined with In-Memory OLTP, temporal on memory-optimized tables allows you to harness the speed of In-Memory OLTP while keeping the same ability to track history and audit every change made to a record. Temporal on memory-optimized tables also allows you to keep the current memory-optimized table “smaller” – and thereby the memory footprint smaller – by deleting data that isn’t “hot” anymore from the current table, which moves it to the history table without needing an external archival process.

When memory-optimized and temporal tables are combined, an internal memory-optimized table is created in addition to the history table, as depicted in the diagram below. Data is flushed asynchronously from the internal in-memory history table to the disk-based history table. The flush interval isn’t currently configurable: data is flushed when the internal table reaches 8% of the memory consumed by the current table, or you can flush it manually by executing the procedure sys.sp_xtp_flush_temporal_history. The internal memory-optimized table is created with the same column definitions as the current in-memory table, but with a single index.

[Diagram: temporal memory-optimized table, with the internal in-memory history table flushing asynchronously to the disk-based history table]
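
For reference, a manual flush for the Orders table used in this walkthrough can be issued as shown below; sys.sp_xtp_flush_temporal_history takes the schema and the name of the current temporal table.

-- Flush the internal in-memory history rows of dbo.Orders to the disk-based history table
EXEC sys.sp_xtp_flush_temporal_history @schema_name = N'dbo', @object_name = N'Orders';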

Let’s walk through some code snippets to demonstrate this:

1. Create a new database and the Orders table, which is our memory-optimized “current” table.


CREATE DATABASE [TestTemporalDb1];
ALTER DATABASE [TestTemporalDb1]
ADD FILEGROUP [IMOLTP] CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE [TestTemporalDb1]
ADD FILE (name='TestTemporalDb1_Mod1', FILENAME='D:\Temp\TestTemporalDb1_Mod1') TO FILEGROUP [IMOLTP];
USE [TestTemporalDb1]
GO
CREATE TABLE [dbo].[Orders](
[OrderId] [INT] Identity NOT NULL,
[StoreID] int NOT NULL,
[CustomerID] int NOT NULL,
[OrderDate] [datetime] NOT NULL,
[DeliveryDate] datetime NULL,
[Amount] float,
[Notes] [NVARCHAR] (max) NULL,
[ValidFrom] [datetime2](7) NOT NULL,
[ValidTo] [datetime2](7) NOT NULL,
CONSTRAINT [PK_OrderID] PRIMARY KEY NONCLUSTERED (OrderId)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO

2. Create the Orders_History table, which can be a rowstore or a columnstore table, and can have its own indexing scheme based on how you intend to query the history. If aggregations are being done on the history data, a columnstore index is likely the better choice. Given that the history table can grow large if many changes are made to the current table, it is usually a good idea to partition the history table. Partitioning is also recommended because you cannot directly delete from a history table, outside of a partition swap, without turning off system versioning. For more details: https://msdn.microsoft.com/en-us/library/mt637341.aspx. For the sake of simplicity, this example will not have partitioning implemented.


CREATE TABLE [dbo].[Orders_History](
[OrderId] [INT] NOT NULL,
[StoreID] int NOT NULL,
[CustomerID] int NOT NULL,
[OrderDate] [datetime] NOT NULL,
[DeliveryDate] datetime NULL,
[Amount] float,
[Notes] [NVARCHAR] (max) NULL,
[ValidFrom] [datetime2](7) NOT NULL,
[ValidTo] [datetime2](7) NOT NULL,
);
-- Create custom Indexing on the Temporal History table
CREATE CLUSTERED INDEX [IX_Order_History]
ON [dbo].[Orders_History] ( ValidTo, ValidFrom) WITH (DATA_COMPRESSION = PAGE)
-- Default Clustered index is on ValidFrom and ValidTo times
CREATE NONCLUSTERED INDEX [IX_OrderHistory_OrderId] ON [dbo].[Orders_History]
(
[OrderId] ASC,
ValidTo,
ValidFrom
) ;
GO

3. Turn on system versioning, defining the columns that are period columns, and the history table.

-- Make Temporal
ALTER TABLE [dbo].[Orders] ADD PERIOD FOR SYSTEM_TIME (ValidFrom,ValidTo);
GO
ALTER TABLE [dbo].[Orders] ALTER COLUMN ValidTo ADD HIDDEN;
ALTER TABLE [dbo].[Orders] ALTER COLUMN ValidFrom ADD HIDDEN;
GO
-- Enable Temporal
ALTER TABLE [dbo].[Orders]
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[Orders_History]))
GO

4. Note that until this point, we have not specifically defined any indexes on the memory optimized internal table. Let’s get the name of this internal table, to see which indexes are created by default.


SELECT SCHEMA_NAME ( T1.schema_id ) AS TemporalTableSchema
, T1.object_id AS TemporalTableObjectId
, OBJECT_NAME ( IT.parent_object_id ) AS ParentTemporalTableName
, IT.Name AS InternalHistoryStagingName
FROM sys.internal_tables IT
JOIN sys.tables T1 ON IT.parent_object_id = T1.object_id
WHERE T1.is_memory_optimized = 1 AND T1.temporal_type = 2

[Result set: name of the internal memory-optimized history staging table]

If we replace the internal table name in the script below and look at its indexes, the results show that there is only one index, with ValidTo and ValidFrom as the first two columns. Currently there is no way to create any additional indexes on the internal memory-optimized table when enabling system versioning. The single index has the following key columns: ValidTo, ValidFrom, {PK columns}, CHANGE_ID. CHANGE_ID is an additional column used to guarantee the uniqueness of rows in the internal history table.

sp_helpindex 'sys.memory_optimized_history_table_869578136'
GO

[sp_helpindex output: the single index on the internal memory-optimized history table]

5. Why would we need any other indexes on the internal history table? Let’s say we have a rather large in-memory table as the current table. For every update or delete, a previous version of each affected row is added to the in-memory internal table. As mentioned earlier, this internal table is flushed to the disk-based history table only when it consumes 8% of the size of the current table, or if flushed manually. As percentages go, 8% of a larger table – say a 20 million row table – is 1.6 million rows. As the current table grows, so does the internal table, and there can be cases where the internal table is relatively large before it is flushed. Given that it has only one index, with predefined key columns, a point lookup on the history table by anything other than the ValidTo column will scan this internal table.

As a very simplistic example, I loaded 4.5 million rows into the Orders table, and then updated every 18th row (the full definition of the InsertOrders_Native_Batch stored procedure is in the appendix).

EXEC [InsertOrders_Native_Batch] 4500000;
-- Effectively updating every 1/18th row (less than 8% of rows)
SET NOCOUNT ON
GO
UPDATE Orders SET DeliveryDate = getdate()+1
where OrderID % 18 = 0;

 

6. Let’s now look at a most simple form of a point lookup query on the history table. Queries on the history table will return results from all the data that exists in the on-disk history table as well as the memory optimized internal history table.

-- Though a query is on the history table, it also references in-memory internal table
SELECT * FROM [Orders_History] WHERE OrderID = 18

[Execution plan: table scan of the internal memory-optimized history table]

SQL Server Execution Times:   CPU time = 578 ms, elapsed time = 837 ms.

Note that in the plan above, even though we only get one row from the memory-optimized internal table, we are still doing a full scan, because in the only index this table has, OrderID is not the leading column. The memory optimized internal table has 250,000 rows, none of which are flushed to the on-disk history table yet, since it is below the flush threshold as shown by the snapshot of memory consumption below.

[Memory consumption snapshot: internal history table still below the 8% flush threshold]

The same is true if you run a query that uses any of the FOR SYSTEM_TIME constructs and restrict it to a particular OrderID:


SELECT * FROM Orders FOR SYSTEM_TIME ALL
WHERE [OrderID] = 18;

Solution

Starting with SQL Server 2016 CU3 and SQL Server 2016 SP1, we have introduced the ability to add additional indexes on the in-memory internal tables, under trace flag 10316. No additional DDL syntax exists from a temporal perspective at this point in time. Additional details are in KB 3198846.

Using this trace flag, let’s create an index on OrderID on the internal table:

DBCC TRACEON(10316)
GO
ALTER TABLE sys.memory_optimized_history_table_565577053
ADD Index IndOrderID(OrderId)
GO
DBCC TRACEOFF(10316)

Looking at the plan for the same query now, we see an index seek, given there is now an index to satisfy the predicate. Execution times are substantially lower than in the prior case.

[Execution plan: index seek on the internal memory-optimized history table]

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 36 ms.

This not only allows us to seek in this simplistic query case; in more complex queries the downstream effects can result in significantly better plans and memory grants. This is another example of improvements being made based on actual customer workload feedback.

 

Appendix – TSQL Code:


CREATE DATABASE [TestTemporalDb1];
ALTER DATABASE [TestTemporalDb1] ADD FILEGROUP [IMOLTP] CONTAINS MEMORY_OPTIMIZED_DATA ;
ALTER DATABASE [TestTemporalDb1]
ADD FILE (name='TestTemporalDb1_Mod1', filename='H:\data\TestTemporalDb1_Mod1') TO FILEGROUP [IMOLTP];
USE [TestTemporalDb1]
GO
-- Create on In-memory table with a history table.
CREATE TABLE [dbo].[Orders](
[OrderId] [INT] Identity NOT NULL,
[StoreID] int NOT NULL,
[CustomerID] int NOT NULL,
[OrderDate] [datetime] NOT NULL,
[DeliveryDate] datetime NULL,
[Amount] float,
[Notes] [NVARCHAR] (max) NULL,
[ValidFrom] [datetime2](7) NOT NULL,
[ValidTo] [datetime2](7) NOT NULL,
CONSTRAINT [PK_OrderID] PRIMARY KEY NONCLUSTERED (OrderId)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO
ALTER TABLE Orders ADD INDEX IndOrders_StoreID(StoreID);
GO

-- Create the table on the partition scheme
CREATE TABLE [dbo].[Orders_History](
[OrderId] [INT] NOT NULL,
[StoreID] int NOT NULL,
[CustomerID] int NOT NULL,
[OrderDate] [datetime] NOT NULL,
[DeliveryDate] datetime NULL,
[Amount] float,
[Notes] [NVARCHAR] (max) NULL,
[ValidFrom] [datetime2](7) NOT NULL,
[ValidTo] [datetime2](7) NOT NULL,
)
GO

-- Create custom Indexing on the Temporal History table
CREATE CLUSTERED INDEX [IX_Order_History]
ON [dbo].[Orders_History] ( ValidTo, ValidFrom) WITH (DATA_COMPRESSION = PAGE);
-- Default Clustered index is on ValidFrom and ValidTo times
CREATE NONCLUSTERED INDEX [IX_OrderHistory_OrderId] ON [dbo].[Orders_History]
( [OrderId] ASC, ValidTo, ValidFrom) ;
GO

-- Make Temporal
ALTER TABLE [dbo].[Orders] ADD PERIOD FOR SYSTEM_TIME (ValidFrom,ValidTo);
ALTER TABLE [dbo].[Orders] ALTER COLUMN ValidTo ADD HIDDEN;
ALTER TABLE [dbo].[Orders] ALTER COLUMN ValidFrom ADD HIDDEN;
GO
-- Enable Temporal
ALTER TABLE [dbo].[Orders] SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[Orders_History]));
GO

-- Internal Tables and memory consumption of each?
WITH InMemoryTemporalTables
AS
(
SELECT SCHEMA_NAME ( T1.schema_id ) AS TemporalTableSchema
, T1.object_id AS TemporalTableObjectId
, IT.object_id AS InternalTableObjectId
, OBJECT_NAME ( IT.parent_object_id ) AS ParentTemporalTableName
, IT.Name AS InternalHistoryStagingName
FROM sys.internal_tables IT
JOIN sys.tables T1 ON IT.parent_object_id = T1.object_id
WHERE T1.is_memory_optimized = 1 AND T1.temporal_type = 2
)
, DetailedConsumption
AS
(
SELECT TemporalTableSchema
, T.ParentTemporalTableName
, T.InternalHistoryStagingName
, CASE
WHEN C.object_id = T.TemporalTableObjectId
THEN 'Temporal Table Consumption'
ELSE 'Internal Table Consumption'
END ConsumedBy
, C.*
FROM sys.dm_db_xtp_memory_consumers C
JOIN InMemoryTemporalTables T
ON C.object_id = T.TemporalTableObjectId OR C.object_id = T.InternalTableObjectId
)
--select * from DetailedConsumption
SELECT TemporalTableSchema,
ParentTemporalTableName, object_id, object_name(object_id) as MemoryUsedByTable
, sum ( allocated_bytes ) AS allocated_bytes
, sum ( used_bytes ) AS used_bytes
FROM DetailedConsumption
GROUP BY TemporalTableSchema, ParentTemporalTableName, InternalHistoryStagingName,object_id ;

-- Check indexes
-- Replace the internal table with the table name you get from the prior query
sp_helpindex 'Orders'
GO
sp_helpindex 'sys.memory_optimized_history_table_565577053'
GO

-- Insert some data?
CREATE PROCEDURE [dbo].[InsertOrders_Native_Batch]
@OrderNum INT=100
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = 'us_english')
DECLARE @counter AS INT = 1;
WHILE @counter <= @OrderNum
BEGIN
INSERT INTO dbo.Orders
VALUES (@counter % 8 , @counter %10000 , getdate(), NULL , rand() * 1000,
'Notes for a fake order temporal testing, will update these later' );
SET @counter = @counter + 1;
END
END;
GO

-- Insert some rows
EXEC [InsertOrders_Native_Batch] 500000
Go 9

-- Now update them.
-- Effectively updating every 1/18th row ( less than 10% of rows)
SET NOCOUNT ON
GO
UPDATE Orders SET DeliveryDate = getdate()+1
WHERE OrderID % 18 = 0

-- This also references In-memory temp table
select count(*) from [Orders_History]

-- Enable actual plan and also look at Statistics
set statistics time on
go
-- Though a query on the history table, it also references In-memory temp table
select * from [Orders_History] where OrderID = 18

--*******************************************************************************
-- You need SQL 2016 CU3 or SQL 2016 SP1 in order for the steps below to work
--**********************************************************************************
DBCC TRACEON(10316)
GO
ALTER TABLE sys.memory_optimized_history_table_565577053
ADD Index IndOrderID(OrderId)
GO
DBCC TRACEOFF(10316)

-- Now execute the same statement and you should see a seek
select * from [Orders_History] where OrderID = 18

-- Can we manually ensure that the in-memory history table is flushed?
exec sys.sp_xtp_flush_temporal_history 'dbo', 'Orders'

 

Azure SQL DW: Moving to a different region with restore from backup option


Reviewers: Dimitri Furman, John Hoang, Mike Weiner, Denzil Ribeiro, Joe Yong

Background:

Azure SQL Data Warehouse service (SQL DW) uses a snapshot backup to back up your data at a regular (8 hour) interval. DBAs can use this backup to restore a SQL DW database into a new database in the same region, or to a paired region at a different geographical location. For details refer to this documentation. As an example, if you have an Azure SQL DW in West US, your paired region is East US. You can learn about the Region pairs here.

While you can use the backup to restore to a paired region, what about the scenario of moving your SQL DW database to a remote, non-paired region – say, East Asia? You could export the data from SQL DW to blob storage in a region of your choice, and then reload it into a new database in East Asia. This is very cumbersome if you have hundreds of tables, and could also be error prone. You would also have to re-create all the database objects in the new database.

Is there an easier way? In the next section, I am going to talk about a little-known but easy solution that can help you do just that.

Solution:

The “Create new SQL Data Warehouse” workflow allows you to create a new database from a backup. Let’s explore whether it allows you to create a SQL DW database from a backup in a region of your choice.

1. In Azure Portal, click on the + sign and choose Databases, then SQL Data Warehouse.

[Screenshot: Azure Portal – creating a new SQL Data Warehouse]

2. Enter a database name.
3. Create a new resource group or use an existing one.
4. From Select Source choose “Backup”. This will take you to a blade where you can see all your other SQL DW databases. Choose the one that you want to migrate to another region.

[Screenshot: Select Source blade with “Backup” selected]

5. On the Server blade, create a new server in the location of your choice.

[Screenshot: new server blade with the target location selected]

6. Back on the SQL Data Warehouse blade, click on Create.
7. Wait for the deployment to complete.

How fast the backup is restored to a different region will depend on data size. The bigger the database, the longer it will take. As an example, a 1.7TB database was restored using the above method in 30 minutes.

Conclusion

For this process to work, your source SQL DW database must exist; it won’t work for a deleted SQL DW database. Once the new SQL DW database comes online, you can log in and run some deterministic queries to make sure you have the exact data. If you are loading data into the source SQL DW while the restore is in progress, you may want to run your differential loads against the newly restored database and reconcile the differences since the restore. Once satisfied, you can either keep both SQL DW databases up and running, or pause the source SQL DW database to save on cost, ultimately dropping it at a future date.
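
As a simple example of such a deterministic check, you could compare row counts and an aggregate for your key tables between the source and the restored database; the table and column names below are placeholders.

-- Run the same query against the source and the restored SQL DW database and compare the results.
-- dbo.FactSales and SalesAmount are placeholder names for one of your large tables.
SELECT COUNT_BIG(*)     AS TotalRows,
       SUM(SalesAmount) AS TotalSalesAmount
FROM dbo.FactSales;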

How to Check Database Availability from the Application Tier


Reviewed by: Mike Weiner, Murshed Zaman

A fundamental part of ensuring application resiliency to failures is being able to tell if the application database(s) are available at any given point in time. Synthetic monitoring is the method often used to implement an overall application health check, which includes a database availability check. A synthetic application transaction, if implemented properly, will test the functionality, availability, and performance of all components of the application stack. The topic of this post, however, is relatively narrow: we are focused on checking database availability specifically, leaving the detection of functional and performance issues out of scope.

Customers who are new to implementing synthetic monitoring may choose to check database availability simply by attempting to open a connection to the database, on the assumption that the database is available if the connection can be opened successfully. However, this is not a fully reliable method – there are many scenarios where a connection can be opened successfully, yet be unusable for the application workload, rendering the database effectively unavailable. For example, the SQL Server instance may be severely resource constrained, required database objects and/or permissions may be missing, etc.

An improvement over simply opening a connection is actually executing a query against the database. However, a common pitfall with this approach is that a read (SELECT) query is used. This may initially sound like a good idea – after all, we do not want to change the state of the database just because we are running a synthetic transaction to check database availability. However, a read query does not detect a large class of availability issues; specifically, it does not tell us whether the database is writeable. A database can be readable, but not writeable for many reasons, including being out of disk space, having incorrectly connected to a read-only replica, using a storage subsystem that went offline but appears to be online due to reads from cache, etc. In all those cases, a read query would succeed, yet the database would be at least partially unavailable.

Therefore, a robust synthetic transaction to check database availability must include both a read and a write. To ensure that the storage subsystem is available, the write must not be cached, and must be written through to storage. As a DBMS implementing ACID properties, SQL Server guarantees that any write transaction is durable, i.e. that the data is fully persisted (written through) to storage when the transaction is committed. There is, however, an important exception to this rule. Starting with SQL Server 2014 (and applicable to Azure SQL Database as well), there is an option to enable delayed transaction durability, either at the transaction level, or at the database level. Delayed durability can improve transaction throughput by not writing to the transaction log while committing every transaction. Transactions are written to log eventually, in batches. This option effectively trades off data durability for performance, and may be useful in contexts where a durability guarantee is not required, e.g. when processing transient data available elsewhere in case of a crash.
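
For reference, here is a minimal sketch of how delayed durability is controlled; the database name is a placeholder.

-- Database-level setting: ALLOWED lets individual transactions opt in, FORCED applies it to all commits
ALTER DATABASE [YourDatabase] SET DELAYED_DURABILITY = ALLOWED;  -- or FORCED, or DISABLED

-- Transaction-level opt-in at commit time
BEGIN TRANSACTION;
    -- ... some work ...
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);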

This means that in the context of database availability check, we need to ensure that the transaction actually completes a write in the storage subsystem, whether or not delayed durability is enabled. SQL Server provides exactly that functionality in the form of sys.sp_flush_log stored procedure.

As an example that puts it all together, below is sample code to implement a database availability check.

First, as a one-time operation, we create a helper table named AvailabilityCheck (constrained to have at most one row), and a stored procedure named spCheckDbAvailability.

CREATE TABLE dbo.AvailabilityCheck
(
AvailabilityIndicator bit NOT NULL CONSTRAINT DF_AvailabilityCheck_AvailabilityIndicator DEFAULT (1),
CONSTRAINT PK_AvailabilityCheck PRIMARY KEY (AvailabilityIndicator),
CONSTRAINT CK_AvailabilityCheck_AvailabilityIndicator CHECK (AvailabilityIndicator = 1)
);
GO

CREATE PROCEDURE dbo.spCheckDbAvailability
AS
SET XACT_ABORT, NOCOUNT ON;

BEGIN TRANSACTION;

INSERT INTO dbo.AvailabilityCheck (AvailabilityIndicator)
DEFAULT VALUES;

EXEC sys.sp_flush_log;

SELECT AvailabilityIndicator
FROM dbo.AvailabilityCheck;

ROLLBACK;

To check the availability of the database, the application executes the spCheckDbAvailability stored procedure. This starts a transaction, inserts a row into the AvailabilityCheck table, flushes the data to the transaction log to ensure that the write is persisted to disk even if delayed durability is enabled, explicitly reads the inserted row, and then rolls back the transaction, to avoid accumulating unnecessary synthetic transaction data in the database. The database is available if the stored procedure completes successfully, and returns a single row with the value 1 in the single column.

Note that an execution of sp_flush_log procedure is scoped to the entire database. Executing this stored procedure will flush log buffers for all sessions that are currently writing to the database and have uncommitted transactions, or are running with delayed durability enabled and have committed transactions not yet flushed to storage. The assumption here is that the availability check is executed relatively infrequently, e.g. every 30-60 seconds, therefore the potential performance impact from an occasional extra log flush is minimal.

As a test, we created a new database, and placed its data and log files on a removable USB drive (not a good idea for anything other than a test). For the initial test, we created the table and the stored procedure as they appear in the code above, but with the call to sp_flush_log commented out. Then we pulled out the USB drive, and executed the stored procedure. It completed successfully and returned 1, even though the storage subsystem was actually offline.

For the next test (after plugging the drive back in and making the database available), we altered the procedure to include the sp_flush_log call, pulled out the drive, and executed the procedure. As expected, it failed right away with the following errors:

Msg 9001, Level 21, State 4, Procedure sp_flush_log, Line 1 [Batch Start Line 26]
The log for database 'DB1' is not available. Check the event log for related error messages. Resolve any errors and restart the database.
Msg 9001, Level 21, State 5, Line 27
The log for database 'DB1' is not available. Check the event log for related error messages. Resolve any errors and restart the database.
Msg 3314, Level 21, State 3, Line 27
During undoing of a logged operation in database 'DB1', an error occurred at log record ID (34:389:6). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
Msg 3314, Level 21, State 5, Line 27
During undoing of a logged operation in database 'DB1', an error occurred at log record ID (34:389:5). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
Msg 596, Level 21, State 1, Line 26
Cannot continue the execution because the session is in the kill state.
Msg 0, Level 20, State 0, Line 26
A severe error occurred on the current command. The results, if any, should be discarded.

To summarize, we described several commonly used ways to implement a database availability check from the application tier, and showed why some of these approaches are not fully reliable. We then described a more comprehensive check and provided sample implementation code.

Migration from SQL Server to Azure SQL Database Using Transactional Replication


Written by: Josh Gnanayutham, Program Manager, SQL Engineering

Introduction

As users increasingly move their data to the Azure cloud, migration from SQL Server to Azure SQL Database is a common task. There are many migration methods, and they each have their pros and cons. This blog post explores how to migrate your database using Transactional Replication, and also covers its limitations.

As a prerequisite to this article, we recommend looking at the Azure SQL Database documentation on migration, which summarizes the different options you have. This will help determine if Transactional Replication is a good option for you. Keep in mind that you should refer to the Azure SQL Database documentation for the most up to date information.

Transactional replication is useful for migration when you need to minimize application downtime: the source database remains fully usable while changes are continuously replicated to the new Azure SQL Database.

The following are the major tasks associated with migration.

  1. Before migration, ensure that the database is compatible with Azure SQL Database. Not all SQL Server features are supported in Azure SQL Database.
  2. Provision and Configure Azure Resources.
  3. Rehearse migration steps in a test environment to ensure the migration will go smoothly.
  4. Test the migrated Azure SQL Database to see if it performs as expected.
  5. Operationalize migrated database.

This document will focus on step 3 and the aspects of migration that are unique to Transactional Replication. We recommend you use the steps from the blog post on Migrating from SQL Server to Azure SQL Database using Bacpac Files to prepare your database for migration. It will guide you through database compatibility and the provisioning and configuration of Azure resources (steps 1 and 2). Note that regarding compatibility, Transactional Replication is a little more flexible than migration with bacpac files. For a bacpac to be used, the entire database must be compatible with Azure SQL Database and must not contain any broken object references. With Transactional Replication, you can omit incompatible or broken parts of the database if they are unnecessary; this can be done when you define your publication.
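
As a hedged sketch of what selectively publishing only the objects you need could look like (the publication and table names are placeholders; see the replication documentation for the full set of options):

-- Create a transactional publication on the source database (names are placeholders)
EXEC sp_addpublication
    @publication = N'AzureMigrationPub',
    @description = N'Transactional publication for migration to Azure SQL Database',
    @sync_method = N'concurrent',
    @repl_freq = N'continuous',
    @status = N'active',
    @allow_push = N'true',
    @independent_agent = N'true';

-- Add only the compatible objects you need as articles; incompatible or broken
-- parts of the database can simply be left out of the publication.
EXEC sp_addarticle
    @publication = N'AzureMigrationPub',
    @article = N'Customers',
    @source_owner = N'dbo',
    @source_object = N'Customers';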

About Transactional Replication

Transactional replication involves three main components. They are the publisher, the distributor, and the subscriber. Transactional replication starts with a snapshot of the original database. After the initial snapshot is created, all changes to published objects and data in the original database (the publisher) are propagated to the new database (the subscriber) by the distributor, guaranteeing transactional consistency.

With transactional replication, you will suffer little to no downtime, assuming you’re using concurrent snapshots. With concurrent snapshots, you can continue using your original database while the snapshot is being created. After this, transactional replication will keep the subscriber up to date with minimal latency, so you can switch to using your new database in the cloud whenever you want. Note that in the case of highly intensive workloads, downtime may still be advised for snapshot creation, in order to prevent resource contention from affecting the application.

There are some features that Transactional Replication does not support when the subscriber is in Azure SQL Database. If you are using any unsupported features, Transactional Replication may not work. For more details on Transactional Replication, you can look at the full documentation.

Migrate the Database

After you’ve determined that your database is compatible with Azure SQL Database and that Transactional Replication fits your needs, you can begin migration.

The basic migration tasks are as follows:

  1. Set up distribution
  2. Create publication
  3. Create subscription

In the following sections, we’ll walk through each of these steps in more detail.

Set Up Distribution

The distributor is responsible for controlling the processes which move your data between servers. When you set up distribution, SQL Server will create a distribution database. Each publisher needs to be tied to a distribution database. The distribution database holds the metadata for each associated publication and data on the progress of each replication. For transactional replication, it will hold all the transactions that need to be executed on the subscriber.

To set up distribution you will:

  • Configure Distribution
  • Select snapshot folder
  • Grant publisher access to the distributor server
Using SQL Server Management Studio (SSMS)
  1. Connect to the server you are replicating (the publisher) in Object Explorer
  2. Right click on the Replication folder and select Configure Distribution
  3. On the Distributor page, select the option “<Server Name> will act as its own Distributor”. Then click next. Note that using the publisher as its own distributor may cause a performance impact on the server, depending on the amount of data you’re replicating and on the server resource headroom. If the performance impact is unacceptable, you can use a remote distributor, but it will add complexity to management and administration. The distributor must have network access to your Azure SQL Database. This usually means allowing outbound internet access on the distributor. However, if you have an ExpressRoute link to the target Azure region, internet access is not necessary.
  4. On the SQL Server Agent Start page, if the SQL Server Agent isn’t configured to run automatically, select Yes to configure it to start automatically.
  5. Select a Snapshot folder to store your initial snapshot. Creating a snapshot involves taking a BCP copy of every replicated table. Make sure the location you choose has enough space for this. By default, snapshot data is uncompressed, even if you use data compression in the database. While using compressed snapshots is possible, that carries significant limitations.
  6. Use the defaults on the remaining pages of the wizard.
  7. Click Finish to enable the distribution.
  8. After this you’ll have to give the publisher access to the distributor. If you are using a remote distributor located on a different server from that of your publisher, you’ll need to set up a password.
Using Transact-SQL
  1. Execute sp_get_distributor to determine if the server is already configured as a Distributor. If the value of the installed column in the result set is 0, execute sp_adddistributor at the Distributor on the master database, and then execute sp_adddistributiondb, specifying the name of the distribution database for @database.
  2. At the Distributor, which is also the Publisher if you’re using local distribution, execute sp_adddistpublisher, specifying the folder that will be used as default snapshot folder for @working_directory. If you are using a remote distributor, the steps will be a little different. Note that the distribution server must have network access to your Azure SQL Database.
  3. At the Publisher, execute sp_replicationdboption. Specify the database being published for @dbname, the type of replication for @optname (publish), and a value of true for @value.
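Putting steps 1-3 together, a minimal Transact-SQL sketch might look like this (the distribution database name, snapshot folder, source database name and password are illustrative assumptions, not values from the original walkthrough):

-- Run at the Distributor (here, the Publisher acting as its own Distributor)
USE master;
DECLARE @server sysname = @@SERVERNAME;

EXEC sp_adddistributor @distributor = @server, @password = N'<strong_password>';
EXEC sp_adddistributiondb @database = N'distribution';
EXEC sp_adddistpublisher
    @publisher = @server,
    @distribution_db = N'distribution',
    @working_directory = N'C:\ReplData';   -- default snapshot folder

-- Run at the Publisher: enable the source database for publishing
EXEC sp_replicationdboption
    @dbname = N'SourceDB', @optname = N'publish', @value = N'true';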
More Details

For more details about configuring Distribution, go here.

Create Publication

The publisher is the database where all data for migration originates. Within the publisher, there can be many publications, though in the context of migration to Azure SQL Database, only one publication is typically used. These publications contain articles which map to database objects, including tables, that need to be replicated. Depending on how you define the publication and articles, you can replicate either all or a part of your database. Note that for each table, it is possible to replicate just a subset of rows by defining a filter for the corresponding article.

Using SQL Server Management Studio (SSMS)
  1. Connect to the Publisher in SSMS, and expand the server node.
  2. Expand the Replication folder and right-click the Local Publications folder.
  3. Click New Publication. If your server is not configured as a publisher, you will be prompted to do that.
  4. Select your publication database, then click next.
  5. Select Transactional Publication, then click next.
  6. Select the articles you want to publish, then click next. You can publish everything, or select specific tables.
  7. You have the option to filter rows; this step is not necessary, but you can use it to filter out unnecessary data. Then click next.
  8. Select when you would like to create a snapshot. In this blog, we will select Immediately for the sake of simplicity. Then click next. The snapshot is for the initial synchronization between the publisher and subscriber. You can also schedule your snapshot creation for later.
  9. Click Security Settings near the Snapshot Agent box. Select Run under the following Windows account and enter your credentials. Under Connect to the Publisher, select Impersonate. Click ok to confirm and return to the original Agent Security page. Check the checkbox at the bottom and click next.
  10. Select Create the publication and then click next
  11. Name the publication and you’re done.
Using Transact-SQL
  1. Execute sp_replicationdboption (Transact-SQL) to enable publication of the current database using transactional replication.
  2. Determine whether a Log Reader Agent job exists for the publication database. If a Log Reader Agent job exists for the publication database, proceed to step 3. If you are unsure whether a Log Reader Agent job exists for a published database, execute sp_helplogreader_agent (Transact-SQL) at the Publisher on the publication database. If the result set is empty, create a Log Reader Agent job. At the Publisher, execute sp_addlogreader_agent (Transact-SQL). Specify the Microsoft Windows credentials under which the agent runs for @job_login and @job_password. If the agent will use SQL Server Authentication when connecting to the Publisher, you must also specify a value of 0 for @publisher_security_mode and the Microsoft SQL Server login information for @publisher_login and @publisher_password.
  3. Execute sp_addpublication (Transact-SQL). Specify a publication name for @publication, and, for the @repl_freq parameter, specify a value of continuous for a transactional publication.
  4. Execute sp_addpublication_snapshot (Transact-SQL). Specify the publication name used in step 3 for @publication and the Windows credentials under which the Snapshot Agent runs for @job_login and @job_password. This creates a Snapshot Agent job for the publication. When configuring a Publisher with a remote Distributor, the values supplied for all parameters, including @job_login and @job_password, are sent to the Distributor as plain text. You should encrypt the connection between the Publisher and its remote Distributor before executing this stored procedure.
  5. Add articles to the publication. For more information, see Define an Article.
  6. Start the Snapshot Agent job to generate the initial snapshot for this publication. For more information, see Create and Apply the Initial Snapshot.
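A minimal Transact-SQL sketch of steps 1-5 (the database, publication, table and credential names are illustrative assumptions):

-- Run at the Publisher, in the publication database (SourceDB)
EXEC sp_replicationdboption
    @dbname = N'SourceDB', @optname = N'publish', @value = N'true';

-- Create a Log Reader Agent job if one does not already exist
EXEC sp_addlogreader_agent
    @job_login = N'DOMAIN\repl_agent',
    @job_password = N'<password>',
    @publisher_security_mode = 1;          -- Windows authentication to the Publisher

EXEC sp_addpublication
    @publication = N'AzureMigrationPub',
    @repl_freq = N'continuous',
    @status = N'active';

EXEC sp_addpublication_snapshot
    @publication = N'AzureMigrationPub',
    @job_login = N'DOMAIN\repl_agent',
    @job_password = N'<password>';

-- Repeat for each table (article) you want to migrate
EXEC sp_addarticle
    @publication = N'AzureMigrationPub',
    @article = N'Orders',
    @source_owner = N'dbo',
    @source_object = N'Orders';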
More Details

You can see more details on creating your Publication here.

Create Subscription

In a replication topology, the subscriber is the server which receives data and transactions from the publication. Each publication can have many subscriptions, though in the context of migration to Azure SQL Database, only one subscription is typically used.

Using SQL Server Management Studio (SSMS)
  1. Connect to the Publisher in SSMS and expand the server node.
  2. Expand the Replication folder, and then expand the Local Publications folder.
  3. Right-click your publication and click New Subscriptions.
  4. Select your publication and click next. Select Run at Distributor and click next.
  5. Only push subscriptions are supported for Azure SQL Database.
  6. Click Add Subscriber and connect to the Azure SQL Database logical server you are migrating to.
  7. Select the Subscription Database, this is where the data will be replicated. Note that this database is expected to have been created in advance, with an appropriate edition and service level. Then click next.
  8. Select the option to connect to the subscriber using a SQL Server login, and enter the credentials. Connect to the Distributor by impersonating the agent process account. Then click next. In the context of replication, you can only connect to Azure SQL Database using SQL Server Authentication.
  9. Use the defaults on the remaining pages of the wizard.
  10. Click Finish to create the subscription.
Using Transact-SQL
  1. Do the following at the Publisher on the publication database.
  2. Execute sp_helppublication to see if push subscriptions are enabled. If the value of allow_push is 1, push subscriptions are supported. If the value of allow_push is 0, execute sp_changepublication, specifying allow_push for @property and true for @value.
  3. Execute sp_addsubscription. Specify the @publication, @subscriber and @destination_db. Specify a value of push for @subscription_type.
  4. Execute sp_addpushsubscription_agent. Specify the @subscriber, @subscriber_db, and @publication parameters. The SQL Server credentials under which the Distribution Agent at the Distributor runs for @job_login and @job_password. When creating a push subscription at a Publisher with a remote Distributor, the values supplied for all parameters, including job_login and job_password, are sent to the Distributor as plain text. You should encrypt the connection between the Publisher and its remote Distributor before executing this stored procedure. For more information, see Enable Encrypted Connections to the Database Engine (SQL Server Configuration Manager).
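A minimal Transact-SQL sketch of steps 2-4 (the server, database and login names are illustrative assumptions):

-- Run at the Publisher, in the publication database (SourceDB)
EXEC sp_changepublication
    @publication = N'AzureMigrationPub',
    @property = N'allow_push', @value = N'true';   -- only needed if allow_push was 0

EXEC sp_addsubscription
    @publication = N'AzureMigrationPub',
    @subscriber = N'targetserver.database.windows.net',
    @destination_db = N'TargetDB',
    @subscription_type = N'push';

EXEC sp_addpushsubscription_agent
    @publication = N'AzureMigrationPub',
    @subscriber = N'targetserver.database.windows.net',
    @subscriber_db = N'TargetDB',
    @subscriber_security_mode = 0,                 -- SQL Server authentication (required for Azure SQL Database)
    @subscriber_login = N'azureadmin',
    @subscriber_password = N'<password>',
    @job_login = N'DOMAIN\repl_agent',
    @job_password = N'<password>';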
More Details

You can see more details on setting up your Subscription here.

After Migration

After migration you have a few more things to do:

  • Verify successful migration
  • End Replication

End Replication

The easiest way to end replication is to simply delete the publication. When you delete the publication all subscriptions are automatically deleted.

Using SQL Server Management Studio (SSMS)
  1. Connect to the Publisher in SSMS and expand the server node.
  2. Expand the Replication folder, and then expand the Local Publications folder.
  3. Right-click your publication and click Delete.
  4. Click Yes.
Using Transact-SQL
  1. Do the following at the Publisher on the publication database.
  2. Execute sp_droppublication at the Publisher on the publication database, specifying the publication name for the @publication parameter.
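A minimal Transact-SQL sketch, using the illustrative names from the earlier snippets:

-- Run at the Publisher, in the publication database (SourceDB)
EXEC sp_droppublication @publication = N'AzureMigrationPub';

-- Optionally disable publishing on the database once no publications remain
EXEC sp_replicationdboption
    @dbname = N'SourceDB', @optname = N'publish', @value = N'false';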

Verify Successful Migration

After migration is complete, verification is a vital step. You should ensure that your data was correctly and completely migrated before you start using your new database. See the verification section of this blog for some quick sample queries to help you verify success. To be more thorough you can use data compare in SSDT, but this will be time consuming.
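If you just want a quick sanity check before the deeper comparison, one simple approach is to compare per-table row counts on both sides. A sketch of such a query, to be run on the source database and on the migrated Azure SQL Database:

SELECT t.name AS table_name, SUM(p.rows) AS row_count
FROM sys.tables AS t
JOIN sys.partitions AS p
    ON p.object_id = t.object_id AND p.index_id IN (0, 1)   -- heap or clustered index
GROUP BY t.name
ORDER BY t.name;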

Limitations

There are some limitations to when transactional replication can be used for migration. For complete documentation go here. The following configurations are supported:

  • Only push subscriptions are supported.
  • The distribution and replication agents cannot be on Azure SQL Database.
  • Only one-way transactional replication is supported. Peer-to-peer, bi-directional and merge are not supported.
  • The publisher must be running SQL Server 2012 or later.

Conclusion

In this blog post, we covered migration from on premises SQL Server to Azure SQL Database using transactional replication. This is a common migration scenario, especially in cases where minimal downtime is required. This blog post will be useful to organizations preparing to migrate to Azure SQL Database.

SQL Server Availability Groups in Azure VM setup with AAD Domain Services


Reviewed by: Steven Schneider, Sanjay Mishra, Mike Weiner, Kun Cheng, Dimitri Furman, Arvind Shyamsundar, Mahesh Unnikrishnan, Luis Carlos Vargas Herring

Deploying SQL Server Availability Groups in Azure VMs has typically involved provisioning two additional VMs to host Active Directory Domain Controllers. With Azure Active Directory (AAD) Domain Services (also known as managed domain services on Azure), it is possible to deploy SQL Server Availability Groups (AG) in Azure without deploying VMs for AD. AAD Domain Services provides services such as domain join, group policy, LDAP, and Kerberos/NTLM authentication that are fully compatible with Windows Server Active Directory, and the best part is that it is completely managed by Azure. With AAD Domain Services there is no need for you to deploy, manage, and patch domain controllers in the cloud. Our earlier Azure AG deployment documents required you to provision two domain controller VMs (primary and secondary), but with AAD Domain Services you just ask Azure to do it for you. The overhead of managing domain controller VMs is gone.

The current prerequisite is that AAD Domain Services can be enabled only in a Classic Virtual Network. You might have started with a classic virtual network in the past, while most new virtual networks you create will be the recommended Azure Resource Manager (ARM) based networks. The good news is that VMs in an ARM Virtual Network can leverage AAD Domain Services enabled in a Classic Virtual Network once these two networks are connected through VNET Peering or a VNET-to-VNET VPN gateway. We went with VNET Peering for this post, and AG setup was seamless. For detailed network considerations with AAD Domain Services, check this link.

SQL Server Always On AG scenarios with AAD Domain Services

We covered following two scenarios for this post

Scenario 1:
Enabling AAD Domain Services in Classic Virtual Network and deploying two SQL Server 2016 Classic VM’s (Windows Server 2012) and then setting up AG

Scenario 2:
Leveraging AAD Domain Services enabled in Classic Virtual Network from an ARM virtual network by adding an ARM based SQL Server 2016 VM as replica to the existing AG.

Detailed steps and screenshots

Steps 1-4 cover Scenario 1
1. We enabled AAD Domain Services on a classic virtual network
You can start with the excellent documentation from here and follow along Task 1 to Task 5

As displayed below, VNETAADDSTEST is the name of my Classic Virtual Network

clip_image002

In the screenshot below, CONTOSO100.onmicrosoft.com is the DNS name of the managed domain. Note that the DNS server IPs in the above screenshot are the actual primary and secondary IPs of the managed domain.

clip_image004

2. Next, we deployed two SQL Server 2016 Classic VM’s from Azure gallery images in a cloud service:

clip_image006

3. Then we added the VMs to managed domain (CONTOSO100.onmicrosoft.com).
Here is a sample “Welcome to the domain” screenshot for SQL2016VM1:

clip_image008

4. Then we configured SQL Server Always On AG successfully, by following detailed steps documented here
Displayed below is the screenshot showing a healthy Always On dashboard

clip_image010

Steps 5-10 cover Scenario 2
5. Then, we created an ARM Virtual Network called ARMVnetAADDSTest.
Displayed below is a screenshot from Azure Portal showing create virtual network configuration pane.

clip_image011

6. We then peered it with the classic virtual network that we created earlier. Go to Azure Portal | Virtual Network | ARMVnetAADDSTest | under Settings, click Peerings | Click Add
This screenshot shows the Add Peering configuration pane.

clip_image012

7. We then updated the DNS settings (highlighted below) for the ARM Virtual Network to point to AAD Domain Services IP’s

clip_image013

8. We added a third SQL Server 2016 VM (“ARMSQL2016VM3”) in the ARM virtual network

clip_image014

9. Then we added the third ARM VM to managed domain
The screenshot shows “Welcome to the domain” message for the third VM “ARMSQL2016VM3”
clip_image016

10. Finally we added the third VM to Windows Server Failover Clustering (WSFC) and to the existing AG and it just worked (Given our careful planning and execution of the previous steps, it is no surprise that it just works!)
The screenshot shows a healthy AG dashboard after adding the third VM as replica.

clip_image018

Points worth mentioning

1. If you have existing AAD users and a Cloud-only or Synced tenant, please follow the recommendation here (Task 5)
2. We created a user (named SQLInstall), added it to the “AAD DC Administrators” group, and used this user’s credentials to add VMs into the managed domain. The “AAD DC Administrators” group has all the permissions needed to join a VM to the domain.
3. When we created the ARM virtual network, we made sure the subnet address space does not overlap with the existing one in the classic virtual network. This is something you must be careful about, as the VNET peering feature will not get enabled if there are overlapping IP address spaces.

We have plans in near future to enable AAD Domain Services in ARM Virtual Network. In summary, this post was to showcase two scenarios that would benefit from AAD Domain Services. So, go ahead and enjoy this integrated functionality and let us know if you have any questions or feedback.

SQLSweet16!, Episode 10: “I can eat glass …”, but can I load it into a database?


Sanjay Mishra

Reviewed By: Dimitri Furman, Murshed Zaman, Kun Cheng

 

If you have tried to use BULK INSERT or bcp utilities to load UTF-8 data into a table in SQL Server 2014 or in an earlier release (SQL Server 2008 or later), you have likely received the following error message:

Msg 2775, Level 16, State 13, Line 14
The code page 65001 is not supported by the server.

The requirement to support UTF-8 data for these utilities has been extensively discussed on various forums, most notably on Connect.

This requirement has been addressed in SQL Server 2016 (and backported to SQL Server 2014 SP2). To test this, I obtained a UTF-8 dataset from http://www.columbia.edu/~fdc/utf8/. The dataset is a translation of the sentence “I can eat glass and it doesn’t hurt me” into several languages. A few lines of sample data are shown here:

data1_utf8

(As an aside, it is entirely possible to load Unicode text such as above into SQL Server even without this improvement, as long as the source text file uses a Unicode encoding other than UTF-8.)

-- SQL Server 2014 SP1 or earlier

CREATE DATABASE DemoUTF8_2014
GO

USE DemoUTF8_2014
GO

CREATE TABLE Newdata
(
lang VARCHAR(200),
txt NVARCHAR(1000)
)
GO

BULK INSERT Newdata
FROM 'C:\UTF8_Test\i_can_eat_glass.txt'
WITH (DATAFILETYPE = 'char', FIELDTERMINATOR='\t', CODEPAGE='65001')
GO

Msg 2775, Level 16, State 13, Line 14
The code page 65001 is not supported by the server.

-- SQL Server 2016 RTM or SQL Server 2014 SP2 or later

CREATE DATABASE DemoUTF8_2016
GO

USE DemoUTF8_2016
GO

CREATE TABLE Newdata
(
lang VARCHAR(200),
txt NVARCHAR(1000)
)
GO

BULK INSERT Newdata
FROM 'C:\UTF8_Test\i_can_eat_glass.txt'
WITH (DATAFILETYPE = 'char', FIELDTERMINATOR='\t', CODEPAGE='65001')
GO

(150 row(s) affected)
SELECT * FROM Newdata
GO

data2_utf8

You can now use CODEPAGE=’65001′ with BULK INSERT, bcp and OPENROWSET utilities.
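For example, a sketch of the equivalent OPENROWSET (BULK …) query (the format file path is hypothetical; it would need to describe the two tab-delimited columns of the source file):

SELECT *
FROM OPENROWSET(
        BULK 'C:\UTF8_Test\i_can_eat_glass.txt',
        FORMATFILE = 'C:\UTF8_Test\i_can_eat_glass.fmt',
        CODEPAGE = '65001') AS src;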

Note that this improvement is only scoped to input processing by bulk load utilities. Internally, SQL Server still uses the UCS-2 encoding when storing Unicode strings.

Backing up a VLDB to Azure Blob Storage


Reviewed by: Pat Schaefer, Rajesh Setlem, Xiaochen Wu, Murshed Zaman

All SQL Server versions starting from SQL Server 2012 SP1 CU2 support Backup to URL, which allows storing SQL Server backups in Azure Blob Storage. In SQL Server 2016, several improvements to Backup to URL were made, including the ability to use block blobs in addition to page blobs, and the ability to create striped backups (if using block blobs). Prior to SQL Server 2016, the maximum backup size was limited to the maximum size of a single page blob, which is 1 TB.

With striped backups to URL in SQL Server 2016, the maximum backup size can be much larger. Each block blob can grow up to 195 GB; with 64 backup devices, which is the maximum that SQL Server supports, that allows backup sizes of 195 GB * 64 = 12.19 TB.

(As an aside, the latest version of the Blob Storage REST API allows block blob sizes up to 4.75 TB, as opposed to 195 GB in the previous version of the API. However, SQL Server does not use the latest API yet.)

In a recent customer engagement, we had to back up a 4.5 TB SQL Server 2016 database to Azure Blob Storage. Backup compression was enabled, and even with a modest compression ratio of 30%, 20 stripes that we used would have been more than enough to stay within the limit of 195 GB per blob.

Unexpectedly, our initial backup attempt failed. In the SQL Server error log, the following error was logged:

Write to backup block blob device https://storageaccount.blob.core.windows.net/backup/DB_part14.bak failed. Device has reached its limit of allowed blocks.

When we looked at the blob sizes in the backup storage container (any storage explorer tool can be used, e.g. Azure Storage Explorer), the blob referenced in the error message was slightly over 48 GB in size, which is about four times smaller than the maximum blob size of 195 GB that Backup to URL can create.

To understand what was going on, it was helpful to re-read the “About Block Blobs” section of the documentation. To quote the relevant part: “Each block can be a different size, up to a maximum of 100 MB (4 MB for requests using REST versions before 2016-05-31 [which is what SQL Server is using]), and a block blob can include up to 50,000 blocks.”

If we take the error message literally, and there is no reason why we shouldn’t, we must conclude that the referenced blob has used all 50,000 blocks. That would mean that the size of each block is 1 MB (~48 GB / 50000), not the maximum of 4 MB that SQL Server could have used with the version of REST API it currently supports.

How can we make SQL Server use larger block sizes, specifically 4 MB blocks? Fortunately, this is as simple as using the MAXTRANSFERSIZE parameter in the BACKUP DATABASE statement. For 4 MB blocks, we used the following statement:

BACKUP DATABASE … TO
URL = 'https://storageaccount.blob.core.windows.net/backup/DB_part01.bak',
…
URL = 'https://storageaccount.blob.core.windows.net/backup/DB_part20.bak',
WITH COMPRESSION, MAXTRANSFERSIZE = 4194304, BLOCKSIZE = 65536, CHECKSUM, FORMAT, STATS = 5;

This time, backup completed successfully, with each one of the 20 blobs/stripes being slightly less than 100 GB in size. We could have used fewer stripes to get closer to 195 GB blobs, but since the compression ratio was unknown in advance, we chose a safe number.

Another backup parameter that helps squeeze more data within the limited number of blob blocks is BLOCKSIZE. With the maximum BLOCKSIZE of 65536, the likelihood that some blob blocks will be less than 4 MB is reduced.

Conclusion: If using Backup to URL to create striped backups of large databases (over 48 GB per stripe), specify MAXTRANSFERSIZE = 4194304 and BLOCKSIZE = 65536 in the BACKUP statement.

SQLCAT at SQLBits 2017


The widely popular SQLBits conference will take place April 5-8 2017 in Telford, UK. Several SQLCAT team members will be presenting a number of breakout sessions. Our topics this year are SQL Server on Linux and Azure SQL Database.

SQLCAT sessions are unique. We bring in real customer stories, and present their deployments, architectures, challenges, and lessons learned. This year, we will have four sessions at SQLBits:

Azure SQL Database best practices in resiliency design, Friday April 7, 11:00.

HADR for SQL Server on Linux, Friday April 7, 15:30.

Migrating from SQL Server to Azure SQL Database, Friday April 8, 17:00.

Troubleshooting common DBA scenarios with SQL on Linux, Saturday April 8, 12:30.

In addition to the sessions, you can also find us at the Microsoft stand, and at the “Data Clinic” event. Data Clinic is new for SQLBits this year. If you have a technical question, a troubleshooting challenge, want to have an architecture discussion, or want to find best ways to upgrade, the Microsoft Data Clinic is the place you want to be at. The Data Clinic is the hub of technical experts from SQLCAT, Tiger team, Product Group, Customer Support Services (CSS) and others. Whether you want a facelift of your data solution or an open heart surgery, the experts at Data Clinic will have the right advice for you.

We hope that you can attend, and are looking forward to seeing you all in Telford in less than a week!

Try and try again: not always a good idea (at least not for SSMS!)


Contributions from, and reviewed by: Ken Van Hyning, David Shiflet, Charles Gagnon and Alan Ren (SSMS dev team), Dimitri Furman, Mike Weiner and Rajesh Setlem (SQLCAT)

Background

SQL Server Management Studio (SSMS) is the most popular client used to administer and work with SQL Server and Azure SQL DB. Internally, the SSMS code uses the SqlClient class (implemented in the .NET Framework) to connect to SQL Server. Recent versions of SSMS have been compiled with .NET Framework 4.6.1. What this means is that SSMS gets to leverage many of the newer capabilities in the .NET Framework (for example, the changes for Always Encrypted). It also implies that SSMS ‘gets for free’ some underlying changes in .NET. Many of these new, ‘for free’ behaviors in SqlClient were aimed at ‘cloud applications’ connecting to Azure SQL Database.

One such change SSMS got for free is the connection resiliency logic within the SqlConnection.Open() method. To improve the default experience for clients which connect to Azure SQL Database, the above method will (in the case of initial connection errors / timeouts) now retry 1 time after sleeping for 10 seconds. These numbers are configurable by properties called ConnectRetryCount (default value 1) and ConnectRetryInterval (default value 10 seconds.) The previous versions of the SqlConnection class would not automatically retry in cases of connection failure.

In general, because of these changes, transient errors (across slow networks or when working with Azure SQL Database) are less frequent. However, when you consider that a lot of SSMS users still use it with ‘regular’ SQL Server – either in a VM in the cloud or on-premises, there is a subtle but distinct impact of these changes which may affect administrators of ‘regular’ SQL Server databases.

Impact of these changes

Take for example, a case when the very first user database in your server is inaccessible (for example, when the database is either offline or in a recovering status when it is the non-readable secondary in an Availability Group.) Now, it so happens that when you expand the SSMS Databases section for a server, it enumerates all the databases and tries to read some information from the very first database in the list. When that first database is non-readable, that constitutes a ‘connection error’ for SqlClient. In turn, the connection resiliency logic kicks in and sleeps for 10 seconds and retries the connection. Obviously the second connection will also fail, and then SSMS returns to a ‘responsive state’. Unfortunately, in that 10 seconds the user perceives a hang in the SSMS application.

Reverting to original behavior

There is a simple workaround for this situation. It is to add the following parameter string into the ‘Additional Connection Parameters’ tab within the SSMS connection window. The good news is that you only need to do this once, as the property is saved for future sessions for that SQL Server (until of course it is removed by you later.)

ConnectRetryCount=0

Here’s a screenshot to help as well:

Adding the ConnectRetryCount parameter in the SSMS 'Additional Connection Parameters' dialog

Do note that ConnectRetryCount is a single term (no spaces in between!). Ideally, we would want to expose this through a setting in the Connection Properties screen in SSMS. For now, you can use the above method to revert to original behavior, or if you’d like to see this setting exposed as a GUI option in SSMS, do let us know by leaving your comments below! We are eager to hear from you!

Build a recommendation system with the support for graph data in SQL Server 2017 and Azure SQL DB


Authored by Arvind Shyamsundar and Shreya Verma

Reviewed by Dimitri Furman, Joe Sack, Sanjay Mishra, Denzil Ribeiro, Mike Weiner, Rajesh Setlem

Graphs are a very common way to represent networks and relationships between objects. Historically, it has not been easy to represent such data structures in relational databases like SQL Server and Azure SQL DB. To address this requirement, in November 2016 (through a private preview program for a set of early adopter customers) we introduced extensions to T-SQL which allow us to natively store and query graphs inside the database, on Azure SQL DB.

We recently made these features publicly available as part of the SQL Server 2017 CTP 2.0 release (note that the feature is still in private preview for Azure SQL DB at this time). Please review this related blog post for an overview of the feature. In our blog post we look at a typical use case for graph data in SQL Server and Azure SQL DB.

Scenario

A common scenario we’ve seen with our early adopter customers is their interest to use graph technology to implement ‘recommendation systems’. For this walkthrough, imagine we have to implement a recommendation system for songs. Specifically, let’s imagine a scenario where there’s a user who likes Lady Gaga’s song ‘Just Dance‘. Now, our objective is to implement a recommendation system which will suggest songs which are similar to ‘Just Dance’. So, how do we get started? First, we need data!

What data can we use?

Many approaches to implementing recommendation systems involve using two distinct sets of data: one which contains users, and the other which contains details of the entities that those users are related to.

  • In retail scenarios, these would be the products purchased by the user.
  • In our current scenario, these are the songs which those users listened to.

It so happens that there is an amazing source of such data for songs and ‘user tastes’ (which songs did each user listen to) available online. This dataset is called the Million Song Dataset (MSD), and while it has a lot of other information, the specific subset of data that is of immediate interest to us is summarized below:

  • The list of all the songs is contained in a delimited file available  here. There are a million songs in this dataset.
  • On the MSD website there is a link to another dataset called the  ‘User Taste Profile’ data which contains (anonymized) user listening profiles and that is available here. There are a million unique users, and a total of 48 million ‘relationships’ (each corresponding to a row in this file) in this dataset.

What algorithm?

Now that we know what data is available to us, let’s think about the algorithm to be used. A standard approach called collaborative filtering can be used in conjunction with our graph data. Presented below is a simplified graphical representation of the algorithm that we will use.

AnimatedApproach

As you can see from the animation, the algorithm is quite simple:

  • First, we identify the user and ‘current’ song to start with (red line)
  • Next, we identify the other users who have also listened to this song (green line)
  • Then we find the other songs which those other users have also listened to (blue, dotted line)
  • Finally, we direct the current user to the top songs from those other songs, prioritized by the number of times they were listened to (this is represented by the thick violet line.)

The algorithm above is quite simple, but as you will see it is quite effective in meeting our requirement. Now, let’s see how to actually implement this in SQL Server 2017.

Implementation

To begin, we recommend that you quickly review this feature overview video as well as the official documentation links for more details on the new functionality:

Once you have the background, it’s easy to understand how to represent the scenario as ‘graph tables’ in SQL Server 2017. We will create two ‘node’ tables – one for the users and one for the songs. We will then ‘connect’ these two node tables with an ‘edge’ table. Here’s a quick visual summary of what we will be doing:

ImplementationAnimated

Importing the data

Now, let’s get to the nuts and bolts! The first step is to declare tables into which we will insert the source data. These ‘staging’ tables are ‘regular’ tables and have no ‘graph’ attributes. Here are the scripts for this:
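A minimal sketch of what such staging tables might look like (the table names, column names and sizes are simplified assumptions, not the exact layout used in the original walkthrough):

CREATE TABLE dbo.Songs_Staging
(
    song_id     VARCHAR(50)   NOT NULL,
    artist_name NVARCHAR(400) NULL,
    title       NVARCHAR(400) NULL
);

CREATE TABLE dbo.Listens_Staging
(
    user_id    VARCHAR(50) NOT NULL,
    song_id    VARCHAR(50) NOT NULL,
    play_count INT         NULL
);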

The next step is to use the OPENROWSET BULK functionality to rapidly ingest the text files into their staging tables in SQL Server. Here are the scripts for this:
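A sketch of such a load for the user taste data, assuming the staging tables above and hypothetical file and format-file paths:

INSERT INTO dbo.Listens_Staging (user_id, song_id, play_count)
SELECT user_id, song_id, play_count
FROM OPENROWSET(
        BULK 'C:\MSD\train_triplets.txt',
        FORMATFILE = 'C:\MSD\train_triplets.fmt') AS src;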

Constructing the graph

Once we have the raw data in staging tables, we can then ‘convert’ them into their Graph equivalents. Here are the table definitions; note the usage of AS NODE and AS EDGE to define the tables involved in the graph:
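A sketch of the node and edge tables, using naming consistent with the rest of this post (the Likes edge connects a user to a song they listened to; column layouts are assumptions):

CREATE TABLE dbo.Users
(
    user_id VARCHAR(50) NOT NULL
) AS NODE;

CREATE TABLE dbo.Songs
(
    song_id VARCHAR(50)   NOT NULL,
    artist  NVARCHAR(400) NULL,
    title   NVARCHAR(400) NULL
) AS NODE;

CREATE TABLE dbo.Likes
(
    play_count INT NULL
) AS EDGE;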

To actually ‘convert’ the data, we use INSERT…SELECT statements as shown below:
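A sketch of that conversion, assuming the staging and graph tables above (note the use of the $node_id, $from_id and $to_id pseudo-columns when populating the edge table):

INSERT INTO dbo.Users (user_id)
SELECT DISTINCT user_id FROM dbo.Listens_Staging;

INSERT INTO dbo.Songs (song_id, artist, title)
SELECT song_id, artist_name, title FROM dbo.Songs_Staging;

INSERT INTO dbo.Likes ($from_id, $to_id, play_count)
SELECT u.$node_id, s.$node_id, l.play_count
FROM dbo.Listens_Staging AS l
JOIN dbo.Users AS u ON u.user_id = l.user_id
JOIN dbo.Songs AS s ON s.song_id = l.song_id;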

Querying the graph

Now that we have all the data in the ‘graph’ form, we can proceed to use the new MATCH function to express our query over the set of nodes and edges. The query below finds songs that are similar to Lady Gaga’s song called ‘Just Dance’!
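A sketch of such a query, assuming the tables defined above (the exact predicates and TOP clause in the original may differ):

SELECT TOP (10) s2.title, SUM(l3.play_count) AS recommendation_weight
FROM dbo.Users AS u1, dbo.Likes AS l1, dbo.Songs AS s1,
     dbo.Users AS u2, dbo.Likes AS l2,
     dbo.Likes AS l3, dbo.Songs AS s2
WHERE MATCH(u1-(l1)->s1<-(l2)-u2-(l3)->s2)
  AND s1.title = 'Just Dance'          -- the 'current' song
  AND s2.song_id <> s1.song_id         -- don't recommend the same song back
GROUP BY s2.title
ORDER BY recommendation_weight DESC;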

Optimizing performance

The above query performs relatively quickly (in around 3 seconds on a laptop with an i7 processor). Consider that this query has to deal with a million users, a million songs and 48 million relationships between those entities. Most of the cost is taken by the time to scan through the tables, one row at a time and then match them using hash joins, as you can visualize by looking at the execution plan:

image

While 3 seconds is not bad, can we make this even faster? The good news is that in SQL Server 2017 CTP 2.0, graph tables support clustered columnstore indexes. While the compression offered is definitely going to help reduce I/O, the bigger benefit is that queries on these tables leverage the ‘batch-mode’ execution which allows much faster execution of queries. This is really useful for us given that the above query is doing large aggregations (GROUP BY). Let’s proceed to create these clustered columnstore indexes:
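A sketch of the index creation on the three graph tables (index names are illustrative):

CREATE CLUSTERED COLUMNSTORE INDEX CCI_Likes ON dbo.Likes;
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Songs ON dbo.Songs;
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Users ON dbo.Users;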

Once we create these indexes, the performance actually improves substantially and reduces the query execution time to half a second, which is 6x faster than before. That’s really impressive considering the sheer amount of data that the query needs to look at to arrive at the result!
Let’s take a minute to look at the new execution plan. Observe the ‘Batch mode’ execution highlighted below:

image

The other interesting thing to note is the new adaptive join type highlighted above. This is great to see – queries on graph data benefit from these new query processing improvements inside SQL Server 2017!

Let’s summarize the ‘before’ and ‘after’ states:

                        Query execution time (seconds)   Logical Reads (for the Likes table)   Space occupied by the Likes table on disk
Heap tables             3.6                               588388                                3.4 GB
Clustered columnstore   0.6                               174852                                1.7 GB

In summary, having graph data inside SQL Server allows database administrators and developers to leverage the familiar, mature and robust query processing capabilities within SQL Server. This is crucial to reducing the learning curve and likely complexity associated with using other technologies to store and query graph data.

Visualizing graphs

While we can use external applications and tools like PowerBI to visualize graphs, the ‘icing on the cake’ is the fact that we can use R Services in SQL Server to visualize graph data. With an open-source R package called ‘igraph’ we can visualize graphs relatively easily and render them to standard image formats like PNG. Here is a code snippet showing you how that can be done:
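A sketch of one way to do this, invoking R from T-SQL (this assumes SQL Server R Services is installed and that the igraph package is available to the R runtime; the output path, sample size and column choices are illustrative):

EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        library(igraph);
        # first two columns of the input data frame become the edge endpoints
        g <- graph_from_data_frame(InputDataSet, directed = TRUE);
        png("C:/temp/song_graph.png", width = 1200, height = 1200);
        plot(g, vertex.size = 3, vertex.label = NA, edge.arrow.size = 0.2);
        dev.off();',
    @input_data_1 = N'
        SELECT TOP (500) u.user_id, s.title
        FROM dbo.Users AS u, dbo.Likes AS l, dbo.Songs AS s
        WHERE MATCH(u-(l)->s);';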

Here’s a section of the visualization (refer to the comments in the above script to understand what the visualization represents). While it is quite basic, as you can see it is functionally very useful:

image

Conclusion

The support for graph data in SQL Server 2017 is an exciting new development and opens up doors to a new category of workloads which can leverage this functionality. It is one more step in bringing algorithms and intelligence closer to where the data resides.

Recommendation systems (such as the simple example presented here), fraud detection systems, content and asset management and many other scenarios can also benefit from the integration that graph data in SQL Server 2017 offers. The support for graph data in the database will be also be publicly available for Azure SQL DB in due course of time.

The complete code for this walkthrough is available here. Please use the Comments section below to ask questions and provide your feedback. We are eager to hear from you!

Citations

Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.

The Echo Nest Taste profile subset, the official user data collection for the Million Song Dataset, available at: http://labrosa.ee.columbia.edu/millionsong/tasteprofile

Azure SQL Data Warehouse loading patterns and strategies


Authors: John Hoang, Joe Sack and Martin Lee

Abstract

This article provides an overview of the Microsoft Azure SQL Data Warehouse architecture. This platform-as-a-service (PaaS) offering provides independent compute and storage scaling on demand. This document provides data loading guidelines for SQL Data Warehouse. Several common loading options are described, such as SSIS, BCP, Azure Data Factory (ADF), and SQLBulkCopy, but the main focus is the PolyBase technology, the preferred and fastest loading method for ingesting data into SQL Data Warehouse. See also What is Azure SQL Data Warehouse?

Introduction

Whether you are building a data mart or a data warehouse, the three fundamentals you must implement are an extraction process, a transformation process, and a loading process—also known as extract, transform, and load (ETL). When working with smaller workloads, the general rule from the perspective of performance and scalability is to perform transformations before loading the data. In the era of big data, however, as data sizes and volumes continue to increase, processes may encounter bottlenecks from difficult-to-scale integration and transformation layers.

As workloads grow, the design paradigm is shifting. Transformations are moving to the compute resource, and workloads are distributed across multiple compute resources. In the distributed world, we call this massively parallel processing (MPP), and the order of these processes differs. You may hear it described as ELT—you extract, load, and then transform as opposed to the traditional ETL order. The reason for this change is today’s highly scalable parallel computing powers, which put multiple compute resources at your disposal such as CPU (cores), RAM, networking, and storage, and you can distribute a workload across them.

With SQL Data Warehouse, you can scale out your compute resources as you need them on demand to maximize power and performance of your heavier workload processes.

However, we still need to load the data before we can transform. In this article, we’ll explore several loading techniques that help you reach maximum data-loading throughput and identify the scenarios that best suit each of these techniques.

Architecture

SQL Data Warehouse uses the same logical component architecture for the MPP system as the Microsoft Analytics Platform System (APS). APS is the on-premises MPP appliance previously known as the Parallel Data Warehouse (PDW).

As you can see in the diagram below, SQL Data Warehouse has two types of components, a Control node and a Compute node:

Figure 1. Control node and Compute nodes in the SQL Data Warehouse logical architecture

image

The Control node is the brain and orchestrator of the MPP engine. We connect to this area when using SQL Data Warehouse to manage and query data. When you send a SQL query to SQL Data Warehouse, the Control node processes that query and converts the code to what we call a DSQL plan, or Distributed SQL plan, based on the cost-based optimization engine. After the DSQL plan has been generated, for each subsequent step, the Control node sends the command to run in each of the compute resources.

The Compute nodes are the worker nodes. They run the commands given to them from the Control node. Compute usage is measured using SQL Data Warehouse Units (DWUs). A DWU, similar to the Azure SQL Database DTU, represents the power of the database engine as a blended measure of CPU, memory, and read and write rates. The smallest compute resource (DWU 100) consists of the Control node and one Compute node. As you scale out your compute resources (by adding DWUs), you increase the number of Compute nodes.

Within the Control node and in each of the Compute resources, the Data Movement Service (DMS) component handles the movement of data between nodes—whether between the Compute nodes themselves or from Compute nodes to the Control node.

DMS also includes the PolyBase technology. An HDFS bridge is implemented within the DMS to communicate with the HDFS file system. PolyBase for SQL Data Warehouse currently supports Microsoft Azure Storage Blob and Microsoft Azure Data Lake Store.

Network and data locality

The first considerations for loading data are source-data locality and network bandwidth, utilization, and predictability of the path to the SQL Data Warehouse destination. Depending on where the data originates, network bandwidth will play a major part in your loading performance. For source data residing on your premises, network throughput performance and predictability can be enhanced with a service such as Azure Express Route. Otherwise, you must consider the current average bandwidth, utilization, predictability, and maximum capabilities of your current public Internet-facing, source-to-destination route.

Note Express Route routes your data through a dedicated connection to Azure without passing through the public Internet. ExpressRoute connections offer more reliability, faster speeds, lower latencies, and higher security than typical Internet connections. For more information, see Express Route.

Using PolyBase for SQL Data Warehouse loads

SQL Data Warehouse supports many loading methods, including SSIS, BCP, the SQLBulkCopy API, and Azure Data Factory (ADF). These methods all share a common pattern for data ingestion. By comparison, the PolyBase technology uses a different approach that provides better performance.

PolyBase is by far the fastest and most scalable SQL Data Warehouse loading method to date, so we recommend it as your default loading mechanism. PolyBase is a scalable, query processing framework compatible with Transact-SQL that can be used to combine and bridge data across relational database management systems, Azure Blob Storage, Azure Data Lake Store and Hadoop database platform ecosystems (APS only).

Note As a general rule, we recommend making PolyBase your first choice for loading data into SQL Data Warehouse unless you can’t accommodate PolyBase-supported file formats. Currently PolyBase can load data from UTF-8 and UTF-16 encoded delimited text files as well as the popular Hadoop file formats RC File, ORC, and Parquet. PolyBase can load data from gzip, zlib and Snappy compressed files. PolyBase currently does not support extended ASCII, fixed-file format, and compression formats such as WinZip, JSON, and XML.

As the following architecture diagrams show, each HDFS bridge of the DMS service from every Compute node can connect to an external resource such as Azure Blob Storage, and then bidirectionally transfer data between SQL Data Warehouse and the external resource.

Note As of this writing, SQL Data Warehouse supports Azure Blob Storage and Azure Data Lake Store as the external data sources.

Figure 2. Data transfers between SQL Data Warehouse and an external resource

image[13]

PolyBase data loading is not limited by the Control node, and so as you scale out your DWU, your data transfer throughput also increases. By mapping the external files as external tables in SQL Data Warehouse, the data files can be accessed using standard Transact-SQL commands—that is, the external tables can be referenced as standard tables in your Transact-SQL queries.

Copying data into storage

The general load process begins with migrating your data into Azure Blob Storage. Depending on your network’s capabilities, reliability, and utilization, you can use AZCOPY to upload your source data files to Azure Storage Blobs with an upload rate from 80 MB/second to 120 MB/second.

Then, in SQL Data Warehouse, you configure your credentials that will be used to access Azure Blob Storage:

CREATE DATABASE SCOPED CREDENTIAL myid_credential WITH IDENTITY = 'myid', SECRET = 'mysecretkey';

 

Next you define the external Azure Blob Storage data source with the previously created credential:

CREATE EXTERNAL DATA SOURCE data_1tb WITH (TYPE = HADOOP, LOCATION = 'wasbs://data_1tb@myid.blob.core.windows.net', CREDENTIAL = myid_credential);

 

And for the source data, define the file format and external table definition:

CREATE EXTERNAL FILE FORMAT pipedelimited
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS(
          FIELD_TERMINATOR = '|',
          STRING_DELIMITER = '',
          DATE_FORMAT = '',
          USE_TYPE_DEFAULT = False)
);

CREATE EXTERNAL TABLE orders_ext (
    o_orderkey bigint NULL,
    o_custkey bigint NULL,
    o_orderstatus char(1),
    o_totalprice decimal(15, 2) NULL,
    o_orderdate date NULL,
    o_orderpriority char(15),
    o_clerk char(15),
    o_shippriority int NULL,
    o_comment varchar(79)
)
WITH (LOCATION = '/orders',
      DATA_SOURCE = data_1tb,
      FILE_FORMAT = pipedelimited,
      REJECT_TYPE = VALUE,
      REJECT_VALUE = 0
);

 

For more information about PolyBase, see SQL Data Warehouse documentation.

Using CTAS to load initial data

Then you can use a CTAS (CREATE TABLE AS SELECT) operation within SQL Data Warehouse to load the data from Azure Blob Storage to SQL Data Warehouse:

CREATE TABLE orders_load
WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = HASH(o_orderkey),
      PARTITION (o_orderdate RANGE RIGHT FOR VALUES ('1992-01-01','1993-01-01','1994-01-01','1995-01-01')))
AS SELECT * FROM orders_ext;

 

CTAS creates a new table. We recommend using CTAS for the initial data load. This is an all-or-nothing operation with minimal logging.

Using INSERT INTO to load incremental data

For an incremental load, use the INSERT INTO operation. This is a fully logged operation, but it has minimal effect on load performance. However, rolling back a large transaction can be expensive, so consider breaking your transaction into smaller batches (see the sketch after the note below).

INSERT INTO orders_load
SELECT * FROM orders_current_ext;

Note The source here uses a different external table, orders_current_ext. This is the external table defining the path for the incremental data in Azure Blob Storage.
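One way to break the incremental load into smaller batches, as suggested above, is to filter each INSERT on a range of a suitable batching key (a sketch assuming o_orderdate works as that key):

INSERT INTO orders_load
SELECT * FROM orders_current_ext
WHERE o_orderdate >= '1996-01-01' AND o_orderdate < '1996-02-01';

INSERT INTO orders_load
SELECT * FROM orders_current_ext
WHERE o_orderdate >= '1996-02-01' AND o_orderdate < '1996-03-01';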

 

Data reader and writer considerations

SQL Data Warehouse adjusts the number of external move readers and writers as you scale. As illustrated in Table 1 below, each DWU has a specific number of readers. As you scale out, each node gets an additional number of readers and writers. The number of readers is an important factor in determining your load performance.

Table 1. Number of readers and writers per DWU 100

 

DWU       100   200   300   400   500   600   1000   1200   1500   2000   3000   6000
Readers     8    16    24    32    40    48     80     96    120    160    240    480
Writers    60    60    60    60    60    60     60     60    120    120    240    480

Best practices and considerations when using PolyBase

Here are a few more things to consider when using PolyBase for SQL Data Warehouse loads:

  • A single PolyBase load operation provides best performance.
  • The load performance scales as you increase DWUs.
  • PolyBase automatically parallelizes the data load process, so you don’t need to explicitly break the input data into multiple sources and issue concurrent loads, unlike some traditional loading practices.
  • Multiple readers will not work against compressed text files (e.g. gzip). Only a single reader is used per compressed file since uncompressing the file in the buffer is single threaded. Alternatively, generate multiple compressed files.  The number of files should be greater than or equal to the total number of readers. 
  • Multiple readers will work against compressed columnar/block format files (e.g. ORC, RC) since individual blocks are compressed independently.

Known issues when working with different file formats

In addition to the UTF-8/UTF-16 encoding considerations, other known file format issues can arise when using PolyBase.

Mixed intra-file date formats

In a CREATE EXTERNAL FILE FORMAT command, the DATE_FORMAT argument specifies a single format to use for all date and time data in a delimited text file. If the DATE_FORMAT argument isn’t designated, the following default formats are used:

  • DateTime: ‘yyyy-MM-dd HH:mm:ss’
  • SmallDateTime: ‘yyyy-MM-dd HH:mm’
  • Date: ‘yyyy-MM-dd’
  • DateTime2: ‘yyyy-MM-dd HH:mm:ss’
  • DateTimeOffset: ‘yyyy-MM-dd HH:mm:ss’
  • Time: ‘HH:mm:ss’

For source formats that don’t reflect the defaults, you must explicitly specify a custom date format. However, if multiple non-default formats are used within one file, there is currently no method for specifying multiple custom date formats within the PolyBase command.

Fixed-length file format not supported

Fixed-length character file formats—for example, where each column has a fixed width of 10 characters—are not supported today.

If you encounter these PolyBase restrictions, consider changing the data extract process to address them, for example by formatting dates in a PolyBase-supported format or by transforming JSON files to text files. If that is not possible, use one of the methods described in the next section.

Using Control-node and single-client gated load methods

In the Architecture section we mentioned that all incoming connections go through the Control node. Although you can increase and decrease the number of compute resources, there is only a single Control node. And as mentioned earlier, one reason why PolyBase provides a superior load rate is that PolyBase data transfer is not limited by the Control node. But if using PolyBase is not currently an option, the following technologies and methods can be used for loading into SQL Data Warehouse:

  • BCP
  • Bulk Insert
  • SSIS
  • SQLBulkCopy
  • Azure Data Factory (ADF)

Note By default, ADF uses the same engine as SQLBulkCopy. However, there is an option to use PolyBase so you can leverage the performance improvement.  See Copy activity and performance tuning guide for performance reference and detailed information.

For these load methods, the bottleneck is on the client machine and the single Control node. Each load uses a single core on the client machine and only accesses the single Control node. Therefore, the load does not scale if you increase DWUs for an SQL Data Warehouse instance.

Note You can, however, increase load throughput if you add parallel loads into either the same table or different tables.

When connecting via a Control-node load method such as SSIS, the single point of entry constrains the maximum throughput you can achieve with a single connection.

Figure 3. Using SSIS, a Control-node load method, for SQL Data Warehouse loading

image

To further maximize throughput, you can run multiple loads in parallel as the following diagram shows:

Figure 4. Using SSIS (parallel loading) for SQL Data Warehouse loading

image

Using multiple client concurrent executions should improve your load throughput – to a point. The number of parallel loads no longer improves your throughput when the maximum capacity of the Control node is reached.

Best practices and considerations for single-client gated load methods

Consider the following when using SSIS, BCP, or other Control-node and client-gated loading methods:

  • Include retry logic—very important for slower methods such as BCP, SSIS, and SQLBulkCopy.
  • For SSIS, consider increasing the client/connection timeout from the default 30 seconds to 300 seconds. For more information about moving data to Azure, see SSIS for Azure and Hybrid Data Movement.
  • Don’t specify the batch size with Control-node gated methods. The goal is to load all or nothing so that the retry logic will restart the load. If you designate a batch size and the load encounters failure (for example, network or database not available), you may need to add more logic to restart from the last successful commit.

Comparing load method performance characteristics

The following table details the results of four separate Azure SQL Data Warehouse load tests using PolyBase, BCP, SQLBulkCopy/ADF, and SSIS:

Table 2. SQL Data Warehouse performance testing results

 

                                           PolyBase    BCP    SQLBulkCopy/ADF    SSIS
Load rate                                  Fastest ==========================>  Slowest
Rate increase as you increase DWU          Yes         No     No                 No
Rate increase as you add concurrent load   No          Yes    Yes                Yes

As you can see, the PolyBase method shows a significantly higher throughput rate compared to BCP, SQLBulkCopy, and SSIS Control-node client gated load methods. If PolyBase is not an option, however, BCP provides the next best load rate.

Regarding loads that improved based on concurrent load (the third row in the chart), keep in mind that SQL Data Warehouse supports up to 32 concurrent queries (loads). For more information about concurrency, see Concurrency and workload management in SQL Data Warehouse.

Conclusion

SQL DW provides many options to load data as we discussed in this article. Each method has its own advantages and disadvantages. It’s easy to “lift and shift” your existing SSIS packages, BCP scripts and other Control-node client gated methods to mitigate migration effort. However, if you require higher speeds for data ingestion, consider rewriting your processes to take advantage of PolyBase with its high throughput, highly scalable loading methodology.

Performance impact of memory grants on data loads into Columnstore tables


Reviewed by: Dimitri Furman, Sanjay Mishra, Mike Weiner, Arvind Shyamsundar, Kun Cheng, Suresh Kandoth, John Hoang

Background

Some of the best practices when bulk inserting into a clustered Columnstore table are:

  • Specifying a batch size close to 1048576 rows, or at least greater than 102400 rows, so that they land into compressed row groups directly.
  • Using concurrent bulk loads if you want to reduce the time to load.

For additional details, see the blog post titled Data Loading performance considerations with Clustered Columnstore indexes, specifically the Concurrent Loading section.
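
As an illustration only (the table and file path are hypothetical, and the original scenario used a client bulk load API rather than T-SQL), a bulk load that specifies a large enough batch size might look like this; client APIs such as SqlBulkCopy expose an equivalent batch size setting:

-- BATCHSIZE >= 102400 lets each committed batch land directly in a compressed row group.
BULK INSERT dbo.FactSales
FROM '/data/factsales_extract.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    BATCHSIZE = 1048576
);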

Customer Scenario

I was working with a talented set of folks at dv01 on their adoption of SQL Server 2017 CTP on Linux, which included a Columnstore implementation. For their scenario, the speed of the load process was of critical importance.

  • Data was being loaded concurrently from 4 jobs, each one loading a separate table.
  • Each job spawned 15 threads, so in total there were 60 threads concurrently bulk loading data into the database.
  • Each thread specified the commit batch size to be 1048576.

Observations

When we tested with 1 or 2 jobs, resulting in 15 or 30 concurrent threads loading, performance was great. Using the concurrent approach, we had greatly reduced the load time. However, when we increased the number of jobs to 4 jobs running concurrently, or 60 concurrent threads loading, the overall load time more than doubled.

Digging into the problem

Just like in any performance troubleshooting case, we checked physical resources, but found no bottleneck in CPU, Disk IO, or memory at the server level.  CPU on the server was hovering around 30% for the 60 concurrent threads, and that was almost the same as with 30 concurrent threads. Mid-way into job execution, we also checked DMVs such as sys.dm_exec_requests and sys.dm_os_wait_stats, and saw that INSERT BULK statements were executing, but there was no predominant wait. Periodically, there was LATCH contention, which made little sense – given the ~1 million batch sizes, data from each bulk insert session should have landed directly in its own compressed row group.

Then we spot-checked the row group physical stats DMV, and observed that despite the batch size specified, the rows were landing in the delta store, and not in the compressed row groups directly, as we expected they would.

Below is an example of what we observed from sys.dm_db_column_store_row_group_physical_stats:

select row_group_id, delta_store_hobt_id,state_desc,total_rows,trim_reason_desc
from sys.dm_db_column_store_row_group_physical_stats
where object_id = object_id('MyTable')


As you may recall from the previously referenced blog, inserting into the delta store, instead of into compressed row groups directly, can significantly impact performance. This also explained the latch contention we saw since we were inserting from many threads into the same btree.  At first, we suspected that the code was setting the batch size incorrectly, but then we ran an XEvent session and observed the batch size of 1 million specified as expected, so that wasn’t a factor. I didn’t know of any factors that caused a bulk insert to revert to delta store when it was supposed to go to compressed row groups. Hence, we collected a full set of diagnostics for a run using PSSDIAG, and did some post analysis.

Getting closer…

We found that only at the beginning of the run, and for a short period of time, there was contention on memory grants (RESOURCE_SEMAPHORE waits). After that and later into the process, we could see some latch contention on regular data pages, which we didn’t expect, as each thread was supposed to insert into its own row group. You would also see this same data by querying sys.dm_exec_requests live, if you caught it within the first minute of execution, as displayed below.

 Figure 1: Snapshot of sys.dm_exec_requests

Looking at the memory grant DMV sys.dm_exec_query_memory_grants, we observed that at the beginning of the data load, there was memory grant contention. Also, interestingly, each session had a grant of ~5GB (granted_memory_kb), but was using only ~1GB (used_memory_kb). When loading data from a file, the optimizer doesn’t know the number of rows in the file, so the memory grant is estimated based on the schema of the table, taking into account the maximum length of the variable-length columns defined. In this specific case, the server was commodity hardware with 240 GB of memory. Memory grants of 5 GB per thread across 60 threads exceeded the total memory on the box. If this were a larger machine, this situation would not arise. You can also observe multiple sessions that have requested memory, but have not yet been granted memory (second and third rows in the snapshot in Figure 2). See additional details on memory grants here.

Figure 2: Snapshot of sys.dm_exec_query_memory_grants
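
To check for the same grant-versus-used gap on your own system, a query along these lines against the DMV works (a simple sketch, ordered by requested grant size):

SELECT session_id,
       requested_memory_kb / 1024 AS requested_memory_mb,
       granted_memory_kb / 1024   AS granted_memory_mb,
       used_memory_kb / 1024      AS used_memory_mb,
       wait_time_ms
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;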

Root cause discovered!

We still didn’t know the reason for reverting into delta store, but armed with the knowledge that there was some kind of memory grant contention, we created an extended event session on the query_memory_grant_wait_begin and query_memory_grant_wait_end events, to see if there were some memory grant timeouts that caused this behavior. This XE session did strike gold; we were able to see several memory grants time out after 25 seconds and could correlate these session_ids to the same session_ids that were doing the INSERT BULK commands.
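
Below is a sketch of the extended event session we used; the session name and file target are arbitrary, and the two events are the ones named above:

CREATE EVENT SESSION [MemoryGrantWaits] ON SERVER
ADD EVENT sqlserver.query_memory_grant_wait_begin
    (ACTION (sqlserver.session_id, sqlserver.sql_text)),
ADD EVENT sqlserver.query_memory_grant_wait_end
    (ACTION (sqlserver.session_id, sqlserver.sql_text))
ADD TARGET package0.event_file (SET filename = N'MemoryGrantWaits.xel')
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
GO
ALTER EVENT SESSION [MemoryGrantWaits] ON SERVER STATE = START;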

Figure 3: Output of Extended event collection. Duration of wait is the difference between the query_memory_grant_wait_end and query_memory_grant_wait_begin time for that specific session.

Collecting a stack on the query_memory_grant_wait_begin extended event, and with some source code analysis, we found the root cause for this behavior. For every bulk insert, we first determine whether it can go into a compressed row group directly, based on batch size. If it can, we request a memory grant with a timeout of 25 seconds. If the memory grant cannot be acquired within 25 seconds, that bulk insert reverts to the delta store instead of a compressed row group.

Working around the issue

Given our prior dm_exec_query_memory_grants diagnostic data, you could also observe from Figure 2 that we asked for a 5GB grant, but used only 1GB. There was room to reduce the grant size, to avoid memory grant contention, and still maintain performance. Therefore, we created and used a Resource Governor workload group that reduced the grant percent parameter to allow greater concurrency during the data load. We then tied this workload group, via a classifier function, to just the login that the data load jobs were executed under. We first lowered the grant percentage from the default to 10%, but even at that level we couldn’t sustain 60 sessions concurrently bulk loading, due to RESOURCE_SEMAPHORE waits, as each memory grant requested was still 5 GB. We iterated on the grant percentage a couple of times, lowering it until we landed at 2% for this specific data load. Setting it to 2% means that we are preventing a query from getting a memory grant greater than 2% of the target_memory_kb value in the DMV sys.dm_exec_query_resource_semaphores. Binding the specific login that was only used for data load jobs to the workload group prevented this configuration from affecting the rest of the workload. Only load queries ended up in the workload group with the 2% limit on memory grants, while the rest of the workload used the default workload group configuration. At 2%, the memory grant requested for each thread was around 1GB, which allowed the level of concurrency we were looking for.

-- Create a Workload group for Data Loading
CREATE WORKLOAD GROUP DataLoading
WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 2)
GO

-- If the Login is DataLoad it will go to workload group DataLoading
DROP FUNCTION IF EXISTS dbo.CLASSIFIER_LOGIN
GO
CREATE FUNCTION dbo.CLASSIFIER_LOGIN ()
RETURNS SYSNAME WITH SCHEMABINDING
BEGIN
DECLARE @val sysname = 'default';
IF 'DataLoad' = SUSER_SNAME()
SET @val = 'DataLoading';
RETURN @val;
END
GO
-- Make function known to the Resource Governor as its classifier
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.CLASSIFIER_LOGIN)
GO
-- Apply the workload group and classifier changes
ALTER RESOURCE GOVERNOR RECONFIGURE
GO
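
As a quick sanity check (not part of the original script), 2% of target_memory_kb in sys.dm_exec_query_resource_semaphores approximates the largest grant a query in this workload group can now receive:

SELECT resource_semaphore_id,
       target_memory_kb / 1024          AS target_memory_mb,
       (target_memory_kb * 0.02) / 1024 AS approx_max_grant_mb
FROM sys.dm_exec_query_resource_semaphores;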

Note: With memory grants, you can often use the query-level hints MAX_GRANT_PERCENT and MIN_GRANT_PERCENT instead. In this case, because the data load was driven by an ETL workflow (for example, an SSIS package), there was no user-written query to which we could add the hint.

Final Result

Once we did that, our 4 jobs could execute in parallel (60 threads loading data simultaneously) in roughly the same timeframe as our prior 2 jobs, reducing total data load time significantly. Running 4 jobs in parallel in almost the same interval of time allowed us to load twice the amount of data, increasing our data load throughput.

Concurrent Load Jobs   Tables Loaded   Threads loading data   RG Configuration                        Data Load Elapsed Time (sec)
2                      2               30                     Default                                 2040
4                      4               60                     Default                                 4160
4                      4               60                     REQUEST_MAX_MEMORY_GRANT_PERCENT = 2    1040

We could drive CPU to almost 100% now, compared to 30% before the Resource Governor changes.

Figure 4: CPU Utilization Chart


Conclusion

Concurrently loading data into clustered Columnstore indexes requires some considerations, including memory grants. Use the techniques outlined in this article to identify whether you are running into similar bottlenecks related to memory grants, and if so, use the Resource Governor to adjust the granted memory to allow for higher concurrency. We hope you enjoyed reading this as much as we enjoyed bringing it to you! Feedback in the Comments is welcome.


How the SQLCAT Customer Lab is Monitoring SQL on Linux


Reviewed By: Denzil Ribeiro, Dimitri Furman, Mike Weiner, Rajesh Setlem, Murshed Zaman

Background

SQLCAT often works with early adopter customers, bringing them into our lab and running their workloads. With SQL Server now available on Linux, we needed a way to visualize performance, and PerfMon, being a Windows-only tool, was no longer an option. After a lot of research on ways to monitor performance on Linux, we didn’t find a de facto standard. However, we did learn that in the open source community there are many ways of accomplishing a goal and that there is no one “right way”; rather, choose the way that works best for you.

The following solutions were tested:

  • Graphing with Grafana and Graphite
  • Collection with collectd and Telegraf
  • Storage with Graphite/Whisper and InfluxDB

We landed on a solution which uses InfluxDB, collectd, and Grafana. InfluxDB gave us the performance and flexibility we needed, collectd is a lightweight tool to collect system performance information, and Grafana is a rich and interactive tool for visualizing the data.
In the sections below, we will provide all the steps necessary to set up this same solution in your environment quickly and easily. Details include step-by-step setup and configuration instructions, along with a pointer to the complete GitHub project.

Solution Diagram

Here is the high-level architecture of how this solution works. Collectd continuously runs in a container on your SQL Server on Linux environment and pushes metrics to InfluxDB. The data is then visualized via the Grafana dashboard, which reads data from InfluxDB when Grafana requests it.

Setup

When we found a set of tools that let us easily visualize performance for troubleshooting purposes, we wanted to provide an easy, repeatable method for deployment using Docker. The directions below will walk you through setting this up using our Docker images. The complete mssql-monitoring GitHub project can be found here. Give it a try; we welcome feedback on your experience.

Prerequisites

  1. Access to docker.io and GitHub for pulling Docker images and accessing the GitHub repository.
  2. 1 – 2 Linux machines for running InfluxDB and Grafana, depending on how large your deployment is.
    • If using 2 machines, 1 machine will be used for hosting the InfluxDB container and the second machine will be used for hosting the Grafana container.
    • If using 1 machine, it will be used for hosting both the InfluxDB and Grafana containers.
    • InfluxDB opened ports: 25826 (default inbound data to InfluxDB), 8086 (default outbound queries from Grafana)
    • Grafana opened port: 3000 (default web port for inbound connections)
  3. A SQL Server on Linux machine or VM that you would like to monitor.

Setting up InfluxDB

    For sizing InfluxDB, you can refer to the InfluxDB documentation. Also, note that it is recommended to provision SSD volumes for the InfluxDB data and wal directories. In our experience this has not been necessary when monitoring just a few machines.

    1. Install Docker Engine (if not already installed)
      • For RHEL:
        yum install docker -y
      • For Ubuntu:
        wget -qO- https://get.docker.com/ | sudo sh
    2. Install Git for your distro (if not already installed)
      • For RHEL:
        yum install git -y
      • For Ubuntu:
        apt-get install git -y
    3. Clone the mssql-monitoring GitHub repository:
       git clone https://github.com/Microsoft/mssql-monitoring.git
    4. Browse to mssql-monitoring/influxdb:
       cd mssql-monitoring/influxdb
    5. Edit run.sh and change the variables to match your environment
    6. Execute run.sh. This will pull down the mssql-monitoring-InfluxDB image and create and run the container

    Setting up collectd on the Linux SQL Server you want to monitor

    Note: These commands have to be run on the SQL Server on Linux VM/box that you want to monitor

    1. Install Docker Engine (if not already installed)
      • For RHEL:
        yum install docker -y
      • For Ubuntu:
        wget -qO- https://get.docker.com/ | sudo sh
    2. Install Git for your distro (if not already installed)
      • For RHEL:
        yum install git -y
      • For Ubuntu:
        apt-get install git -y
    3. Clone the mssql-monitoring GitHub repository:
       git clone https://github.com/Microsoft/mssql-monitoring.git
    4. Browse to mssql-monitoring/collectd:
       cd mssql-monitoring/collectd
    5. Edit run.sh and change the variables to match your environment
    6. Execute run.sh. This will pull down the mssql-monitoring-collectd image, set it to start on reboot, and create and run the container

    Setting up Grafana

    If you are doing a small scale setup (monitoring a few machines), you should be fine running this on the same host as your InfluxDB container. We use the image created by Grafana Labs with an addition of a run.sh file that you can use to create and run the container.

    1. Install Docker Engine (if not already installed)
      • For RHEL:
        yum install docker -y
      • For Ubuntu:
        wget -qO- https://get.docker.com/ | sudo sh
    2. Install Git for your distro (if not already installed)
      • For RHEL:
        yum install git -y
      • For Ubuntu:
        apt-get install git -y
    3. Clone the mssql-monitoring GitHub repository:
       git clone https://github.com/Microsoft/mssql-monitoring.git
    4. Browse to mssql-monitoring/grafana:
       cd mssql-monitoring/grafana
    5. Edit run.sh and change the variables to match your environment
    6. Run run.sh. This will pull down the mssql-monitoring-grafana image and create and run the container

    Configuring the InfluxDB data source in Grafana

    In order for Grafana to pull data from InfluxDB, we will need to setup the data source in Grafana.

    1. Browse to your Grafana instance: http://[GRAFANA_IP_ADDRESS]:3000
    2. Log in with the default user admin and password admin
    3. Click “Add data source” and enter the following:
       • Name: influxdb
       • Type: InfluxDB
       • Url: http://[INFLUXDB_IP_ADDRESS]:8086
       • Database: collectd_db
    4. Click “Save & Test”

    Importing Grafana dashboards

    We have a set of dashboards that we use and have made available to the community. These dashboards are included in the GitHub repository: mssql-monitoring. Just download them and import them in Grafana. Once the dashboards are imported, you will see metrics that collectd, running on your SQL Server, is pushing to InfluxDB.

    How the data gets loaded

    In this solution, we leverage collectd and several plugins to get data from the system(s) we are monitoring. Specifically, on the SQL Server side, we leverage the collectd DBI plugin with the FreeTDS driver, and execute the following queries every 5 seconds against the sys.dm_os_performance_counters and sys.dm_os_wait_stats DMVs. You can view the complete collectd.conf file here. These specific counters and waits provided a good starting point for us, but you can experiment and change them as you see fit.

    sys.dm_os_performance_counters query

    For this query, we needed to replace spaces with underscores in counter and instance names to make them friendly for InfluxDB. We also do not need to reference the counter type field (cntr_type) since the logic to do the delta calculation is done in Grafana with the non-negative derivative function. To find out more about counter types and implementation, please see: Querying Performance Counters in SQL Server by Jason Strate and Collecting performance counter values from a SQL Azure database by Dimitri Furman

    SELECT Replace(Rtrim(counter_name), ' ', '_')  AS counter_name,
           Replace(Rtrim(instance_name), ' ', '_') AS instance_name,
           cntr_value
    FROM   sys.dm_os_performance_counters
    WHERE  ( counter_name IN ( 'SQL Compilations/sec',
    							'SQL Re-Compilations/sec',
    							'User Connections',
    							'Batch Requests/sec',
    							'Logouts/sec',
    							'Logins/sec',
    							'Processes blocked',
    							'Latch Waits/sec',
    							'Full Scans/sec',
    							'Index Searches/sec',
    							'Page Splits/sec',
    							'Page Lookups/sec',
    							'Page Reads/sec',
    							'Page Writes/sec',
    							'Readahead Pages/sec',
    							'Lazy Writes/sec',
    							'Checkpoint Pages/sec',
    							'Database Cache Memory (KB)',
    							'Log Pool Memory (KB)',
    							'Optimizer Memory (KB)',
    							'SQL Cache Memory (KB)',
    							'Connection Memory (KB)',
    							'Lock Memory (KB)',
    							'Memory broker clerk size',
    							'Page life expectancy' ) )
    OR ( instance_name IN ( '_Total',
    						'Column store object pool' )
    AND counter_name IN ( 'Transactions/sec',
    						'Write Transactions/sec',
    						'Log Flushes/sec',
    						'Log Flush Wait Time',
    						'Lock Timeouts/sec',
    						'Number of Deadlocks/sec',
    						'Lock Waits/sec',
    						'Latch Waits/sec',
    						'Memory broker clerk size',
    						'Log Bytes Flushed/sec',
    						'Bytes Sent to Replica/sec',
    						'Log Send Queue',
    						'Bytes Sent to Transport/sec',
    						'Sends to Replica/sec',
    						'Bytes Sent to Transport/sec',
    						'Sends to Transport/sec',
    						'Bytes Received from Replica/sec',
    						'Receives from Replica/sec',
    						'Flow Control Time (ms/sec)',
    						'Flow Control/sec',
    						'Resent Messages/sec',
    						'Redone Bytes/sec')
    OR ( object_name = 'SQLServer:Database Replica'
    AND counter_name IN ( 'Log Bytes Received/sec',
    						'Log Apply Pending Queue',
    						'Redone Bytes/sec',
    						'Recovery Queue',
    						'Log Apply Ready Queue')
    AND instance_name = '_Total' ) )
    OR ( object_name = 'SQLServer:Database Replica'
    AND counter_name IN ( 'Transaction Delay' ) )
    

    sys.dm_os_wait_stats query

    WITH waitcategorystats ( wait_category,
    						wait_type,
    						wait_time_ms,
    						waiting_tasks_count,
    						max_wait_time_ms)
        AS (SELECT CASE
    				WHEN wait_type LIKE 'LCK%' THEN 'LOCKS'
                    WHEN wait_type LIKE 'PAGEIO%' THEN 'PAGE I/O LATCH'
                    WHEN wait_type LIKE 'PAGELATCH%' THEN 'PAGE LATCH (non-I/O)'
                    WHEN wait_type LIKE 'LATCH%' THEN 'LATCH (non-buffer)'
                    ELSE wait_type
                    END AS wait_category,
                    wait_type,
                    wait_time_ms,
                    waiting_tasks_count,
                    max_wait_time_ms
    	FROM   sys.dm_os_wait_stats
    	WHERE  wait_type NOT IN ( 'LAZYWRITER_SLEEP',
    			'CLR_AUTO_EVENT',
    			'CLR_MANUAL_EVENT',
    			'REQUEST_FOR_DEADLOCK_SEARCH',
    			'BACKUPTHREAD',
    			'CHECKPOINT_QUEUE',
    			'EXECSYNC',
    			'FFT_RECOVERY',
    			'SNI_CRITICAL_SECTION',
    			'SOS_PHYS_PAGE_CACHE',
    			'CXROWSET_SYNC',
    			'DAC_INIT',
    			'DIRTY_PAGE_POLL',
    			'PWAIT_ALL_COMPONENTS_INITIALIZED',
    			'MSQL_XP',
    			'WAIT_FOR_RESULTS',
    			'DBMIRRORING_CMD',
    			'DBMIRROR_DBM_EVENT',
    			'DBMIRROR_EVENTS_QUEUE',
    			'DBMIRROR_WORKER_QUEUE',
    			'XE_TIMER_EVENT',
    			'XE_DISPATCHER_WAIT',
    			'WAITFOR_TASKSHUTDOWN',
    			'WAIT_FOR_RESULTS',
    			'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
    			'WAITFOR',
    			'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
    			'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
    			'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
    			'LOGMGR_QUEUE',
    			'FSAGENT' )
    	AND wait_type NOT LIKE 'PREEMPTIVE%'
    	AND wait_type NOT LIKE 'SQLTRACE%'
    	AND wait_type NOT LIKE 'SLEEP%'
    	AND wait_type NOT LIKE 'FT_%'
    	AND wait_type NOT LIKE 'XE%'
    	AND wait_type NOT LIKE 'BROKER%'
    	AND wait_type NOT LIKE 'DISPATCHER%'
    	AND wait_type NOT LIKE 'PWAIT%'
    	AND wait_type NOT LIKE 'SP_SERVER%')
    SELECT wait_category,
           Sum(wait_time_ms)        AS wait_time_ms,
           Sum(waiting_tasks_count) AS waiting_tasks_count,
           Max(max_wait_time_ms)    AS max_wait_time_ms
    FROM   waitcategorystats
    WHERE  wait_time_ms > 100
    GROUP  BY wait_category
    

    Dashboard Overview

    With the metrics that we collect from the collectd system plugins and the DBI plugin, we are able to chart the following metrics over time and in near real time, with up to 5-second data latency. The following is a snapshot of the metrics that we graph in Grafana (clicking the images will enlarge them).

    Core Server Metrics

    Core SQL Metrics

    Collecting performance data with PSSDIAG for SQL Server on Linux


    Reviewed by: Suresh Kandoth,Rajesh Setlem, Steven Schneider, Mike Weiner, Dimitri Furman

    When analyzing SQL Server performance related issues, customers often have their tools of choice, which can be a feature within the product, a third-party performance monitoring tool, or a home-grown tool that assists in monitoring live performance. For live monitoring in the SQLCAT lab, we use a home-grown tool described in this blog. However, when our customers have a performance issue, we, just like support engineers and consultants, can’t always have them ship their third-party tools or associated data, and hence need a way to collect performance-related data for post-mortem analysis.

    PSSDIAG is a popular tool used by Microsoft SQL Server support engineers to collect system data and troubleshoot performance issues. This is a well-known tool for SQL Server on Windows, and we needed equivalent functionality on Linux. PSSDIAG data collection for Linux is now available here. It is a set of bash scripts that collect all the necessary data for troubleshooting performance problems, similar to PSSDiag on Windows.

    As part of the default data collection, these scripts:

    • Collect configuration information about the machine.
    • Collect performance data from the operating system’s perspective using the sysstat package.
    • Collect DMV output of sys.dm_os_performance_counters and other DMVs required to troubleshoot various performance scenarios.
    • Optionally you can turn on other collectors such as Extended Events or custom script collectors.
    • You may be prompted at the start to install dependent packages if you don’t have them installed; these packages are also listed in the Readme.

    Steps to collect data through PSSDiag on Linux

    To collect data and analyze performance issue(s), follow the steps below:

    1. Create a folder and download/unzip the pssdiag release version.

    mkdir /pssdiag
    cd /pssdiag
    curl -L https://github.com/Microsoft/DiagManager/releases/download/LinuxRel170810/pssdiag.tar | tar x
    

    Note: The pssdiag folder and its parents must have r+x (Read and Execute) permissions for the mssql account if collecting extended events. By default, on RHEL, the /home/user directory does not have those permissions. Either grant the permissions or create the folder elsewhere. See the Readme for additional details.

    2. PSSDIAG has a configuration file which dictates what data is collected, namely pssdiag_collector.conf. If you need to change the defaults on what data is collected, you may have to modify the configuration options in this file. They are documented both in the Readme and the configuration file itself. A snippet of the configuration file is below:

    pssdiagpic1

    3. To start the data collection process, execute the command below. NOTE: Some of the data collection does require elevated privileges and therefore should be run as SUDO as shown below:

    sudo /bin/bash ./start_collector.sh
    

    pssdiagpic2

    4. This will create an output folder under the current folder to store all the data collected.

    pssdiagpic3

    5. After you have reproduced your problem, or captured data for the timeframe encompassing the problem, stop the collector by invoking the script below, which stops the collection and zips up all the files:

    /bin/bash ./stop_collector.sh
    

    You should get the confirmation message such as the one below, pointing to the location of the zipped output file.

    ***Data collected is in the file output_denzilrredhat_08_07_2017_04_04.tar.bz2 ***

    Additional details: https://github.com/Microsoft/DiagManager/blob/master/LinuxPSSDiag/Readme.txt

    Configuring a custom XE collection

    By default, collection of Extended Events is disabled. You can enable extended event collection by setting the COLLECT_EXTENDED_EVENTS option in the configuration file to YES. By default, that would create an extended event session capturing batch_completed and rpc_completed events only. If you want to change the extended event session configuration, you will have to modify the pssdiag_xevent.sql script, and put the extended event session definition there, leaving the extended event session name (PSSDIAG_Xevent) unchanged.
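
    As an illustration, a customized session definition in pssdiag_xevent.sql might look like the following; keep the session name as is, and treat the added deadlock event and the target options as examples only:

    CREATE EVENT SESSION [PSSDIAG_Xevent] ON SERVER
    ADD EVENT sqlserver.sql_batch_completed,
    ADD EVENT sqlserver.rpc_completed,
    ADD EVENT sqlserver.lock_deadlock   -- example of an additional event you might capture
    ADD TARGET package0.event_file (SET filename = N'PSSDIAG_Xevent.xel')
    WITH (EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY = 5 SECONDS);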

    Configuring a custom script

    In order to collect an additional TSQL script, name the script SQL_Custom_Script.sql and in the pssdiag_collector.conf set the option CUSTOM_COLLECTOR=YES.
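
    For example, a minimal SQL_Custom_Script.sql could capture any additional T-SQL output you want collected alongside the defaults (the query below is only an example):

    -- Example content for SQL_Custom_Script.sql: snapshot of currently executing requests.
    SELECT GETUTCDATE() AS collection_time_utc,
           session_id,
           status,
           command,
           wait_type,
           total_elapsed_time
    FROM sys.dm_exec_requests
    WHERE session_id > 50;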

    PSSDiag Data Analysis

    This collected data can be analyzed by using a tool called SQLNexus in the same way as with PSSDiag for Windows: https://github.com/Microsoft/SqlNexus/releases. In the near future, we will be adding a tool that converts the OS metrics (mpstat, iostat, pidstat, and network stats) into a perfmon BLG file, to visualize it in PerfMon for Windows folks who use/prefer PerfMon. As an alternative, you can use any of your favorite shell scripts to analyze the collected data.

    Known Issue: When analyzing data collected on a Linux machine with SQLNexus, one of the first things you may notice is the CPU in one of the charts obtained from sys.dm_os_ring_buffers shows CPU consistently at 100%. This is because the sys.dm_os_ring_buffers CPU numbers are not yet integrated into SQL Server on Linux. Other than that, all the other DMV data collected can be visualized.

    As you use this to collect performance data, let us know how we can improve it.

    Denzil Ribeiro & Suresh Kandoth

    Azure SQL Data Warehouse Workload Patterns and Anti-Patterns


    Reviewed by: John Hoang, Dimitri Furman, Mike Weiner, Sanjay Mishra

    We often get questions from customers about whether Azure SQL DW is a good fit for moving a particular database to the cloud. Some of the typical questions are:

    1. I have a database on-premises. I would like to move it into the cloud because that seems to be the future direction. Can I use Azure SQL DW for it?
    2. I am running out of storage space on my on-premises system. I would like to move my database to the cloud. Is Azure SQL DW an option?
    3. My database size is bigger than what Azure SQL DB supports today, scaling out seems like a good idea. Can I use Azure SQL DW?

    In this blog, we would like to clarify some of the concepts around RDBMS usage related to OLTP and OLAP workload, Symmetric Multiprocessing (SMP) and Massively Parallel Processing (MPP), workload patterns and anti-patterns, focusing on Azure SQL DW.

    What a Data Warehouse Is Not

    A data warehouse (OLAP) workload is very different from an online transaction processing (OLTP) workload, with a very different indexing strategy and data access pattern.

    Transaction processing systems, aka operational systems or OLTP systems, are characterized by real-time transactions that perform atomic writes and reads. Think about an order processing system where the backend is a database for a web ordering tool. A customer logs in, buys one or many items, and checks out – the interaction between the application and the database during a user session is a series of transactions. Another example of transaction processing is the workload generated by the ATM machines of a bank. An airline ticketing system is another example of an OLTP, or order processing, system.

    Usually used by hundreds if not thousands of users concurrently, OLTP systems are created for high volumes of inserts/updates/deletes and point-lookup reads. An outage in your ordering system may cause you to lose money, so high availability of your application and backend database is a must.

    What a Data Warehouse Is

    On the other hand, traditionally data warehouse workloads are write once and read many times. The writes usually happen in a batch fashion, instead of many small inserts (like in OLTP systems).  And the reads usually produce some sort of aggregated result, instead of producing individual records.

    Known by many names and acronyms, including data warehouse (DW or DWH), enterprise data warehouse (EDW), and Online Analytical Processing (OLAP) system, such systems usually integrate data from many transactional systems for reporting, analysis, and decision support purposes. Data is extracted, transformed, and loaded (this is known as an ETL process) from many disparate systems to create a data warehouse.

    Data warehouses sometimes have many stages before data can be easily analyzed. Three commonly used stages that I have observed at customer sites are described below:

    1. Staging database store – Here data is copied from transactional systems, usually for the last day/hour. This is a temporary store for transactional data.
    2. Operation data store (ODS) – transactional data from the staging store is validated and added to historical data unchanged or slightly modified from transactional systems.
    3. Star Schema conformed databases – ODS data is transformed into “facts” and “dimensions”. On an entity-relationship diagram (ERD), a fact table appears in the middle, and the dimension tables are on the periphery, making it look like a star.

    Other variations of this pattern of stages also exist, for example ODS only, or star schema only.

    Workload patterns on a data warehouse can be characterized as:

    1. Batch loading of data
    2. Transformed into fact and dimension tables
    3. Complex queries involving multiple table joins
    4. Aggregates over a certain dimension key (example: date or customer)

    Following are some examples of complex questions a data warehouse could answer:

    1. How many customers ordered a certain item within a month to see if inventory levels are sufficient?
    2. What day of week people withdraw the most amount of money from an ATM?
    3. What was the cost of a promotional pricing (marketing) vs. how many tickets were sold to a certain destination (how much money was made) for the airline?

    Azure SQL DW, MPP and SMP

    Azure SQL DW is a Massively Parallel Processing (MPP) data warehousing service. It is a service because Microsoft maintains the infrastructure and software patching, making sure it always runs on up-to-date hardware and software in Azure. The service makes it easy for a customer to start loading their tables on day one and start running queries quickly, and it allows scaling of compute nodes when needed.

    In an MPP database, table data is distributed among many servers (known as compute or slave nodes), and in many MPP systems shared-nothing storage subsystems are attached to those servers. Queries come through a head (or master) node where the location metadata for all the tables/data blocks resides. This head node knows how to deconstruct the query into smaller queries, introduce various data movement operations as needed, and pass the smaller queries on to the compute nodes for parallel execution. Data movement is needed to align the data by the join keys from the original query. The topic of data movement in an MPP system is a whole other topic by itself, which we will tackle in a different blog. Besides Azure SQL DW, some other examples of MPP data warehouses are Hadoop (Hive and Spark), Teradata, Amazon Redshift, Vertica, etc.

    The opposite of MPP is SMP (Symmetric Multiprocessing), which basically means a traditional single-server system. Until the invention of MPP, we had only SMP systems. In the database world, examples are traditional SQL Server, Oracle, MySQL, etc. These SMP databases can also be used for both OLTP and OLAP purposes.

    One quick way to remember the difference is that you can scale-up an SMP system by adding processors with more CPU cores or faster CPU cores, add more memory, and use a faster I/O subsystem. For an MPP system you can scale-out by adding more compute nodes (which have their own CPU, memory and I/O subsystems). There are physical limitations to scaling up a server at which point scaling out is more desirable depending on the workload.

    Azure SQL DW Anti-Patterns

    Before we discuss what workload is good for Azure SQL DW, let’s discuss what Azure SQL DW is not good for. Azure SQL DW is not a good match for the following scenarios:

    1. OLTP workload
    2. High volume of small reads and writes
    3. Multi-Tenancy Database
    4. Frequent changing of schema
    5. Row by row processing
    6. JSON, XML data and Spatial, Struct, Array and Map data types
    7. Power BI direct query requiring dashboard performance
    8. High concurrency of queries (e.g., hundreds of thousands of concurrent queries)
    9. Small datasets (less than 250GB)
    10. Disaster recovery with stringent RPO and RTO

    We have already covered the first three points at the beginning of our blog. Let’s cover some more not so obvious points listed above.

    Multi-Tenant Data Warehouse

    Azure SQL DW is not a good fit for solutions that share the same data warehouse across multiple customers. Though Azure SQL DW allows separate schemas, the development cost and complexity of a single database with separate schemas for multiple customers are quite high, and it can be a security nightmare if compromised. It is impossible to restore a single schema if something goes wrong for a particular customer. Also, SQL Server security features such as Always Encrypted, Row Level Security, and Column Level Encryption are not present in Azure SQL DW as of this writing.

    As a PaaS service, Azure SQL DW makes it easy for customers to create and load their data into the data warehouse. Thus, it currently has a simple implementation of workload management that is not customizable. To be successful, a multi-tenant database would need customizable workload management per tenant workload.

    Frequent changing of schema

    Azure SQL DW is a strongly typed RDBMS (Relational Database Management System). Like other traditional databases, it is a schema-on-write system where you create the schema first and then write the data. Reading this data back can be done using common SQL language. On the other extreme, schema-on-read allows loading the data first and then shifts the retrieval of data on to the developer who has to write code to retrieve the data.

    Frequent changing of schema falls somewhere in between. There is operational complexity related to implementing frequent schema changes in Azure SQL DW, as in other traditional RDBMSs (SQL Server, Oracle, etc.). If the nature of your data warehouse is such that your schema changes frequently, or your upstream systems are not standardized into a certain schema, you will have to conform to a common schema while doing transformation (ETL).

    Row by row processing

    Azure SQL DW is not good for row-by-row processing of data. As an example, if the join criteria in your query are such that you need a scalar UDF to evaluate column equality (or inequality), you will do row-by-row processing, as the sketch below illustrates. In this case, you are not getting the benefit of your MPP system, which is tuned to favor set-based operations.
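
    To make this concrete, here is an illustrative sketch (all table and UDF names are hypothetical): the scalar UDF in the first join predicate forces per-row evaluation, while precomputing the derived key once with CTAS keeps the join set-based.

    -- Anti-pattern: scalar UDF in the join predicate.
    SELECT f.SaleId, d.CustomerName
    FROM dbo.FactSales f
    JOIN dbo.DimCustomer d
        ON dbo.udf_NormalizeKey(f.CustomerKey) = d.CustomerKey;

    -- Better: materialize the normalized key once, then join on the plain column.
    CREATE TABLE dbo.FactSales_Conformed
    WITH (DISTRIBUTION = HASH(CustomerKeyNorm), CLUSTERED COLUMNSTORE INDEX)
    AS
    SELECT f.*, dbo.udf_NormalizeKey(f.CustomerKey) AS CustomerKeyNorm
    FROM dbo.FactSales AS f;

    SELECT f.SaleId, d.CustomerName
    FROM dbo.FactSales_Conformed f
    JOIN dbo.DimCustomer d
        ON f.CustomerKeyNorm = d.CustomerKey;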

    JSON, XML data and Spatial, Struct, Array and Map data types

    At the time of the writing of this blog, Azure SQL DW doesn’t support JSON or XML data types or functions. You will have to create a tabular structure (with a schema) out of your JSON or XML data before you can load it into Azure SQL DW. Azure SQL DW also currently doesn’t support the Spatial, Struct, Array, and Map data types. If most of the data for your data warehouse is JSON or XML, or if you need the complex data types listed above, you are better off choosing another database product.

    High Concurrency and Dashboard queries

    Azure SQL DW is not suitable for queries that come from BI dashboard (Power BI or Tableau) reports. This is mainly because dashboard queries require very low response times (often milliseconds to 1 or 2 seconds), and many queries get executed to refresh a single BI dashboard.

    Currently, Azure SQL DW doesn’t have a plan caching mechanism in place. Each query, whether it was run previously or not, has to search the plan space to find an execution plan before it starts executing. Coupled with that, if data movement is needed to satisfy a query, query time often falls short of what a dashboard query requires. Also, like other MPP systems, SQL DW has a limit on how many queries can run at the same time, which can also contribute to slower query responses for dashboard reports.

    If you need dashboard query response time, the recommendation is to create an SSAS or Azure AS or Tableau cube from the SQL DW data, to satisfy these dashboard query requirements.

    Power BI direct query mode

    Using Power BI direct query with Azure SQL DW is not recommended. The reasons are almost the same as in the previous section. Power BI direct query mode will perform poorly with the large data sizes typical of Azure SQL DW.

    The recommendation is again the same as in the previous section. If the Power BI dashboard is going against summarized tables (small data) that are properly indexed and distributed, you may be able to run with direct query mode against Azure SQL DW. You may want to create these summarized tables so that you do not need joins when returning data to the Power BI dashboard. Always test with the highest volume of data to make sure it will work for your reporting needs.

    Small Dataset

    Azure SQL DW is an MPP system which does a great job with lots of data and complex queries. You will not get the advantage of an MPP system if the size of your data in your data warehouse is too small (less than 250GB).

    For data sizes of less than 250 GB, use SQL Server or Azure SQL DB.

    Disaster Recovery with stringent RPO or RTO

    Azure SQL DW does not have the capability to automatically replicate data to another hot/standby Azure SQL DW system for disaster recovery purposes. It does, however, provide an automatic geo-redundant backup of the data every 24 hours. In case of a data center outage, you will be able to restore this backup to a paired data center. You may incur some data loss if you have updated your SQL DW tables after the last geo backup was created. Also, since this geo-redundant backup is copied to Standard storage, depending on how much data you have, the restore can take a long time. So if your data warehouse has stringent Recovery Point Objective (RPO) and Recovery Time Objective (RTO) requirements, Azure SQL DW may not be able to meet them.

    Azure SQL DW Patterns

    SQL DW is an MPP data warehouse service. It was created for data warehouse workloads. In Azure SQL DW, complex jobs (queries or loads) are broken down into pieces and executed in parallel. Thus, using Azure SQL DW, your large data loads and complex queries will finish faster. To gain performance from your data warehouse on Azure SQL DW, please follow the guidance around table design patterns, data loading patterns, and best practices.

    Scale out compute

    If you have a data warehouse that has reached the limit of your SMP hardware (a single server), you may be thinking of moving the warehouse to more expensive and beefier hardware. Sometimes this hardware can be very costly, with support expiring after a number of years. You also have to plan for future capacity when procuring this hardware. Hardware procurement can take many months.

    If that is the case and you want to save yourself time and headache, consider migrating to Azure SQL DW. Azure SQL DW allows independent scaling of compute. Compute nodes can be scaled up or down based on your need. As an example, if you are doing nightly ETL, you may want to consider scaling up your Azure SQL DW to finish ETL faster in a specified window, even if your data volume increases in the future.

    If you need more processing power for a couple of days right after month-end, you can scale up your Azure SQL DW higher to finish work faster and then scale it back down for the rest of the month.
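
    Scaling is a single statement run against the master database of the logical server; the database name and service objectives below are examples:

    -- Scale the data warehouse up before the ETL or month-end window...
    ALTER DATABASE MyDataWarehouse MODIFY (SERVICE_OBJECTIVE = 'DW1000');
    -- ...and scale it back down afterwards.
    ALTER DATABASE MyDataWarehouse MODIFY (SERVICE_OBJECTIVE = 'DW400');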

    Pause compute

    Similarly, Azure SQL DW can also be paused when not in use. If this fits your criteria, you will save money by pausing your Azure SQL DW when users are not using it. To be clear, the money that you save is for the compute engine, which is the biggest cost on your bill; you still pay for the data that is in storage. If you have Azure SQL DW dev/test systems, they can often be paused and resumed to save cost too.

    Storage capacity

    If you have reached your on-premises or cloud storage capacity for your data warehouse workload, you may want to consider moving to Azure SQL DW. Azure SQL DW today is a petabyte-scale data warehouse, and with continuous improvement the size limit will only go higher. Usually, the larger the data size, the better experience you will have with Azure SQL DW compared to your SMP system.

    Database consolidation

    Often, we have seen customers using Azure SQL DW to consolidate their data warehouses or data marts into one database. This can simplify a DBA’s job. It can also allow querying data that crosses departmental data mart boundaries, if the need arises. Database consolidation can save you money.

    For consolidation, you need to use schemas in a single database, rather than multiple databases, as containers for tables because Azure SQL DW doesn’t support cross database queries. Also, database consolidation brings the concern of query concurrency limit, and potential access control issues. Consider these issues and limits before you go too far down the path of consolidating your data warehouses (or marts) into one database.

    ETL vs ELT

    Azure SQL DW is fast at loading data when PolyBase is used, because it loads data in parallel on all the nodes. It is often advisable to load the data as is and then do the transformation using set-based T-SQL inside the data warehouse, where the data processing takes advantage of the MPP nodes. This often leads to performance gains compared to loading with SSIS or other similar ETL tools, because ETL solutions using SSIS and tools like it are often single-threaded when transforming data. In this case, with PolyBase, extract-load-transform (ELT) is preferred over extract-transform-load (ETL).
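
    Here is a minimal ELT sketch with hypothetical names: the raw files are exposed through a PolyBase external table, and the transformation runs as a CTAS so the work is parallelized across the MPP nodes.

    -- The external table ext.StagedSales is assumed to already point at files in blob storage.
    CREATE TABLE dbo.FactSales
    WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
    AS
    SELECT CAST(s.CustomerId AS int)            AS CustomerKey,
           CAST(s.SaleDate   AS date)           AS SaleDate,
           CAST(s.Amount     AS decimal(18, 2)) AS Amount
    FROM ext.StagedSales AS s;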

    If the source of the data is Hadoop or Data Lake which has gained popularity in the recent years, it is often the case that Hive or Spark is used as the transformation engine, loading processed data into SQL DW for fast access. The processing of long running transformation (often referred to as ETL) jobs in Hive or Spark is very useful, as the job processors have built-in resiliency and restartability.

    Common ISV application patterns using Azure SQL Data Warehouse



    Author: John Hoang

    Technical Reviewers: Dimitri Furman, Murshed Zaman, Sanjay Mishra

    Overview

    This article is one of several new blogs from the AzureCAT team discussing common customer implementations and proven architecture patterns using SQL DW. In this blog, I will discuss the patterns used by Independent Software Vendors (ISVs) on SQL DW. Although the focus is on ISV workloads, the majority of these characteristics should be applicable to common workloads using Azure SQL DW.

    Since its inception, SQL DW has been very popular with ISVs. In addition to all the managed cloud service advantages, such as quick startup, no infrastructure maintenance, deployment in multiple data centers worldwide, and automatic data protection, the service allows the ISV to scale compute resources on demand, store petabytes of data, and pause and resume compute resources to save cost. As with any technology, to get the best performance and experience from the product, you need to read the user manual first to understand the core principles. You can start with our SQL Data Warehouse Documentation and the highly recommended article Best practices for Azure SQL Data Warehouse.

    In this article, I will start with the main characteristics of successful common ISV patterns, discuss some common considerations and workarounds, and finally walk you through three common patterns from our production customers.

    Common Pattern Characteristics

    Please refer to article Azure SQL Data Warehouse Workload Patterns and Anti-Patterns for a great overview of common SQL DW workload patterns. Again, the ISV workloads will have many characteristics that are similar to the common SQL DW workloads described in the article.

    Below are common characteristics of successful SaaS implementations on SQL DW.

    Allows for massive scale of processing power, useful for “burst” scenarios.

    ISVs leverage the on-demand scaling feature for loading and transforming data from their end users’ multiple data sources and from external data services, and for “spike” workloads such as custom processing over a broad, large range of data, month-end processing, “Black Friday”, and holiday spikes.

    Automatically scales up to a petabyte of storage.

    With the separation of compute and storage in the architecture, the storage layer is no longer restricted by the compute hardware. Storage capacity scales automatically and transparently, so new users and workloads can be added without restrictions tied to your SLO.

    Can be paused to reduce compute costs when the instance is idle.

    When the ISV is done with data loading and transformation processing, the idle instance can be paused to save cost.

    Consolidation of multiple databases into a single instance.

    Combining multiple data warehouse instances into a single environment simplifies management of the environment, provides a centralized data warehouse with a single source of truth, and adds the capability to query across all the data.

    For a multi-tenant application, create a separate data mart for each tenant.

    The benefits of single-database tenancy include mitigating resource contention, securing data access, individually controlling the performance level of each tenant database, mitigating service disruption due to scaling/pause operations from shared tenants, and easily tracking user utilization for chargeback billing.

    The source data can come from multiple sources.

    These can be on-premises databases, Hadoop, flat files sent from end users, cloud data sources from other cloud providers, purchased data sets from the marketplace, etc.

    Data loading frequency

    The data loading frequency could range from every few minutes, to every few hours, to daily. Even if the data source pipeline is real time, schedule data ingestion into SQL DW no more frequently than at 5-minute intervals. Allowing extra time between each load helps prevent the SQL DW backup process from causing load timeouts. For the backup process, SQL DW takes a snapshot of your database every 4 hours. Before a snapshot is taken, SQL DW pauses your active DML operations until the snapshot is completed. A typical snapshot can take anywhere from 30 seconds to a couple of minutes.

    Query patterns

    The queries on SQL DW follow typical data warehouse patterns: star joins with aggregation, sequential scans, and complex fact-to-fact joins. Common usage includes batch reporting, ad-hoc, and data mining query patterns. For singleton lookup queries, it is recommended to add a b-tree index for optimal performance.

    Dashboard queries

    Power BI users with dashboard query performance requirements should use an SSAS tabular model that sources data from SQL DW.

    Concurrency limits

    Understand the concurrency limits, and offload large numbers of users to data marts such as SSAS, SQLDB, SQL Server on VM. The data mart choices will depend on your performance, capacity, and architecture requirements.


    Common considerations and workarounds

    While it is important to understand the “common patterns” for SQL DW, it is just as important, if not more important, to understand some limitations of the service in its current state. In this section, I will discuss several important considerations and workarounds.

    Multi-tenant database

    With the capability to scale compute resources, the capacity to store up to a petabyte, and the MPP architecture, it is very tempting for ISVs to use SQL DW as a multi-tenancy database. Please note that I am referring to a multi-tenancy database, and not a multi-tenancy application. SQL DW implements a simplified workload management model. It has four pre-defined Dynamic Resource Classes and recently added eight Static Resource Classes. Each resource class allocates the number of concurrency slots, the size of memory grants, and query priority. The values for these parameters cannot be changed by users, and depend on the DWU used. All the resources are shared within the database, and you do not have any option for granular control to allocate or limit resource use for a given tenant. To illustrate, if a user from company A is running a “monster” query consuming all the CPU, then a second user from company B could notice that their typically ten-second query is still running after minutes, due to resource contention. Resources such as CPU, memory, IO, transaction log, and TEMPDB are shared among all users in the same database. For optimal performance, we strongly recommend that ISVs create one database per tenant. This will not only mitigate resource contention, but also allow tenants the flexibility to choose their own performance level via DWU scaling, pause/resume on their own schedule, and easily identify per-user utilization for chargeback.

    Hundreds of thousands of concurrent queries

    Azure SQL DW allows up to 1,024 concurrent connections. To provide predictable query performance, Azure SQL DW currently supports from 4 to a maximum of 32 concurrent queries, depending on the DWU. Please see Concurrency limits for detailed information. Any queries submitted after the concurrency limit is reached will be queued until a concurrency slot is available. The number of queued operations can be up to 992 (1,024 – 32). Any connection attempted after the 1,024 concurrent connection limit is reached will fail with an error message. We recommend that our ISVs create separate data marts for each of their customers. The data mart can be in the form of SQL Data Warehouse, SQL Database, SQL Server on VM (IaaS), an SSAS cube, or exported data stored in Azure Data Lake Store or Azure Blob Storage.

    Scaling/Pausing impact

    Currently, the scaling operation is an offline operation. Any pending queries and DML operations will be canceled. For any logged operation, a transaction rollback will occur. The scaling operation does not initiate until the transaction rollback is completed. In a multi-tenant design, or in a single tenant design where multiple users have the permission to scale/pause, unexpected scaling operations can cause a huge impact on currently running workloads, causing unexpected downtime. The recommendation is still to provide a separate database for each tenant. Even for single tenant scenarios, design your security to allow only a minimal number of users who can execute scale/pause. In addition, check and drain active transactions, and coordinate and schedule any scale/pause operation before execution.

    Transaction limit impact

    In order to guard against long rollbacks, a transaction size limit is implemented in SQL DW. Note that SQL DW has a total limit of 160 GB of transaction log per distribution. The transaction size limit applies to a single DML operation, and the limit varies depending on the DWU in use. To illustrate, at DWU1000, the total transaction log size is 9,600 GB (160 GB * 60 distributions). However, the transaction size limit is only 7.5 GB per distribution, or 450 GB total transaction size (7.5 GB * 60 distributions). This safety limit was put in place to prevent a long database outage in the case where a user tries to pause or scale during a long running transaction. Please refer to the article “Optimizing transactions for SQL Data Warehouse” for further information and best practices for minimizing the risk of hitting the transaction limit and avoiding long rollbacks.

    Restricting data access limitation

    Azure SQL DW supports permission grants at the schema and object level within the database. If your application requires a more granular level of data access management, note that SQL DW does not support Row Level Security (RLS), Dynamic Data Masking, Always Encrypted, or Column Level Encryption. As a workaround for the lack of RLS, you can create multiple views and grant the appropriate users access to the appropriate views, as in the sketch below. Other than that, there is no practical workaround for the lack of Data Masking, Always Encrypted, or Column Level Encryption. We recommend you create data marts in SQL Database, SQL Server on VM (IaaS), or SSAS to take advantage of the specific security features needed to meet your data access requirements.
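
    Here is a sketch of that view-based workaround (all names are hypothetical): one view per tenant, with SELECT granted on the view and denied on the base table.

    CREATE VIEW dbo.vSales_TenantA
    AS
    SELECT SaleId, SaleDate, Amount
    FROM dbo.FactSales
    WHERE TenantId = 'A';
    GO
    GRANT SELECT ON dbo.vSales_TenantA TO TenantA_Reader;
    DENY  SELECT ON dbo.FactSales      TO TenantA_Reader;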

    Cross database join limitation

Azure SQL DW currently does not support cross-database joins. As a workaround, use schemas instead of separate databases to separate database objects.
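For example, objects that would have lived in separate databases can be placed in separate schemas of the same database and joined freely. A small sketch with invented schema and table names:

CREATE SCHEMA sales;
GO
CREATE SCHEMA finance;
GO

CREATE TABLE sales.Orders
    (OrderId INT NOT NULL, CustomerId INT NOT NULL, Amount MONEY)
WITH (DISTRIBUTION = HASH(OrderId), CLUSTERED COLUMNSTORE INDEX);

CREATE TABLE finance.Invoices
    (InvoiceId INT NOT NULL, OrderId INT NOT NULL, InvoiceDate DATE)
WITH (DISTRIBUTION = HASH(OrderId), CLUSTERED COLUMNSTORE INDEX);

-- What would have been a cross-database join is now a plain cross-schema join
SELECT o.OrderId, i.InvoiceId, i.InvoiceDate
FROM sales.Orders AS o
JOIN finance.Invoices AS i ON i.OrderId = o.OrderId;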

The Top Three Common Workload Patterns

In this section, I will discuss the top three common patterns implemented by our ISVs.

    The Hybrid Pattern

One of the most popular and common patterns with SQL DW is the "Hybrid" scenario, where the data sources are on-prem RDBMS databases. The data is incrementally and periodically loaded into SQL DW; this can be daily, several times a day, and sometimes on-demand. Azure Data Factory (ADF) with PolyBase enabled can be used to orchestrate data loading: data is extracted from the source databases, copied to Azure Blob Storage, and finally loaded into SQL DW. For detailed information on data loading, please refer to the article Azure SQL Data Warehouse loading patterns and strategies. Any transformation is done within SQL DW. This is the ELT approach we recommend to our customers, since it leverages the MPP power and on-demand scaling capability for the transformation step.
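Condensed, the PolyBase load side of this ELT flow might look like the following sketch. The storage account, container, credential, schema, and table names are invented for illustration, and a database master key and the ext schema are assumed to already exist:

-- Credential and external data source pointing at the staging container
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

CREATE EXTERNAL DATA SOURCE AzureBlobStage
WITH (TYPE = HADOOP,
      LOCATION = 'wasbs://staging@mystorageacct.blob.core.windows.net',
      CREDENTIAL = BlobStorageCredential);

CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

-- External table over the staged files
CREATE EXTERNAL TABLE ext.Orders
    (OrderId INT, CustomerId INT, OrderDate DATE, Amount DECIMAL(18,2))
WITH (LOCATION = '/orders/',
      DATA_SOURCE = AzureBlobStage,
      FILE_FORMAT = TextFileFormat);

-- Parallel, minimally logged load into the warehouse with CTAS
CREATE TABLE dbo.Orders
WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT * FROM ext.Orders;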

For the end users of the application, each user gets their own data mart. This can be in the form of another SQL Data Warehouse, Azure SQL Database, Azure Analysis Services, or SQL Server on a VM (IaaS), with the performance level of their choice. End users can choose whatever tool they want to consume the data; some common tools are Power BI against SSAS tabular in-memory mode, Microsoft Excel, Microsoft Access, and Tableau. Security access is managed by Azure Active Directory, with the benefits of self-service password reset, multi-factor authentication, and federation to the customer's existing Active Directory. For an ISV, additional revenue can be generated with services such as canned reports, customized data generation, and data processing. ISVs can use tags to organize Azure resources when needed for billing or management.

    Figure 1. The Hybrid pattern architecture


    The Database Consolidation Pattern

The second most popular pattern for ISVs is using Azure SQL DW for database consolidation. In this example, the ISV has disparate data in SQL Server databases on-prem, Azure Table Storage, Azure SQL DB, Oracle on-prem, SQL Server in AWS, and Azure Redis Cache. They wanted to consolidate data from multiple systems into one data repository in the cloud, to get a consolidated view of the end-to-end data lifecycle. This allows them to perform advanced analysis against a single source database with consistent taxonomy, and to generate reports leveraging the on-demand scaling feature. Using ADF as the orchestration tool, data is copied onto Azure Blob Storage and then loaded into Azure SQL DW. The ISV also opted to use PolyBase to export data to Azure Blob Storage as another layer of data protection; this allows them to export data at the object level to the various file formats that PolyBase currently supports. For batch reporting, SSRS reports are either scheduled or run on-demand against Azure SQL DW. For ad-hoc interactive dashboard queries, Power BI is used against an Azure Analysis Services data mart. With this consolidated cloud solution, the ISV not only saved money over the on-prem alternatives, but also saved processing time, allowing their customers to spend more time analyzing the data to make better business decisions.
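The PolyBase export mentioned above is typically done with CREATE EXTERNAL TABLE AS SELECT (CETAS). A sketch reusing the hypothetical data source and file format names from the earlier example:

-- Export a table to delimited files in Azure Blob Storage
CREATE EXTERNAL TABLE ext.FactSales_export
WITH (LOCATION = '/export/factsales/',
      DATA_SOURCE = AzureBlobStage,
      FILE_FORMAT = TextFileFormat)
AS
SELECT *
FROM dbo.FactSales;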

    Figure 2. The Database consolidation pattern architecture


     

    The IoT Pattern

We have a handful of customers with IoT workloads who have had an unpleasant experience using SQL DW. Data ingestion into SQL DW was so slow that extended lunches and many coffee breaks were not enough. Remember that an MPP system carries overhead for query parsing, orchestration, communication with other nodes, and processing against distributed databases; treating an MPP system like an OLTP system will result in sub-optimal performance. The top pattern to avoid is any type of real-time ingestion into SQL DW. Techniques such as singleton inserts, including Azure Stream Analytics (which, in the background, is nothing more than singleton inserts), should be avoided.

For this workload, the ISV has a SaaS application that generates logs from over 16,000 Azure SQL DB Elastic Pool databases. The log data is flushed into Azure Event Hub. Real-time analytics for application query statistics and fatal logs is done using Azure Stream Analytics. To serve data warehouse queries and BI users, data is written to Azure Blob Storage and loaded into SQL DW using PolyBase.

An important performance consideration with IoT workloads is the number of files generated. The ISV originally had 621,000 files, with a total data size of 80 GB, for one day's worth of data. Although the total data size is very small, the overhead of traversing a large number of files meant it took an hour just to create the external table; the same overhead also affected data loading performance. Unable to leverage Event Hub Archive because of this overhead, the ISV built a custom application to reduce the number of files to 8,700 per day. Data is loaded every 5 minutes to meet the end user consumption SLA. The ISV was also not able to leverage ADF for data loading orchestration, because ADF is designed for batch processing and its minimum data loading frequency is currently 15 minutes.

Finally, another important factor to consider is the extra time needed for post-load processing within SQL DW to achieve optimal query performance. Take into consideration the time needed to check row group compression quality; you may need to perform an INDEX REORGANIZE or INDEX REBUILD depending on the status and quality of your row groups. You will also need to create or update statistics to provide the histogram and cardinality information the cost-based optimizer needs to build an efficient DSQL plan. A sketch of this post-load step is shown below.

For customers who require detail-level information or batch reporting, SSMS with familiar T-SQL is used to query SQL DW directly. For interactive dashboard queries, end users use Power BI and Excel against an SSAS tabular model. This provides a better user experience for dashboard query performance and greater concurrency capacity, and it leverages the dimensional model's drag-and-drop capability without requiring users to understand complex join relationships.
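As a post-load sketch (table, column, and statistics names are hypothetical), row group health can be checked with the physical stats DMV, followed by an index rebuild and statistics maintenance when needed:

-- Inspect columnstore row group state and density across distributions
SELECT state_desc,
       COUNT(*)        AS row_groups,
       AVG(total_rows) AS avg_rows_per_group
FROM sys.dm_pdw_nodes_db_column_store_row_group_physical_stats
GROUP BY state_desc;

-- Recompress undersized or open row groups and refresh statistics
ALTER INDEX ALL ON dbo.AppLogs REBUILD;
CREATE STATISTICS stat_AppLogs_EventTime ON dbo.AppLogs (EventTime);
UPDATE STATISTICS dbo.AppLogs;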


    Figure 3. The IoT pattern architecture


     

    Instant Log Initialization for SQL Server in Azure


    Reviewed by: John Hoang, Denzil Ribeiro, Rajesh Setlem, Mike Weiner

    Introduction

    Instant File Initialization (IFI) is a well-known feature of SQL Server, providing significant performance improvement for operations that require an increase in the size of data files, such as database creation, restore, and file growth. Without IFI enabled, zeroes have to be written into the newly allocated file space, which is a time-consuming size-of-data operation. With IFI enabled, space is allocated to the file, but zeroes are not written. SQL Server documentation provides details on how this feature works, and what is required to enable it.

    The documentation says quite explicitly that “Log files cannot be initialized instantaneously.” Paul Randal explains the reason for this behavior in this blog. To summarize, log files need to be fully initialized, i.e. filled with zeroes (or other byte patterns), to support database crash recovery.

And yet, the title of this blog is not a mistake. There is a case where the log can be initialized instantaneously while still maintaining crash recovery semantics. Specifically, this happens when database files are created directly in Azure Blob Storage.

    SQL Server database files in Azure Blob Storage

    As you may know, starting with SQL Server 2016, database files can be created directly in Azure Blob Storage as page blobs, rather than as files on local or UNC paths. The SQL Server Data Files in Microsoft Azure documentation topic describes this feature in detail.

    Here is an example of creating a database with files directly in Azure Blob Storage, once the credential holding the Shared Access Signature for the storage container is created:

    CREATE DATABASE GrowthTest
    ON PRIMARY
    (
    NAME = N'GrowthTest',
    FILENAME = N'https://example.blob.core.windows.net/mssql01/GrowthTest.mdf',
    SIZE = 8192KB,
    FILEGROWTH = 65536KB
    )
    LOG ON
    (
    NAME = N'GrowthTest_log',
    FILENAME = N'https://example.blob.core.windows.net/mssql01/GrowthTest_log.ldf',
    SIZE = 8192KB,
    FILEGROWTH = 65536KB
    );
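For completeness, the container-scoped credential referenced above could be created along these lines; the SAS token value is a placeholder:

CREATE CREDENTIAL [https://example.blob.core.windows.net/mssql01]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2015-12-11&sr=c&sig=<signature>&sp=rwcdl&se=2027-01-01T00:00:00Z';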
    

    Instant Initialization of SQL Server Transaction Log in Azure Blob Storage

    Recently, we were working on a performance testing exercise using a SQL Server database with files in Azure Blob Storage. After creating the database using the default 8 MB size for data and log file (as in the example above), we wanted to increase the size of all files to be sufficient for the expected workload. IFI was not yet enabled for the SQL Server instance we were working with, and growing the data file from 8 MB to 1 TB took about one minute (using Premium Storage). This was expected, since the data file had to be fully initialized. We expected that the log growth to 1 TB would take about as much time, for the same reason. It was very surprising then that the same operation on the log file completed in less than one second. Here are the commands we used to grow the files:

    ALTER DATABASE GrowthTest MODIFY FILE (NAME = N'GrowthTest', SIZE = 1024GB);
    ALTER DATABASE GrowthTest MODIFY FILE (NAME = N'GrowthTest_log', SIZE = 1024GB);
    

    Before anyone starts worrying, the SQL Server storage engine is not broken, and database recovery still works just like it always did. To understand what is happening, we should keep in mind that Azure Blob Storage is very different from the traditional storage systems. When database files are created as blobs in Azure Blob Storage, SQL Server can take advantage of some features that aren’t available in traditional storage systems.

    The specific Azure Blob Storage feature that SQL Server is using here is the ability to clear a range of bytes in a page blob, provided by the Put Page API. The byte range specified in the Range or x-ms-range header is cleared when the value of the x-ms-page-write header is set to Clear. This operation has two important properties. One is that clearing the range is a metadata operation, so it happens nearly instantaneously. The other property is a guarantee that reading from a cleared range will always return zeroes, thus allowing SQL Server crash recovery to work in the usual way.

    This means that if the log file is created directly in Azure Blob Storage (as opposed to on a disk attached to an Azure VM), SQL Server does not have to initialize the log by writing zeroes to the log blob. It can instead make a call to the storage API to clear a byte range within the blob. This call completes very fast, effectively resulting in instant log file initialization.

    The log initialization operation, among other operations on database files in Azure Blob Storage, can be monitored using the xio_send_complete extended event. For log initialization, the file_path field will be set to the URL of the log blob, and the request_type field will be set to XIOTypeZeroFile. Here is an example. The first event fires when the blob is resized to 1 TB, and the second event fires when the 8 MB – 4 GB range is zeroed.

[Screenshot: xio_send_complete extended events for the blob resize and the zeroed byte range]
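For anyone who wants to capture these events themselves, a minimal event session sketch follows. It assumes the xio_send_complete event is exposed under the sqlserver package and that a simple event_file target is acceptable:

CREATE EVENT SESSION [XioTrace] ON SERVER
ADD EVENT sqlserver.xio_send_complete
ADD TARGET package0.event_file (SET filename = N'XioTrace.xel');

ALTER EVENT SESSION [XioTrace] ON SERVER STATE = START;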

    Conclusion

    There are two outcomes worth highlighting here:

    1. Some of the well-established truths about SQL Server may no longer hold as the underlying technology advances. In this case, SQL Server is taking advantage of a special capability in the storage subsystem, making instant log initialization a reality.

    2. This shows how SQL Server can take advantage of functionality that is only available in Azure today. Customers using SQL Server in Azure VMs can benefit from instant log initialization if they create databases with files directly in Azure Blob Storage. This removes delays related to long log initialization from the list of operational concerns.


