Channel: SQL Server Customer Advisory Team

Best SQL Server 2005 MDX Tips and Tricks - Part 1


Overview

SQL Server 2005 Analysis Services introduces several changes to MDX query syntax that can lead to better performance than the equivalent AS 2000 queries.

 

Here is a list of tips and tricks as well as best practices on how to get better performance from your MDX queries in AS2005.

Details

For filtering a set, use Filter inside the Crossjoin rather than the other way around

Filter the set first and then use the result in the Crossjoin.  The Filter function materializes the set and iterates through it, checking the condition to build the new set.

 

Avoid:

filter(NECJ({set1},{set2}),..)

 

Use:

NECJ(filter({set1},...),{set2})

 

Use Rank() over Intersect() function to check if a member exists in a set

The disadvantage of using Intersect() to determine whether a member exists in a set is that it treats the member as a set, which prevents the evaluator from using a better plan.

 

Avoid:

iif(intersect({ACTUALS_DAYS_SET},{[TIME DIM].[Time Main].currentmember}).count > 0, ...)

 

Use:

iif(rank([TIME DIM].[Time Main].currentmember, {ACTUALS_DAYS_SET}) > 0, ...)

 

Use Curly Braces in crossjoin with single member

When doing Crossjoins, always use sets (add curly braces around a single member if it is used in a Crossjoin).

 

Avoid:

Sum( [FINANCIAL VERSION DIM].[Financial Version].[Financial Version Type].&[WSLT]

 *{[GROUP STATUS DIM].[Group Status].[Group Status Name].[cancel]

 ,[GROUP STATUS DIM].[Group Status].[Group Status Name].[turn down]},measure)

 


Use:

Sum(

{[FINANCIAL VERSION DIM].[Financial Version].[Financial Version Type].&[WSLT]}

 *{[GROUP STATUS DIM].[Group Status].[Group Status Name].[cancel]

 ,[GROUP STATUS DIM].[Group Status].[Group Status Name].[turn down]},Measure)

 

Avoid unnecessary .CurrentMember in calculations.

It is not a good practice to use .CurrentMember when it is not required in the calculations. The formula engine can generate a better query plan if MDX does not use “.CurrentMember” to select the current member of a dimension.  CurrentMember is implied and does not need to be explicitly included in the syntax.

 

There is no need to use [TIME DIM].[Time Main].[Year].currentmember in the following MDX.

 

WITH

MEMBER  [Measures].[M] as

'([TIME DIM].[Time Main].[Year].currentmember

,[FINANCIAL VERSION DIM].[Financial Version].[Financial Category].&[ACTL])'

select

{[TIME DIM].[Time Main].[Year].&[2005].members}

*{

descendants([GROUP EVENT DIM].[Group Event].[Hotel].&[12]

,[GROUP EVENT DIM].[Group Event].[Group Event])}

 

Use Exists Function

The Exists function should be used wherever possible instead of filtering on member properties.
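
As a hedged sketch (the dimension, attribute, and member names are assumed from the Adventure Works sample), the rewrite typically looks like this:

-- Avoid: filtering on a member property
Filter( [Customer].[Customer].[Customer].Members,
        [Customer].[Customer].CurrentMember.Properties("Education") = "Bachelors" )

-- Prefer: Exists against the related attribute hierarchy
Exists( [Customer].[Customer].[Customer].Members,
        { [Customer].[Education].[Bachelors] } )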

Avoid Lookup

Avoid using the lookup function (LookupCube).  Try to find out whether the cube structure can be modified to have the measure available in the same cube.
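
For illustration only (the [BudgetCube] cube and [Budget] measure below are hypothetical), this is the pattern to avoid; if the budget value can be brought into [MyCube] as a regular measure, the per-cell LookupCube call disappears entirely:

-- Avoid: fetching a value from another cube at query time
WITH MEMBER [Measures].[Budget Amount] AS
    LookupCube( "BudgetCube",
                "( [Measures].[Budget], " +
                [TIME DIM].[Time Main].CurrentMember.UniqueName + " )" )
SELECT { [Measures].[Budget Amount] } ON COLUMNS
FROM [MyCube]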

 

Use Minus over Filter for a single member

When filtering out a single member from a set, use Minus rather than the Filter function.

 

Avoid:

filter({set}, [dim].currentmember.name <> "UNKN")

 

Use :

( {set} minus {&[UNKN] member})

 


Do not use calculated members that are constants

Although there may not be any difference for a simple example, when combined with other calculations a constant calculated member can cause a more complicated execution plan inside the server.  Note that the parallelperiod function often evaluates to a constant, so if the member is known in advance, use it directly; the engine does not check for certain patterns that are known to be constant.  It can be faster for a UI tool to send a first query to resolve parallelperiod (without other calculations) and then substitute the result into the original query, rather than sending one more complicated query.

 

Avoid:

with

member [a].[NiceName] as '[a].[123]'

member [Time].[YearBefore] as 'parallelperiod( [Time].[year], 1, [Time].[2006].[jan] )'

select { [a].[NiceName] } on 0,

{ [Time].[2006].[jan], [Time].[YearBefore] } on 1

from [MyCube]

 

Use:

 select { [a].[123] } on 0,
{ [Time].[2006].[jan], [Time].[2005].[jan] } on 1
from [MyCube]

 

How to check if the cell is empty?

Usually an empty cell is checked either to avoid division by zero or to determine whether a value is missing (NON EMPTY analysis).

 

  1. Check for non empty to avoid division by zero

 

For a/b calculations

 

IIF(b=0,NULL,a/b)

 

Empty cells are treated as zero in arithmetic operations.

 

  2. For checking empty cells (Non Empty Analysis)

 

Filter([dimension].[hierarchy].[level].members, isEmpty([Measures].[measure]))

 

This invokes the MDX function IsEmpty, which, as the name suggests, checks whether the cell value (in this case at coordinate b) is empty.  Note that if the cell value is empty, it is treated as the number zero in arithmetic operations; however, it is possible that b holds the value zero, which is not empty.  Therefore the IsEmpty check is appropriate when the user wants to differentiate empty or missing values from existing values (for example, in NON EMPTY style analysis), but it is not appropriate for division-by-zero checks.

 

Note: Never use the IS operator (i.e., IIF(b IS NULL, NULL, a/b)) to check whether the cell value is empty. The IS operator checks whether the member b exists, not whether the cell is empty.
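
Here is a short hedged sketch (Adventure Works names assumed) that makes the distinction concrete; IsEmpty detects a missing cell, while a comparison with 0 detects a stored zero:

WITH MEMBER [Measures].[Sales Status] AS
    IIF( IsEmpty( [Measures].[Internet Sales Amount] ),
         "Missing",      -- no fact data at this coordinate
         IIF( [Measures].[Internet Sales Amount] = 0,
              "Zero",    -- a stored value of zero, which is not empty
              "Has value" ) )
SELECT { [Measures].[Sales Status] } ON COLUMNS,
       { [Customer].[Customer Geography].[Country].Members } ON ROWS
FROM [Adventure Works]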

 

 

Miscellaneous design tips

  1. Avoid assigning values like 0, Null, “N/A”, “-“ to cells that would remain empty otherwise. Use Format_String instead, for custom UI formatting of values.
  2. Avoid redundant Sum/Aggregate calculations in situations where default/normal cell value aggregation would do.
  3. Try to avoid IIF.  If you have to use IIF, see if it is possible to write it so that one of the branches (then/else) is null (see the sketch after this list).
  4. Prefer using static literal hierarchy and member references (e.g. Measures.Sales instead of Dimensions(0).Sales, Product.[All] instead of Product.Members.Item(0) etc.).
  5. Avoid using LinkMember, StrToSet, StrToMember, StrToValue.
  6. Prefer using the new calculation Non_Empty_Behavior optimization hint instead of writing calculation expressions of the form Aggregate(NonEmptyCrossjoin(Descendants(…, Leaves) …).
  7. If wanting to get the value of the current cell, consider using an explicit measure name instead of Measures.CurrentMember.
  8. When writing calculation expressions like “expr1 * expr2”, make sure the expression sweeping the largest area/volume in the cube space is on the left side. For instance, write “Sales * ExchangeRate” instead of “ExchangeRate * Sales”, and “Sales * 1.15” instead of “1.15 * Sales”.
  9. Consider replacing simple “Measure1 + Measure2” calculations with computed columns (additional measures) in the DSV or in the SQL data source.
  10. Instead of writing expressions like Sum(Customer.City.Members, Customer.Population.MemberValue), consider defining a separate measure group on the City table, with a Sum measure on the Population column.
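
To make tip 3 concrete, here is a minimal hedged sketch (the measure, threshold, and cube are assumed, borrowing Adventure Works names); the null else-branch keeps the calculation sparse, and the constant stays on the right-hand side of the multiplication per tip 8:

WITH MEMBER [Measures].[Adjusted Sales] AS
    -- null branch: the engine can skip the cells where the condition is not met
    IIF( [Measures].[Internet Sales Amount] > 1000,
         [Measures].[Internet Sales Amount] * 1.15,
         NULL )
SELECT { [Measures].[Adjusted Sales] } ON COLUMNS
FROM [Adventure Works]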

Author: Nicholas Dritsas


SSAS Partition Slicing


Summary:
SSAS uses partitions to contain cube data.  Small cubes might use only one, but for non-trivial cubes, cube designers will create partitions based on ease of managing data and to split groups of data.  This document discusses how the server uses "slices" to examine partitions at query time.  The slices are stored as a range of internal surrogate keys, and can be thought of as a very high-level index.

Using more attributes in the partition definition can potentially improve performance by allowing partitions to be excluded.  To benefit user queries successfully requires some consideration of how partition slices are used internally.  Note that even when slices do not exclude partitions, we still expect the map indexes to be efficient at retrieving only the necessary data.

Slice description:
By "slice" we mean one or more members that limit the multidimensional space.  Queries may have a "slice" as a result of including a set of members on an axis, or by placing members or a set in the where clause.  Query slices are always a finite set of members.  The formula engine may decompose MDX queries into many storage engine queries, which have a simple specification of granularity of each attribute, and a set of slices that limit the contents.

Cube partitions each have a slice for each attribute.  This slice is either set by the administrator as part of the partition definition, or for MOLAP partitions is created during creation of the bitmap index for the partition.  There is one slice per attribute in the partition.  (Similarly there is one bitmap index per attribute in the partition.)  Because the server automatically discovers the slice it is sometimes referred to as the "auto slice". 

The partition slice value is stored in a file in the partition directory, with the name "Info.*.xml".  The information stored is the minimum and maximum member IDs that are present in the partition, for each attribute.  The member IDs are internal to the OLAP server, and are basically surrogate keys, created as a contiguous set of 32-bit integers starting with 1.  This matters because the slice is stored as a min/max range.  It might be beneficial to change the SQL query for the dimension attributes so that the order in which attribute members are returned matches the partition slicing scheme.  Then the alignment of the internal surrogate keys helps the server eliminate some partitions from consideration at query time.  See the Design Example section below.

The slice is created during index creation.  Index creation is controlled by the number of rows, specified in the config file as <IndexBuildThreshold>, with a default of 4096.  Partitions with fewer rows will not have indexes built -- and therefore will not have an "auto slice".

Optimal Number and Size of Partitions

There is not necessarily an optimal number or size of partitions, but some general guidelines exist.  In Analysis Services 2005 Performance Guide (see link below), we suggest at most about 2000 partitions, and maximum of about 2 GB or 10-15 million rows in size.  Some sites successfully use much larger partitions.  For number of partitions, too few can miss opportunities to use slicing or can cause larger time windows for processing.  Too many partitions might take more work to manage, cause some slowdowns of operations during processing (improved after SP2 was released), and can slow service startup time due to many more files on disk.  Too small partitions might not offer any benefit.  Too large partitions might increase the time window for processing past what is available.

Query Time Behavior

During query time all partitions are scanned for possible inclusion, by checking the query slices and partition slices for intersection.  This is sequential, but is fast because it involves only integer range comparisons.  Partitions that are included will be queried in parallel, and partitions that are excluded are not touched further.

There are SQLProfiler trace events for each partition that is used.  Which trace event is emitted depends on whether the data comes from the fact table or from an aggregation, and each partition can have a separate aggregation design.

Troubleshooting Information

It is important for very large cubes that a partition is not touched unless the query depends upon it.  The information generally used to troubleshoot is:

  • Trace event Query Subcube Verbose.  Check to see which attributes have slices.
  • Info.*.xml file, located in the partition.  Make sure slices are defined for the attributes used in the query.
  • Trace events can be used to check which partitions are being used.  See below.

In SSAS2005 SP2 and earlier, the following behaviors may have a negative impact on the number of partitions touched.

  • Aggregations having fewer than <IndexBuildThreshold> rows are not excluded based on slice, even if a slice range exists for the fact table data.  One possible workaround is to lower the value of the <IndexBuildThreshold> configuration parameter in the server configuration file.  However, this is not recommended as it may result in a large number of additional indexes, slowing processing performance.
  • Queries using "OR" slices do not use partition slices.  The OR slice can be identified by a "+" in the trace event Query Subcube Verbose.

Both of these issues are expected to be addressed in a future Analysis Services release.

Trace Events

These are the relevant trace events to examine. 

Query Subcube Verbose.  This trace event is emitted after the internal storage engine subcube query has completed.  The duration shows the total time for creating the datacache.  Note that a query will only have one subcube query occurring at a time, even though many partitions can be read concurrently, so the start and end times will not overlap those of other Query Subcube Verbose events.  Here is an example part of the output.

D:3 (Time) [1 8 52 * 0 0 0 0 0]  =>   ((All)):[All]  (Season):[Winter 2007]  (Period):[JANUARY_2007]  (Week):*  (Day):0  (Dimdate):0  (Year):0  (Half):0  (Qtr):0


The numbers in square brackets [1 8 52 * 0 ...] have these meanings:

Single number - The internal surrogate key ID for the member participating in the slice.  Note that the first element is often the All level, which by definition only has 1 member, and therefore its ID will always be 1.  The member name is displayed.

'*' - The answer must have granularity for this attribute, and all members are used.  There is no slice on this member.

0 - The answer should not include granularity for this attribute. 

'+' - A slice is defined with more than 1 member.  This is sometimes called an "OR" slice, because the equivalent SQL query would be of the form: where city = 'Seattle' or city = 'Redmond'.  The list of members is not displayed.  For simple MDX queries it is sometimes possible to infer the list.

Query Subcube.  This trace event is not as readable for a human as Query Subcube Verbose.  The format is a bitmap of attribute granularities, with dimensions delimited by a comma (","). 

The same format of information is emitted by the Get Data From Aggregation trace event, and is also inserted into the QueryLog relational database table, if enabled in the configuration file.  You can cut and paste this string directly into the AggManager sample to help create aggregations that match your queries, without or in addition to using the QueryLog relational table.  (We wish to point out that the format can be transferred -- not to suggest creating aggregations by just cutting and pasting.  Designing the set of aggregations is beyond the scope of this document.)

TextData from Query Subcube event : 00000000,000000000000000000000000,01110000

In this example, there are 3 dimensions, and the query only has granularity on attributes in the 3rd dimension.  (Which for this example happened to be the Time dimension.)

Get Data From Aggregation.  This trace event is emitted when reading from a partition, in an aggregation (not from the lowest level fact data).  It is not emitted if the partition can be excluded based on the partition's "auto slice".

Note that the start and end time will typically overlap the start and end time of other events, because the storage engine reads partition data in parallel, and the time will be contained within the time range for Query Subcube and Query Subcube Verbose.

The ObjectPath trace column tells what partition is being read.  Note that Query Subcube and Query Subcube Verbose also have ObjectPath, but are missing the partition information since the query represents the union of all its partitions.  The format is as follows.  Note that for named instances the server name might be of the form MachineName\ServerInstanceName.

ServerName . DatabaseID . CubeID . MeasureGroupID . PartitionName

The TextData column contains the name of the aggregation used and the list of attribute granularity for the aggregation used.  It will always "cover" (be larger than) the granularity of the Query Subcube.  Here is an example:

TextData from Query Subcube event : 00000000,000000000000000000000000,01110000

TextData from Get Data From Aggregation event: Aggregation 1  00000000,000000000000000000000000,01110111

Progress Report Begin, Progress Report End.  These trace events are emitted when reading fact data from a partition (not from an aggregation).  They are not emitted if the partition can be excluded based on the partition's "auto slice".

Note that the start and end time will typically overlap the start and end time of other events, because the storage engine reads partition data in parallel, and the time will be contained within the time range for Query Subcube and Query Subcube Verbose.

The ObjectName trace column tells what partition is being read.  The TextData column also tells the same information.  Here is an example:

TextData from Progress Report Begin event : Started reading data from the 'MyPartitionName' partition.


Example of Partition Slice Information

The exact MDX query is unimportant, so we will just look at the Query Subcube Verbose and some partition information.

Query Subcube Verbose, from SQLProfiler trace:

Dimension 0 [Location] (0 0 0 0 0 0 0 0)  [Dimlocation]:0  [Chain]:0  [Division]:0  [Region]:0  [Area]:0  [Store]:0  [Location Type]:0  [Grouped Stores Ind]:0
Dimension 1 [Product] (0 2 2 14 137 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)  [Dimproduct]:0  [Company]:[C SAMPLE]  [Brand]:[B 1 SAMPLE]  [Group]:[G 105 SAMPLE]  [Dept]:[D 1125 SAMPLE]  [Category]:0  [Range]:0  [Style]:0  [Colour]:0  [Size 1]:0  [Size 2]:0  [Size]:0  [Season Ind]:0  [Replenishment]:0  [Clearance]:0  [RSP]:0  [Primary Supplier]:0  [Supplier Color]:0  [SKU]:0  [Size Indicator1]:0  [VPN]:0  [Sku Description]:0  [Latest Cost]:0  [Original RSP]:0
Dimension 2 [Time] (0 * * * + 0 0 0)  [Dimdate]:0  [Week]:*  [Period]:*  [Season]:*  [Day]:+  [Year]:0  [Qtr]:0  [Half]:0

The "0" means no granularity for that attribute.

The "*" means include granularity for that attribute, with all members.

The "+" means include granularity for that attribute, with a set of members.

A number means include granularity for that attribute, with one specific member.  The number is the internal surrogate key.

This subcube query included a slice on a set of days.  Though the set members are not included in the trace event, for the actual query the set of [Time].[Day] for this example was { 1465, 1466, ... 1492 }.

So we can expect for this query, that partitions slicing by [Product] or [Time] would be effective at reducing query time by using slice information as a high-level index and reducing the work for the query.

SQLProfiler Trace information:

For this example, aggregations were able to answer the subcube query.  So the relevant trace events are Progress Report Begin, and Progress Report End.  Here are the relevant columns for one trace event.  (Note that in SQLProfiler, this is displayed horizontally as one row.)

EventClass: Progress Report Begin

EventSubclass: 14 - Query

TextData: Started reading data from the 'Aggregation 9' aggregation.

DatabaseName: Sample

ObjectName: Aggregation 9

ObjectID: Aggregation 9

ObjectPath: MyMachine.Sample.Sample SMDB.Fact Sales SKU Store Day 0.Sales_W200639.Aggregation 9

Note that the partition name is part of the ObjectPath, a trace column that is disabled by default.

Here is a subset of info.*.xml for a customer partition for year 2005, week 36.  We can see that if the query slices on any of the Product or Time attributes, partitions can be excluded.  Note that some ranges are narrow (few members), and some ranges are broad.  For example, the range for [Product].[Colour] is 2...736, which is not surprising since the products sold in this week probably cover every color.  If there were a goal of partitioning also by color or some other product attribute, that would help queries that slice by that attribute.

For the example query subcube, we expect it to be excluded based on [Time].[Day].  For this example, it actually was excluded because the set of days was not within the inclusive range 1101...1107 for the Day attribute.

      <MapDataIndex>
        <DimensionID>Product</DimensionID>
        <PropertyID>ID Company</PropertyID>
        <m_MinIndex>2</m_MinIndex>
        <m_MaxIndex>2</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Product</DimensionID>
        <PropertyID>ID Brand</PropertyID>
        <m_MinIndex>2</m_MinIndex>
        <m_MaxIndex>2</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Product</DimensionID>
        <PropertyID>ID Group</PropertyID>
        <m_MinIndex>2</m_MinIndex>
        <m_MaxIndex>21</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Product</DimensionID>
        <PropertyID>ID Style</PropertyID>
        <m_MinIndex>22</m_MinIndex>
        <m_MaxIndex>325486</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Product</DimensionID>
        <PropertyID>ID SKU</PropertyID>
        <m_MinIndex>122</m_MinIndex>
        <m_MaxIndex>1307861</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Product</DimensionID>
        <PropertyID>Colour</PropertyID>
        <m_MinIndex>2</m_MinIndex>
        <m_MaxIndex>736</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Product</DimensionID>
        <PropertyID>Size Description1</PropertyID>
        <m_MinIndex>2</m_MinIndex>
        <m_MaxIndex>2205</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Time</DimensionID>
        <PropertyID>(All)</PropertyID>
        <m_MinIndex>1</m_MinIndex>
        <m_MaxIndex>1</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Time</DimensionID>
        <PropertyID>ID Season</PropertyID>
        <m_MinIndex>11</m_MinIndex>
        <m_MaxIndex>11</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Time</DimensionID>
        <PropertyID>ID Period</PropertyID>
        <m_MinIndex>39</m_MinIndex>
        <m_MaxIndex>39</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Time</DimensionID>
        <PropertyID>ID Week</PropertyID>
        <m_MinIndex>159</m_MinIndex>
        <m_MaxIndex>159</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Time</DimensionID>
        <PropertyID>ID Day</PropertyID>
        <m_MinIndex>1101</m_MinIndex>
        <m_MaxIndex>1107</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Time</DimensionID>
        <PropertyID>Dimdate</PropertyID>
        <m_MinIndex>1101</m_MinIndex>
        <m_MaxIndex>1107</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Time</DimensionID>
        <PropertyID>ID Year</PropertyID>
        <m_MinIndex>6</m_MinIndex>
        <m_MaxIndex>6</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Time</DimensionID>
        <PropertyID>Id Trdhalf</PropertyID>
        <m_MinIndex>8</m_MinIndex>
        <m_MaxIndex>8</m_MaxIndex>
      </MapDataIndex>
      <MapDataIndex>
        <DimensionID>Time</DimensionID>
        <PropertyID>ID Qtr</PropertyID>
        <m_MinIndex>14</m_MinIndex>
        <m_MaxIndex>14</m_MaxIndex>
      </MapDataIndex>

Design Example

In this example, it is known that for a cube, users often want to query for a particular city and time range.  It is common to partition by time, and to improve query time, we can also partition by geographical region.  For the real customer cube, there is one city that represents more than 50% of the records.  So we want to create a partitioning scheme that looks something like this:

2001, BigCity
2002, BigCity
2003, BigCity
2004, BigCity
2005, BigCity
2001, OtherCities
2002, OtherCities
2003, OtherCities
2004, OtherCities
2005, OtherCities

Remember that query time is improved by excluding partitions, and that requires non-overlapping (discontiguous) ID ranges.  The IDs are surrogate keys created during dimension processing.  So for this example, it is required that during dimension processing, the query for Country, Province, and City results in BigCity coming either before or after the other cities (and provinces and countries).  For example, if it falls in the middle, we get these ranges:

BigCity partition: MinIndex = 500, MaxIndex = 500

OtherCities partition: MinIndex = 1, MaxIndex = 1000

Therefore a query with a slice on SmallCity with ID = 123 would exclude the BigCity partitions, but a query with a slice on BigCity would not be able to exclude the OtherCities partitions, and would take longer.

The preferred result is a set of exclusive ranges (without overlap), as shown below.  Then a query with a slice on BigCity will exclude 50% of the partitions, and a query with a slice on SmallCity will also exclude 50% of the partitions.  This is optimal.

BigCity partition: MinIndex = 1, MaxIndex = 1

OtherCities partition: MinIndex = 2, MaxIndex = 1000

Knowing the goal, check the info.*.xml files after processing to ensure they form exclusive ranges, then run representative MDX queries and check the Query Subcube Verbose trace events.

Ordering the rows during dimension processing can be done using a named query as the source of the dimension in the DSV. One could define a different named query for each attribute.  There is a chance this could hurt performance of dimension processing if any operation needed a join.  Another approach is to substitute a view for the dimension table, to perform the appropriate sorting.  Note that at present we have not actually done this, but we believe this approach will work.

Further Reading

Analysis Services 2005 Performance Guide

http://download.microsoft.com/download/8/5/e/85eea4fa-b3bb-4426-97d0-7f7151b2011c/SSAS2005PerfGuide.doc

AggManager, part of the SP2 samples.

http://www.microsoft.com/downloads/details.aspx?FamilyID=e719ecf7-9f46-4312-af89-6ad8702e4e6e&DisplayLang=en


 

How to warm up the Analysis Services data cache using Create Cache statement?


Goal

This document describes how to build Create Cache commands.  Create Cache for Analysis Services (AS) was introduced in SP2 of SQL Server 2005. It can be used to make one or more queries run faster by populating the OLAP storage engine cache first.

 

Some customers have found that certain queries benefit other, later queries.  For example, ascmd.exe could be used every hour to execute all queries in a directory, keeping the cache warm for subsequent user queries.  Another approach that has been used is to create a cache query for each user query.  This is feasible when the MDX query is part of a report: one simply adds another query that has the side effect of populating the cache, thereby speeding up the next query.

 

The root of the problem is that during a query, the AS Server does only local optimizations.  Calculations, mixed granularities, and other more complex MDX statements can result in a chatty communication between the FE (Formula Engine) and the SE (Storage engine).  In AS2000 and earlier, this was also a network round trip. 

 

By issuing a cache statement, we can populate the cache with one or more subcubes that cover the regions of cube space that the query will actually need.  We often find approximately the same time taken for each subcube query, so the effect can be dramatic overall.

 

With this methodology, the collection of MDX queries will appear as first executing inside the storage engine, and second inside the formula engine.  In addition to reducing overall time, this can make it easier to predict the effect of multi-user load testing, because the first part uses 100% of all CPUs, and the second part uses 100% of one CPU.

 

Summary Steps

This is an iterative process.  Here is how to identify potential scenarios where Create Cache would help:

 

  1. Run profiler
  2. Run the query looking for Non-cache SE queries (query subcube filtered by subevent = 2)
  3. Look at the total time of the query vis-à-vis the sum of the times of the non-cache SE queries.
  4. If:
    1. They are pretty close and
    2. There are many non-cache SE queries within the same range of time (not just one or two time-consuming ones)

then create cache might help.

 

There could be variations, but here is one set of steps that has been used successfully:

 

1. Extract all MDX queries as separate files.

2. Add a Create Cache statement corresponding to every MDX query file.

3. Run Clear Cache, Create Cache, then the user query.

4. Verify that the Create Cache is effective and improve, if necessary.

5. Work on next query

6. When done with all queries, combine the Create Cache into 1 or more Create Cache queries.

7. Verify the combined Create Cache.

 


Detailed Description

Below are details about each of the steps.

 

1. Extract MDX queries as separate files

By placing the queries in separate MDX files, it can be faster to work on them one by one and verify that each query is handled correctly by Create Cache before moving on.

 

1. Start a trace.

2. Run the report.

3. Stop the trace.

4. Extract the queries.  One way is with SQLProfiler: File / Export / Extract SQL Server Analysis Services Events / Extract All Queries.  This creates a text file, with each query on a separate line.  When queries have multiple lines this might be confusing, but it is easy to add an extra line or otherwise edit the file.  Either copy each query to a separate file, or, for every query that is being worked on, comment out the other queries.

 

2. Add Create Cache for every MDX query

Below is an example of a Create Cache statement. 

 

create cache

for [MyCube]

as (

            { [USA].[Oregon], [USA].[Colorado], [USA].[Florida], [USA].[Washington]  }

            * { [Measures].[mybasemeasure] }

            * { [2006].children, parallelperiod( [Time].[Year], 1, [2006].[Q1].[Jan] ), YTD( parallelperiod( [Time].[Year], 1, [2006].[Q1].[Jan] ) )  }

            * { [Products].[Shoes].children }

)

 

It should be apparent that this is basically a crossjoin of the members specified in the query for each dimension.  Note that set expressions are allowed.

 

2.a. Add Specified Members

First add all members specified in the query.  NOTE: Create Cache covers static analysis rather than dynamic. MDX with dynamic members will not benefit from this approach.

 

 

2.b. Add Calculated Members and Definitions

During execution, the Formula Engine can issue a subcube query which includes calculated members. 

 

For example, if the cube has a calculated member:

            [MyDim].[Calc123] as [MyDim].[A] + [MyDim].[B]

one should include the following members in Create Cache:

             { [MyDim].[A], [MyDim].[B] }
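
For example, a hedged sketch that reuses the hypothetical names above would warm the cache with those underlying members rather than with the calculated member itself:

create cache
for [MyCube]
as (
            { [MyDim].[A], [MyDim].[B] }
            * { [Measures].[mybasemeasure] }
)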

 

2.c. Account for Custom Rollups

During execution, there may be custom rollups that affect the calculation.  For example, a custom rollup may involve QTD() or YTD().  This might be hard to detect since the actual calculations are stored in the relational database.  Custom rollups can be viewed from BI Development Studio.  The dimension must be processed (otherwise the calculations still only exist on the relational database).

 

1. Go to Solution Explorer, double-click on the dimension you want to examine.

2. Click on the Browser tab.

3. Find the icon for Member Properties, click on it.  Select "Custom Rollup", "Custom Rollup Properties".

4. It will display the Custom Rollup (formula) and other properties (such as solve order) for each member.

 

For example, if calculations include YTD or PeriodsToDate, the member list should account for that.  So, if the query includes:

 

            [Date].[2006].[June],

            ParallelPeriod( [Date].[Year], 1, [Date].[2006].[June] ),

            YTD( [Date].[2006].[June] )

 

the only members directly needed for the output are [Date].[2005].[June] and [Date].[2006].[Jan] .. [Date].[2006].[June]. However, because of the indirect relationships, you should also add the member set:

 

            YTD( ParallelPeriod( [Date].[Year], 1, [Date].[2006].[June] ) )

 

3. Run Clear Cache, Create Cache, User Query

Now it is time to run the Clear Cache statement and examine its effect.  Note that the Cube ID can be specified, or omitted to clear the cache entries for the entire database.

 

<!-- Can be used through ADOMD.NET ExecuteNonQuery call, or passed to ADODB as CommandText   or SQL Server Management Studio XMLA query -->

 

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">

  <ClearCache>

    <Object>

      <DatabaseID>FoodCmp</DatabaseID>

      <CubeID>SalesCube</CubeID>

    </Object>

  </ClearCache>

</Batch>

 

3.a. Connection string parameters

We recommend adding these parameters to the connection string.

 

Provider=msolap.3;Datasource=MyServer;Initial Catalog=MyDatabase;Timeout=300;Disable Prefetch Facts=true;Cache Ratio=1

 

Provider:           The name of the OLE DB provider, msolap.

Datasource:      The name of the server.  This might be a named instance, such as MyServer\MyInstance.

Initial Catalog:   The name of the database to use.

Timeout:           Optional number of seconds for command timeout.  This can help to avoid very long runs until the queries are optimized to run faster.

Disable Prefetch Facts: Optional new parameter to disable the Formula Engine heuristic that sends queries for possibly more data than is requested.

Cache Ratio:     Optional parameter to control the conversion from a specified set of members to the subcube query sent to the SE (Storage Engine).  Use 1 to send only the specified list.

 

3.b. Ways to Execute Queries

Because the connection string needs to be modified, SQL Server Management Studio cannot be used.  There are several alternatives:

 

1. One could write a small program in C#.

2. ascmd.exe can execute queries and output time.  The ascmd utility is part of the Yukon Samples kit located here:

http://www.microsoft.com/downloads/details.aspx?FamilyID=e719ecf7-9f46-4312-af89-6ad8702e4e6e&DisplayLang=en

Then look in the C:\Program Files\Microsoft SQL Server\90\Samples\Analysis Services\Administrator\ascmd folder.

3. Use the MDX Sample Application from AS2000.

 

4. Verify Create Cache is Effective

The goal of using Create Cache is to isolate the storage engine and formula engine work and prevent the user query from executing subcube commands.  The SE is multi-threaded while the FE is single-threaded.  Therefore, one can just use Task Manager to verify whether the SE is running.

 

During a good run, initially all the CPUs are at 100% since the SE is busy with the Create Cache command.  Then, only one CPU reaches 100% because the FE is busy with the formulas and preparing the result set.

 

Task Manager is the general way to verify, but to be sure, create a trace and look for event Query Subcube, subevent 2 Non Cache.  The cache subcube queries are fast and generally not an issue for AS.  (In fact it shows how effective it is to create a cache first.)

 

4.a. Improve, if necessary

If the user query results in Task Manager CPU spikes (caused by subcube queries), one only needs to determine which members from the main query are not present in the Create Cache statement. 

 

It might be helpful to look at the trace information for Query Subcube Verbose.  By comparing the events for the Create Cache statement with those for the user query, some differences may be seen, and that can help discover which members should be added to the Create Cache query.

 

Another approach is to simplify the query until the problem does not happen.  Try to use binary search to find the critical part of the user query that causes the extra subcubes.

 

4.b. Eliminate All Subcubes

This general technique works well when one can eliminate all subcubes.  However, it might not be possible to know the set of members in advance (static analysis); for instance, with complex calculations, the Create Cache technique might not be as beneficial.  Adding members that are not used can be cheap in some cases, for example adding one more product to an already large specified set.  Or it might be expensive, for example including another large set of data that otherwise would avoid entire partitions.  One will need to experiment and measure the result, and because sometimes it will be a win and sometimes a loss, we recommend considering several user parameters.

 

5. Work on Next Query

 

6. After working with all the main queries, combine the Create Cache into 1 or More Create Cache statements

The best number of Create Cache queries depends on their structure (which dimension members are involved, which measure group) and their member sets.

 

1. Different measure groups should be separated unless the dimension members are consistent.  Then, it is just a matter of convenience to combine, since under the covers the measure groups are physically separate.

2. If one query is covered or almost covered by another one, they should be combined.

3. If queries are disjoint or mostly disjoint, keep separate. 

 

We have not experimented greatly, but here is one suggestion.  Execute both scenarios, separate and combined.  Measure execution time and the size of the resulting DataCache (from the perfmon counter).  Use a single combined Create Cache if the combination reduces time to 67% of the separate commands and consumes less than 3 times as much space as the separate ones.

 


7. Verify the combined Create Cache

Execute all queries, in this pattern:

 

1. Clear Cache

2. Create Cache (possibly many queries)

3. User queries

 

Verify the set of queries in the same way as each single query.

 

 

Contributors:

Eric Jacobsen

Cristian Petculescu

Inconsistencies when using Drillthrough MDX with Perspectives


Author: Denny Lee

Contributors: John Desch, Kevin Cox

Reviewers: Robert Zare

 

 

DRILLTHROUGH MDX Overview

DRILLTHROUGH is a very powerful way to drill through to the detail data in your original data source that makes up the aggregations you are looking at. For more information on enabling drillthrough, please refer to T.K. Anand’s great article on Enabling Drillthrough in Analysis Services 2005 (http://technet.microsoft.com/en-us/library/ms345125(SQL.90).aspx).  Using a real example, if you were to query the Adventure Works OLAP database for the [Internet Sales Amount] measure:

 

SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS

FROM [Adventure Works]

 

The result would be $29,358,677.22.  You get the same result when you query the [Direct Sales] perspective (replace the [Adventure Works] cube with the [Direct Sales] perspective in the above MDX statement).

 

With the DRILLTHROUGH MDX statement, you can get the details that make up the above value of $29,358,677.22.   Specifically, the MDX statement

 

DRILLTHROUGH

SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS

FROM [Adventure Works]

 

 

will provide the dimension attributes and measure details that make up this aggregate value.  Due to the large number of columns and rows included in the output, only the dimension members Promotion and Sales Territory Region and the Internet Sales Amount measure will be shown below:

 

[Promotion], [Sales Territory Region], …., [Internet Sales Amount], …

-------------------------------------------------------------------------

No Discount, Australia, …, 3399.99, …

No Discount, Australia, …, 3374.99, …

…, …, …, …, …

 

 

DRILLTHROUGH MDX ON PERSPECTIVES

But when you run the same drillthrough statement against the [Direct Sales] perspective,

 

DRILLTHROUGH

SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS

FROM [Direct Sales]

 

you’ll notice that while all of the dimension attributes are included in the output, none of the measures are listed.

 

Yet, this output behavior is not consistent; if you were to run the query below against the [Sales Summary] perspective,

 

DRILLTHROUGH

SELECT { [Measures].[Sales Amount]} ON COLUMNS

FROM [Sales Summary]

 

The output includes only four of the ten available measures in the perspective.  To prove this latter point, execute the statement below against the [Sales Summary] perspective and you will see the ten measures listed.

 

select { [Measures].members} on columns from [Sales Summary]

 

As you can see from the MDX statements above, there is inconsistent output from the DRILLTHROUGH MDX statements in which some or none of the measures are included in the output.  Yet, this issue does not come up when you query the cube directly.  Does this mean you will have to run all of your drillthrough queries through the cube?

 

 

WORKAROUND TO SOLVE THIS

You certainly can do this and avoid the use of perspectives altogether when drilling down to the details.  But if you want to use perspectives, the workaround to this problem is to explicitly list all of the measures and dimension attributes you want to see within that perspective.  An example for the [Direct Sales] perspective is:

 

DRILLTHROUGH

SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS

FROM [Direct Sales]

RETURN [Internet Sales].[Internet Sales Amount],[Internet Sales].[Internet Order Quantity],[Internet Sales].[Internet Extended Amount],[Internet Sales].[Internet Tax Amount],[Internet Sales].[Internet Freight Cost],[Internet Sales].[Internet Unit Price],[Internet Sales].[Internet Total Product Cost],[Internet Sales].[Internet Standard Product Cost],[$Internet Sales Order Details].[Carrier Tracking Number],[$Internet Sales Order Details].[Customer PO Number],[$Internet Sales Order Details].[Sales Order Number],[$Internet Sales Order Details].[Sales Order Line],[$Promotion].[Promotion],[$Delivery Date].[Date],[$Sales Territory].[Sales Territory Region],[$Product].[Product],[$Ship Date].[Date],[$Source Currency].[Source Currency Code],[$Date].[Date],[$Sales Reason].[Sales Reason],[$Customer].[Customer],[$Destination Currency].[Destination Currency Code]

 

By explicitly stating what is to be returned, the “missing” measures such as [Internet Sales Amount] will now show up in your drillthrough to the [Direct Sales] perspective.

 

Please note that this behavior is by design and will not change in Analysis Services 2008.

 


Renaming Olap Databases Issues within Analysis Services


If you view the XMLA script that is generated for an Analysis Services database, you will notice that there are both ID and NAME attributes.  The reason Analysis Services has this differentiation is so that one can rename an OLAP database (which you could not do officially within Analysis Services 2000). The database name defaults to its ID, and the ID cannot be modified. You can, on the other hand, change the database name. This way end users can see a different name for the OLAP database regardless of its ID.  For example, you could have the ID of the OLAP database be "Foodmart" while the name of the database is "Wade's Groceries".  The file system itself, i.e. the c:\Program Files\Microsoft SQL Server\MSSQL.2\Data folder, has a Master.vmp and a folder name (e.g. Foodmart) that still refer to the ID.  That is, the folder name and any ID references are to Foodmart regardless of whether the database name is "Foodmart", "Wade's Groceries", or "Sally's Deli".

Scenario

One of our customers noticed an inconsistency with the renaming scenario, described below.

1. Sync with the Foodmart database

2. Rename the Foodmart database to Foodmart_1:

You will notice that while the database is now named Foodmart_1, the folder for the database is still called Foodmart, just like the ID.

3. Import the definition of the Foodmart_1 into Visual Studio and make the changes you would like to make (e.g. new dimensions, changes in measure definitions, etc.)

4. Attempt to deploy this Foodmart_1 database back to the original server and you get an error that it already exists.  You do not have the option to overwrite it.  The error is in the form of:

Error 1 Errors in the metadata manager. The database with the name of 'foodmart_1' already exists in the 'server_name' server.  0 0 

5. Try to deploy the database back as Foodmart and you get the following warning.

The 'foodmart' database already exists on the 'server_name' server. If you proceed with deployment, the database will be overwritten.

 

Would you like to continue?

Now you can overwrite the database, but you'll have the db named "foodmart" as opposed to what you wanted - foodmart_1.

 

Discussion

This scenario exemplifies a small bug in Visual Studio: it does not differentiate between the ID and the NAME.  To work around this problem, you can rename the Foodmart_1 database back to Foodmart before importing it into Visual Studio.  Once you're done with your modifications, you can then re-deploy it back to your server as Foodmart, and then rename it back to Foodmart_1.

Often people will rename a database because they plan to version the databases.  For example, over a particular period you have three versions of the same database (e.g. new dimensions, new measures, etc.).  In this example, your database is Blah_V2 for a previous version and Blah_V3 for the current version.  But you do not want users to need to switch to a different database name each time a new version is created so you want to call it Blah each and every time.  To do this, you can build and deploy your database with the ID and NAME of blah_v3.  Once you deploy it, you can rename the original "Blah" database back to blah_v2 and rename the current one to "Blah".  Please note, there can be issues with this approach because your front-end applications will need its metadata to be refreshed when switching from the V2 to V3 due to any of the changes made.

-------------------

Author: Denny Lee

Reviewed by: Kevin Cox, Lubor Kollar, Baya Pavliashvili (baya_baya@hotmail.com), and Nicholas Birke (nick@birke.ws).

SQL Server Best Practices Article: Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services


Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services

There are a variety of things that you can do to improve the performance of an individual MDX query. To begin, you must identify the source(s) of the performance bottlenecks in the execution of a poorly performing MDX query. To identify the source(s) of performance bottlenecks, you need to understand how to use the available Microsoft Windows and SQL Server 2005 Analysis Services troubleshooting tools to assist you with identifying bottlenecks and learn how to interpret the information revealed by these tools. This article provides the information about the available tools and demonstrates how to use the most common of these tools to identify and resolve MDX query performance bottlenecks with individual MDX queries.

New Best Practices Article

Using unformatted cell values may improve query performance


If a report does not use formatted cell values, query time can be reduced.  An example is a Reporting Services report that does its own formatting and therefore would not use the formatted values from SSAS.  By returning only the value of the cell and not its formatted value in MDX, you can achieve better query performance, sometimes between 5 and 20 percent, depending on the size of the cell set.   The default properties include FORMAT_STRING and FORMATTED_VALUE; they can be omitted by specifying an explicit list of cell properties.  For example, add "CELL PROPERTIES VALUE" at the end of every query.

 

select [Measures].[Internet Sales Amount]

on columns

from [adventure works]

cell properties value


Process only one partition so you can browse the cube


If you want to browse the cube and validate dimensions, hierarchies and calculations using the data from only one partition, here is what you can do.  Deploy the new cube solution, run a process structure for the cube and then process only one of the partitions.  Process Structure will process all the dimensions plus mark all the measure groups as processed.  That way, you can start browsing your cube with all the dimensions and one small partition processed.

 

Precision Considerations for Analysis Services Users Whitepaper now available


For those of you who want to know a little more about precision considerations for Analysis Services users, please check out this recently released whitepaper at http://www.microsoft.com/downloads/details.aspx?familyid=bae8beec-9892-4ecd-a9db-292254895f9c&displaylang=en.

This white paper covers accuracy and precision considerations in SQL Server 2005 Analysis Services. For example, it is possible to query Analysis Services with similar queries and obtain two different answers. While this appears to be a bug, it actually is due to the fact that Analysis Services caches query results and the imprecision that is associated with approximate data types. This white paper discusses how these issues manifest themselves, why they occur, and best practices to minimize their effect.

Authors: Denny Lee and Eric Jacobsen

 

SSAS small/big query interaction


Default SSAS behavior can sometimes result in small queries being slowed down by concurrently running big queries.

Recently in working with a customer, we observed the most common queries when run by themselves took about 1 second and some ad-hoc queries against their 110 GB cube took about 3 minutes.  When running both together, small query times slowed down to several minutes.  This slowdown prevented the customer's project from being successfully deployed.

There is a configuration parameter that can alter the scheduling of the tasks to allow separate queries to run more independently.  The SSAS version that includes the configuration parameter is described in KB article 922852.  We worked with the customer to develop a table of configuration parameter values and the resulting query times.  Some configuration values resulted in the small query running concurrently with big queries in about the same time as when run independently.

The scenario we are focused on is a big query that takes substantially more time than a small query, and when run concurrently the small query is observed to substantially slow down.  The root cause is the big query spends time in one or more storage engine requests, reading from many partitions having tens or hundreds of gigabytes.  Many tasks are queued to a thread pool to handle partitions and segments inside the partitions.

This scenario and its solution may apply to other customers.  However, the effectiveness of a proposed solution depends on the root cause of the slowdown.  If the root cause is different, there might not be a benefit from the same solution.

The steps below can be used to determine similarity to the scenario being discussed.  When working with CSS you may be guided to repeat the steps with a different configuration parameter.

Steps to reproduce scenario:
1. Start small query, wait for completion, record time.
2. Start big query, wait for completion, record time.
3. Restart server.
4. Start big query (but do not wait for completion).
5. Wait 5 seconds then start small query.  This wait time might need to be adjusted to get consistent times, for example to 10 or 15 seconds.  The effect depends on the nature of the big query.
6. Wait for queries to complete, record times.

Customers having this circumstance can work with Microsoft Customer Support Services (CSS) and mention CoordinatorQueryBalancingFactor.  Please keep in mind server properties that are not public can only be changed after consulting with Microsoft Support.

Using ByAttribute or ByTable Processing Group Property with Analysis Services 2005


As noted within the Analysis Services 2005 Performance Guide, there are some niche situations where setting the Processing Group property to ByTable provides more optimal processing than the default value of ByAttribute.

In a customer scenario, we had discovered that they had two dimensions (each of which has >25 million members and 8-10 attributes) where the Processing Group property was set to ByTable.  While the ByTable setting could theoretically have Analysis Services process faster (because it takes the entire set of dimension data and places it in memory), it didn’t in this case because it had taken approximately 80% of all available memory (approximately 25.6GB out of 32GB physical memory) to place just one dimension into memory.  Due to the large size of the dimensions, there were also issues in the dimension processing completing in a timely manner.   The setting ByTable is an optimization that bypasses normal checks and assumes that there is enough memory to process all attributes concurrently in memory.  If this is not true, this may result in processing issues and/or errors.   Therefore, it is important for you to monitor memory usage; especially if dimension size grows over time.

Note, we had investigated using the MaxParallel setting to limit concurrency.  Often this is helpful during partition processing, but it was not helpful during dimension processing for our scenario.  We feel it was not useful because the root cause is the large amount of memory consumed when using ByTable, and either it is only one dimension causing the problem, or the memory quota mechanism was sufficient to prevent doing two very large dimensions concurrently.

With the Processing Group property set to ByAttribute, the maximum amount of memory Analysis Services had to take up during processing was 9GB (vs. 25.6GB on the same server).  That is, the default setting in many cases will use fewer resources and process in a timely manner.   More specifically for this scenario, the processing issues arose because of the large size of the dimensions (number of members, number of attributes, etc.): with the ByTable setting, AS tried to put the entire dimension into memory instead of delegating this work to the relational database (as it does with the ByAttribute setting).

Contributors: Denny Lee, Richard Tkachuk, Akshai Mirchandani, Eric Jacobsen

 

New Best Practices Articles Published - Analysis Services Many-to-Many Dimensions: Query Performance Optimization Techniques

Many-to-many dimension relationships in SQL Server 2005 Analysis Services (SSAS) enable you to easily model complex source schemas and provide great analytical capabilities. This capability frequently comes with a substantial cost in query performance due to the runtime join required by Analysis Services to resolve many-to-many queries. This best practices white paper discusses three many-to-many query performance optimization techniques, including how to implement them, and the performance testing results for each technique. It demonstrates that optimizing many-to-many relationships by compressing the common relationships between the many-to-many dimension and the data measure group, and then defining aggregations on both the data measure group and the intermediate measure group yields the best query performance. The results show dramatic improvement in the performance of many-to-many queries as the reduction in size of the intermediate measure group increases. Test results indicate that the greater the amount of compression, the greater the performance benefits—and that these benefits persist as additional fact data is added to the main fact table (and into the data measure group).
 

SQL Server 2005 Best Practices Analyzer (January 2008) -- Now Available!

It's a new year and, with it, we come bearing gifts! We have a pretty significant update to SQL Server 2005 Best Practices Analyzer. It contains many new and updated rules for Analysis Services, a few important rules for the Relational Engine, and a couple of bug fixes for the UI and command line tools. And all of these rules have rich documentation telling you what you need to know. (read more)

Using BIDS 2008 to validate Analysis Services 2005 cubes


If you have experimented with cube design in SQL Server 2008, you will probably have run into the new AMO design warnings in Business Intelligence Development Studio 2008 (BIDS). We find that these warnings help customers a lot: they allow you to quickly analyze your cube for classic design mistakes and provide advice on how to correct them. In Analysis Services 2005 you need the Best Practices Analyzer to get the same warnings.

 

Did you know that you can use BIDS 2008 to connect to Analysis Services 2005 cubes and immediately get the new 2008 AMO warnings feedback on your 2005 design? You can even save the cube back to Analysis Services 2005 with BIDS 2008. If you decide to save back the cube to a 2005 server you should first perform a backup of the 2005 cube.

 

Please note that you CANNOT use BIDS 2008 to save Integration Services or Reporting Services 2005 projects. If you open an Integration Services 2005 or Reporting Services 2005 project in BIDS 2008, it will be upgraded to the 2008 format instead.


Gemini - Self Service BI!


Gemini is the code name for the new breakthrough Self-Service Business Intelligence (BI) capabilities being delivered in the SQL Server 2008 R2 release. Gemini enables end users to build BI applications by integrating data from a variety of sources, modeling, refining and analyzing the data, adding business logic, building reports and visualizations and ultimately sharing it with their coworkers in an environment that is managed and secured by IT.

You should bookmark the Gemini Team Blog for the latest information on Gemini.  Also see SQL Server 2008 R2 | Self-Service Business Intelligence for more information.  Once we start compiling best practices and lessons learned for Gemini, we will publish more information on sqlcat.com as well.

 

Optimizing CREATE SET in Cube Calculation Scripts

Author: Thomas Kejser
Reviewers: Peter Adshead, Nicholas Dritsas, Sanjay Nayyar, John Desch, Kevin Cox, Akshai Mirchandani, Anne Zorner

In this blog we describe an important optimization that you should apply to cubes that are processed often and have CREATE SET commands in the calculation script. We describe the measurements you can make to determine whether this affects you, and we also provide solutions that can make your cube much more responsive to users.

Background

When you process a cube in Analysis Services, some expressions in the calculation script may get invalidated for connected users and marked as needing re-execution. This is because sets and expressions in the script may have a dependency on the new data in the cube. Examples of such dependencies are CREATE SET and SCOPE commands that contain expressions like FILTER(<set>, <exp>).
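
To make this concrete, here is a minimal sketch of the kind of definition that gets invalidated by processing. The cube, dimension, and measure names are hypothetical and only illustrate the pattern:

// Hypothetical calculation-script fragment. Because the set is derived from
// fact data, it must be re-evaluated after the cube is processed.
CREATE SET CURRENTCUBE.[Customers With Sales] AS
    FILTER(
        [Customer].[Customer].[Customer].MEMBERS,
        [Measures].[Sales Amount] > 0
    );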

The invalidated expressions are re-executed when a user issues a DISCOVER or executes a query. DISCOVER commands are issued, for example, when Excel connects to the newly processed cube. Once the expressions are re-executed, Analysis Services caches their results and uses this cache to respond to all future requests.

As you may be able to deduce, anyone issuing a DISCOVER or executing a query after a process operation will have to wait for all CREATE SET and SCOPE commands to be executed. If there are complex expressions in the calculation script, this can take a long time and will show up as latency in the client interface.

Observing the Behavior

If you have a cube with a CREATE SET or SCOPE in the calculation script, you should measure the time it takes for the calculations to execute. To do this, set up an Analysis Services Profiler trace for these events:

[image: Profiler events selected for the trace]

Now, process any partition in the cube and start a new connection to the cube with Excel. You will see a series of events:

[image: Profiler trace showing the events after processing and reconnecting with Excel]

In the above example, notice ConnectionID 25. We first see the Login and Initialize events, followed by Discover events. But note that the last DISCOVER event does not finish until ConnectionID 0 is done executing subcubes. Those ConnectionID = 0 subcubes are generated by the calculation script. In this case, 426 seconds are spent retrieving the subcubes. While this happens, no new DISCOVER or query requests are served on any connection. Note that deploying an updated calculation script to a cube exhibits a pattern similar to a process operation.

Using a trace, you can quantify the total time it takes to fetch the subcubes required to execute the calculation script.

Optimizing the Calculation Script

If you determine that the overhead of executing the CREATE SET or SCOPE commands is too large, you have several options to optimize it:

  1. If possible, use only block style calculations in the CREATE SET and SCOPE
  2. Create aggregates to support the execution of the expression
  3. Pre-warm the cube after processing

Ad 1) Using the MDX tuning guidance in the Analysis Services 2008 Performance Guide, rewrite the CREATE SET or SCOPE commands to use block-style calculations.
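
As an illustration only (the right rewrite depends on the actual expression), the hypothetical FILTER set from the Background section could be expressed with a block-oriented function such as NONEMPTY when the intended rule is simply "members with data":

// Block-oriented alternative (hypothetical names). NONEMPTY resolves the set
// in bulk against the measure group rather than evaluating the condition
// member by member. It keeps members with any non-empty value, so it is only
// equivalent to the earlier FILTER when "has data" is the intended rule.
CREATE SET CURRENTCUBE.[Customers With Sales] AS
    NONEMPTY(
        [Customer].[Customer].[Customer].MEMBERS,
        { [Measures].[Sales Amount] }
    );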

Ad 2) If you see Query Subcube events in the trace, you should consider adding aggregates to support those subcubes. We recently did such an optimization for a customer and brought the execution time for the calculation script down from 15 minutes to a few seconds.

Ad 3) Assuming you are processing during off-peak hours or on a server not available for user queries (for example, a scale-out solution), you can run a DISCOVER command after you are done processing. This causes the calculation script to be executed and cached before the first user connects. If you are already using a cache-warming script, adding the DISCOVER command to it is a small change. However, note that the cached subcubes may still be evicted from the cache under memory pressure, so it is generally preferable to use aggregations.
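
Because any query re-triggers the invalidated expressions just as a DISCOVER does (see the Background section), a cache-warming step can be as simple as issuing one lightweight MDX query right after processing completes. A minimal sketch, with a hypothetical cube name:

// Minimal warm-up query. The first request after processing forces the
// CREATE SET and SCOPE expressions in the calculation script to be
// re-evaluated and cached before real users connect.
SELECT
    { [Measures].DefaultMember } ON COLUMNS
FROM [Sales Cube]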

Conclusion

In this blog we have described how you can measure the time it takes for a calculation script to be re-executed after a processing operation has invalidated it. If this re-execution time is long, you should consider optimizing the expressions in the calculation script. Sometimes the optimization is as simple as adding an aggregate that makes the CREATE SET or SCOPE execute faster.

What is the SSAS Maestros?


by Denny Lee (SQL Customer Advisory Team), Daniel Yu (SQL Marketing)

Over the last few weeks, there have been a lot of questions, tweets, and rumors about the SSAS Maestros program. Some notable blogs include Kasper de Jonge’s Do you have what it takes to become SSAS Maestro and Vidas Matelis’ SSAS White Paper List.

 

Background Information

We created the SSAS Maestros program as a way to broadcast the lessons learned from some of SQLCAT’s more complex Analysis Services UDM enterprise customers, such as Yahoo!’s 12TB Analysis Services cube that was announced during the PASS Summit 2010 Day One Keynote with Ted Kummert.  The deep technical learnings from names like Akshai Mirchandani, Thomas Kejser, John Sirmon, Denny Lee, and more have been incorporated by our good friends from Solid Quality Mentors into a three-day deep-dive course in:

  • Redmond (fully booked now, though we’re planning a second US session shortly)
  • London (fully booked now, though we’re looking into a second London session)
  • Hong Kong: still available!

We have more courses planned, but this is the initial batch as we work out the kinks and improve the course over time.  Currently, this is an 11-module course with labs and exams.

 

Benefits

As noted in the above MS events course link as well as Kasper’s blog, the key benefit will, of course, be the material that you learn.  But some of the other great things include:

  • Be one of the few members of the elite SSAS Maestros (more on this later)
  • The SSAS Maestros list will be published on a Microsoft site for your customers to reference.
  • When SQLCAT needs someone to help with a complex SSAS engagement or needs to refer a partner, we will refer an SSAS Maestro.
  • SQLCAT and the Analysis Services team will be creating SSAS Maestros-only webcasts and Q&A sessions; this will be an exclusive forum to discuss the latest best practices and share critical knowledge.

As you can see, there are a lot of great benefits to this course!

 

But…what’s the catch?

Due to its complexity and the depth of the technical material, this is not going to be an easy course.  Some examples of this include:

  • Following the technical-conference convention of 400-level sessions, we consider this a 500-level course.
  • There is an application process for this course, and entrants are accepted based on demonstrating their deep technical experience with Analysis Services.
  • During the three-day course, there will be a number of labs that will have very little guidance (on purpose).
  • Upon completion of the course, there will be a take-home exam project that will need to be completed within thirty (30) days.

We have been drawing inspiration from the SQL Server MCM course, which many of you know is extremely tough!

 

Discussion

As this is the first year of the course, we will be very strict and limit the number of applicants accepted into the course.  This allows us to work out the kinks in the course as well as figure out logistics around the webcasts, exams, and re-examinations.  But due to its popularity, we are already working on planning V2 of the course! So stay tuned!!

 

 

Why the obsession with random I/O within the context of SSAS?


by Denny Lee

As many of you know from the various blogs, whitepapers, and conferences from SQLCAT, there is a big obsession or compulsion toward I/O by various members of SQLCAT. If you’re not already familiar with this topic, definitely reference Mike Ruthruff’s whitepaper Predeployment I/O Best Practices.

But as could be seen from the SSAS Maestros session in Redmond this week (for more information, check out What is the SSAS Maestros?), there is an obsession with IOps even within the context of Analysis Services, as noted in the papers below.

 

So why is the obsession?

As noted in the SSAS Maestros course, if you think I’m bad, ping my counterpart Thomas Kejser, who is even more obsessed with and all-knowing about IOps. The reason for this focus is that for many queries within Analysis Services (especially for enterprise-scale cubes), a lot of threads hit the storage engine. This means that the threads will be hitting disk, i.e. the disk I/O subsystem. Even if each individual thread causes only sequential I/O, the cumulative effect of all of those threads is random I/O. And as noted in the graphic from Michael Anderson’s excellent Scalable Shared Database Part 5 post, both the IOps and MBps are substantially lower when the access pattern is random instead of sequential.

 

So what can we do about it?

As noted in our post Analysis Services Distinct Count Optimization Using Solid State Devices as well as the paper REAL Practices: Performance Scaling Microsoft SQL Server 2008 Analysis Services at Microsoft adCenter, the solution is to use NAND devices, which have superior random IOps performance.

Note that you can still get great random IOps performance using regular spinning media, provided you have enough spindles for your LUNs and you stripe and/or short-stroke the disks.

Or…you can make sure you buddy up with your friendly systems engineer, who can take care of this for you!

Enjoy!

Announcing SSAS Maestros v1.2


We are proud to announce that SQLCAT will continue with the SSAS Maestros course in June and July in Redmond and Madrid.

 

SSAS Maestros 1.2 Courses

Join us for a five-day deep-dive course on Analysis Services 2008 R2 UDM and become part of the SSAS Maestro Program. Prepared and presented by SQLCAT, top industry experts, and the SQL Server Analysis Services team, this intensive 500-level course gives top SSAS professionals the education and hands-on experience needed to deliver highly complex and highly scalable OLAP solutions using Analysis Services 2008 R2. Registration requires screening and approval of qualified attendees. Click on the links below for more details.

SSAS Maestro Program - Redmond, WA- June 13-17

SSAS Maestro Program - Madrid, Spain - July 18-22

 

Version 1.2 of the SSAS Maestros course has been updated based on what we learned from the v1.0 course. How well did people like SSAS Maestros v1.0? Linked is Vidas Matelis' [blog | twitter] opinion of the course: SSAS Maestro program – my experience so far.

 

We are also proud to announce the first set of SSAS Maestros Instructors that will be leading the v1.2 effort as well.

In Redmond, we are fortunate to have BI industry experts

And in Madrid, we are fortunate to have the BI gurus:

  • Marco Russo [blog | twitter]
  • Chris Webb [blog]
  • Thomas Kejser [blog] will be the SQLCAT representative

To keep current on SSAS Maestros, don’t forget to follow the hash tag #SSASMaestro

 

SSAS Maestro v1.0 Update

But how about all of the folks who completed Round 1 of SSAS Maestros in Redmond, Hong Kong, and London? Apologies for the delay, but it has taken a bit longer than we originally anticipated to complete the evaluations.

Our partner Solid Quality, CSS, and we are currently going through the evaluation process, and we believe the evaluations will be completed in the June/July time frame.

 

So stay tuned!! (another great way to stay tuned is to follow @sqlcat and/or search #SSASMaestro)

 

 

 
