Technology•Oct 21, 2014
SSAS: A Simplified Look at Summing and Averaging Distinct Counts with Multi-Selects
In this post, I’m going to explore some difficult calculations that come from using distinct counts in SQL Server Analysis Services (SSAS) cubes.
Summing Distinct Counts
Sometimes users want to see the sum of a distinct count over several periods. I want to take a look at why this kind of calculation doesn’t make sense and why MDX won’t let you do it. When I was first facing this issue, it was difficult to see the problem with such a large set of data, so to illustrate it we are going to be working with a very small sample set.
Let’s pretend that Jack, John, Sarah, and Megan are the only customers at our local grocery store. The table below tracks what each of them bought over the course of five weeks:
Table A below attempts to show the user’s desired result, the distinct count of customers by week. The total makes it look like the grocery store had 14 unique customers over the course of the five weeks, however, we already know Jack, John, Sarah, and Megan are the store’s only customers. We know the total should be four, but it isn’t here.
Does that total of 14 actually mean anything? Maybe. According to Table A the store made 14 sales during the five weeks. But that total isn’t always accurate.
Jack forgets what he needs at the grocery store and has to make a different trip to the store for each item he buys. So during Week 19, while John made one trip to buy milk and butter, Jack made two trips, one to buy milk and another to buy eggs. So Jack actually counts as two sales.
Table B now shows what the user really wants—the total number of sales by week. If the user wants to see how many sales the grocery store made, the correct answer would be 17, not 14.
This is why summing a distinct count doesn’t make sense. The total of 14 in Table A no longer means anything useful for us. The correct total in Table A should be four. The user needs to understand that what they want won’t actually mean much for them. Taking the sum of a distinct count rarely makes sense and should be avoided.
Averaging Distinct Counts
What about averages? Sometimes the user will want to see the average number of unique employees that they served over several periods as the ‘Total’. An average of a distinct count actually does make sense, but how can we do this in MDX if it doesn’t let us sum the counts? After all, you need a sum to get an average. MDX has an AVG function, but it doesn’t work with multi-selects how you would expect it to. The AVG function works great if you want to see the average count over the entire period, such as all weeks in a year, but not so great when you select multiple sub-periods.
Assuming we already have a measure for the distinct count of customers, [Measures].[Distinct Customers], the first step is to create a dynamic set. The set should be all the sub-periods the user has selected. To do this, you can adjust the following MDX to fit your date hierarchy:
CREATE DYNAMIC SET CURRENTCUBE.[Selected Weeks] AS [Date].[Date].[Week].members, DISPLAY_FOLDER = 'Sets';
Next, we can use that set to calculate the average over the periods within it:
CREATE MEMBER CURRENTCUBE.[Measures].[Average Customers] AS AVG(EXISTING [Selected Weeks],[Measures].[Distinct Customers]),
FORMAT_STRING = "Standard",
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Customers' ;
The “EXISTING” keyword above forces the set to be evaluated within the current context. Without it, 2.8 would be displayed for every week in Table A above. Table C shows the correct and expected result.
While these concepts may seem simple, it can often be difficult to see with large sets of data. Credera has a lot of experience leveraging SQL Server tools that will help you solve your toughest challenges. Be sure to follow us on Twitter or LinkedIn for more useful tips. If you have a question or comment related to the blog, you can use the comments section below to join the conversation.