Tableau Performance Tips #2 : Avoid total and sub-totals when using a count distinct metric aggregate


The problem : Computing-Totals is done sequentially

If you use a total (or worse) sub-totals when you have metric(s) that is count distinct, this will lead to a second (or several) sequential pass for each subtotal level to retrieve data. The « total/subtotal » pass is done after fined grain data retrieval.

This is normal, you will say and I agree !
When a count distinct  aggregate is used, it is mandatory to ask the computation for every level of subtotals and for the totals.
Let’s take a exemple : you define the number of doctor as COUNTD(DoctorID). If you want to display is the same table, the number of doctor by cities, counties, and the total number of doctors. We can easily understand the problem if a doctor can work in several cities and even several counties. If sub-totals, is calculated we a simple sum you will have double counted doctor

The problem is that this multi-pass computation is currently not done in parallel but sequentially, one metric after another AND one sub-totals level after another (cities/counties/gran-total). If you have 2 count distincts metrics with 3 sub-totals, and if each computation take 2s, you will have to wait 2*3*2s=12s to finish the computing-totals phase.

Tableau Performance and Sub-Totals

In the cas we have a metric as number of product : COUNTD([product_key]) and we ask tableau du display all totals.
Here we have 3 sub-totals :

  1. The total per super-brand (D, L, and V),
  2. the total per family of product,
  3. The gran-total
Tableau Performance and Sub-Totals

The 3 sub-totals asked for a countd metric generate 3 different queries that run sequentially

How to avoid or reduce this phase :

Firstly, double check if the countd is mandatory : if the countd is on the key of the facts table then a sum(1) or simple count could do the job. This solution is the best if you can use it because it will have effect every dashboard/views where the totals of the metrics was used.

Another solution, available since the 2020 version exist : There is limitation regarding the dimension(s) used on your view, but you can switch the total behavior from default (countd) to sum/max/min. This will remove the second pass query to the datasource using a simple computation from the raw dataset.

This work here because there is no overlaps : one product is only one family and one super-brand so there is no need to recompute the count distinct of product from the source. In this case it is possible to use a simple sum on the different level of the dataset. The sum is done by tableau without re-asking the datasource.

Another solution would be to break the table into 3 different views, one for each level. Thus the totals could be display but the data retrieval will be done in parallel.