Tableau Performance Tips #8: Avoid using Tableau Groups

Tableau Custom Groups: a really cool and useful feature!

Tableau allows users to build their own custom groups very easily. This is a convenient way to group together the data elements that users want to see combined.
With small data you won’t see performance issues with groups, but when you work against a database holding a large amount of data, things can change.

Tableau continuously improves its query engine, and the performance issues described here will surely be corrected in the future, but let’s see why Tableau groups can lead to performance issues today:

Tableau Group Building

It is very easy to build a group in Tableau: simply select a list of elements of one field (calculated or not) and click on the group icon.

You obtain a new field in the data pane with the group icon on its left.

A group in the data pane

Tableau Visual Grouping

You can also create groups using a visual selection on a map or any chart.

What happens behind the scenes?

Depending on the target data source (database, extract or cloud), Tableau will try to retrieve the group data using one of the following techniques:

  • Push the data (group elements and dimension elements) into a temporary table (in a database like mssql, oracle, SAP HANA…).
  • Retrieve all the data at the lower level that makes up the group and calculate the group from it.

These techniques can’t be cached between users and have to be done sequentially: first populate the temporary table, then join it with the data.

When you filter on a group element, Tableau will either:

  • Join the raw data to the temporary table and apply a WHERE filter on the group column. The join is often not optimal because it is made on a label field or, even worse, on a concatenated field.
  • Retrieve the whole domain of the dimension that the groups are built on and then filter the data.

In this case (a HANA database), Tableau uses a temporary table that contains two columns: one with the label of each element of the group and another with the name of each group. This is a link table, which is not bad in itself, but:

  • Data needs to be written to the temporary table for each user/session.
  • The join with the temporary table is made on the column that was used to create the group. If users created the group from a label column or a calculated column, the join between the temporary table and the main data is sub-optimal.
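
The temporary-table path can be imitated outside Tableau to see where the cost comes from. The following is a minimal sketch using Python's built-in sqlite3 module instead of HANA. The `sales` table, its columns, the `tmp_group` name and the `filter_on_group` helper are all hypothetical, and the SQL only approximates the shape of the query described here, not what Tableau actually sends to the database.

```python
import sqlite3

# Hypothetical fact table standing in for the "main data".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (country_id INTEGER, country_label TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "France", 100), (2, "Germany", 200), (3, "Japan", 50), (1, "France", 25)],
)


def filter_on_group(conn, groups, wanted_group):
    """Imitate the two sequential steps: fill a temp table, then join and filter."""
    cur = conn.cursor()
    # Step 1: the link table (label, group name) is written again for each session.
    cur.execute("CREATE TEMP TABLE tmp_group (label TEXT, group_name TEXT)")
    cur.executemany(
        "INSERT INTO tmp_group VALUES (?, ?)",
        [(label, name) for name, labels in groups.items() for label in labels],
    )
    # Step 2: the join is made on the text label that was used to build the group.
    rows = cur.execute(
        """
        SELECT g.group_name, SUM(s.amount)
        FROM sales AS s
        JOIN tmp_group AS g ON g.label = s.country_label
        WHERE g.group_name = ?
        GROUP BY g.group_name
        """,
        (wanted_group,),
    ).fetchall()
    cur.execute("DROP TABLE tmp_group")
    return rows


groups = {"Europe": ["France", "Germany"], "Asia": ["Japan"]}
europe_total = filter_on_group(conn, groups, "Europe")
print(europe_total)
```

Both steps run on every call, which mirrors the per-user, per-session write, and the join key is a text column rather than an ID.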

What can I do to group elements better?

1) Grouping using ID

If users can create their groups using an ID field instead of a label field, it helps the database when it joins the main data to the temporary table. This is not perfect, but it is better than using a label or concatenated field as the raw elements of the group.

2) Use a Tableau SET

SETs need no temporary table but produce more complex queries: be careful with this technique, but it can help.

3) Use a CASE WHEN calculated field

This is harder for the end user to do, and depending on the complexity of the expression it can perform better. It avoids writing a temporary table, and filtering on group elements is done without retrieving the whole domain of raw elements.
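
To illustrate why no temporary table is needed, here is a minimal sketch of the kind of query a CASE WHEN grouping could translate to, again using Python's sqlite3 module. The `sales` table, the country labels and the `region` alias are hypothetical, and the SQL that Tableau really generates for a calculated field may look different.

```python
import sqlite3

# Hypothetical fact table; only the label column is needed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country_label TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("France", 100), ("Germany", 200), ("Japan", 50), ("France", 25)],
)

# The grouping lives inside the query itself: nothing is written beforehand.
CASE_QUERY = """
SELECT region, SUM(amount)
FROM (
    SELECT CASE
               WHEN country_label IN ('France', 'Germany') THEN 'Europe'
               WHEN country_label = 'Japan' THEN 'Asia'
               ELSE 'Other'
           END AS region,
           amount
    FROM sales
) AS grouped
WHERE region = ?
GROUP BY region
"""

europe_total = conn.execute(CASE_QUERY, ("Europe",)).fetchall()
print(europe_total)
```

The whole grouping is a single read-only statement, so there is no per-session write and no separate populate step before the filter.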

4) Store the group data in the data model

If the custom group is often used by many users, why not push the Tableau group into the data model? There, a relationship to a group table will be used, with ID fields for better join performance!
Of course, the table will need to be integrated into the logical/physical layer of the Tableau data source.
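
As a rough illustration of this option, the sketch below stores the grouping once in a permanent table and joins it on an ID. It uses Python's sqlite3 module, and the `sales` and `country_group` tables and their columns are hypothetical names, not part of any real Tableau data source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical main data, keyed by an ID instead of a label.
    CREATE TABLE sales (country_id INTEGER, amount INTEGER);
    -- The group table is stored once in the model and shared by all users.
    CREATE TABLE country_group (country_id INTEGER PRIMARY KEY, group_name TEXT);
    INSERT INTO sales VALUES (1, 100), (2, 200), (3, 50), (1, 25);
    INSERT INTO country_group VALUES (1, 'Europe'), (2, 'Europe'), (3, 'Asia');
    """
)

# The join uses the ID column, and nothing is written at query time.
totals = conn.execute(
    """
    SELECT g.group_name, SUM(s.amount)
    FROM sales AS s
    JOIN country_group AS g ON g.country_id = s.country_id
    GROUP BY g.group_name
    ORDER BY g.group_name
    """
).fetchall()
print(totals)
```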

Conclusion

In conclusion, Tableau Custom Groups are a great feature for end users. They can manually create their own groups and find important insights using this feature. But sometimes groups can lead to performance issues. These performance issues happen more often on big databases or data lakes. There are alternatives to custom groups: SETs or CASE WHEN calculated fields are the main alternatives and can give better performance. If the groups become used by many users, you should think about pushing them into your data model!