Tableau Server performance is impacted by the revision/version history depth of objects

After several tests on a real-world Tableau production environment (1,000+ workbooks, 100+ shared datasources), we discovered that object version history has an impact on Tableau Server performance, and that this impact can be significant when the history is deep. It was a real surprise, and considering that my client has many objects with more than 40 versions, we had found a real quick win for performance.

Several questions remained after the discovery:

  1. How large is the impact?

    How many datasources or workbooks with a high number of revisions do we have, and where?

  2. How can we remove old versions to slim down Tableau objects?
  3. How many versions should we keep?
  4. Why does version history impact performance so much?

    Is it because of the PostgreSQL model and the queries against it? A lack of indexes or bad queries that lead to database contention, or something else?

Let’s start with the discovery story:

At the beginning of the story, we decided to move Tableau Server closer to the source database (SAP HANA) because of performance issues linked to limited network bandwidth.

Phase 1: tests on a single VM in the new datacenter with individually published workbooks

To confirm that the new placement of the Tableau Server cluster was good enough, we tested some workbooks individually by downloading them from the production cluster and publishing them to a test machine. The result was excellent: a 10x improvement, so we decided to move to the new datacenter. At this stage we had 2 workbooks retrieved from production. Their datasources were switched from published datasources to local datasources in the workbook.

Phase 2: rebuild the Tableau Server clusters

Two weeks after building the dev and production Tableau clusters, we restored the environments (same version of Tableau Server: 2020.4). To be sure that we had good performance on the new cluster, we decided to test some important dashboards that had already been tested in Phase 1.

Phase 3: test with the restored environment on the new cluster

The results were far from what we expected! Only a 40% improvement in the best case, far from the 10x improvement we had in Phase 1.

Phase 4: Investigations

We tried to understand what could explain the gap between the expected performance and the reality. We were informed that, for traceability reasons, [TableauUserName] and [WorkbookName] had been added to the initial SQL on some datasources. We also already knew that embedded datasources can be faster than published datasources. So we started to test some combinations (with/without initial SQL, local/published datasource). For each try we published a new workbook and datasource and used a Tableau Server web performance recording.
We found that an embedded datasource works faster than a published datasource, but not by much, which we already knew.
We had created all possible combinations except the one that already existed with the reference workbook, and that was a mistake.

Phase 5: Illumination

After a flash of insight on my bike ride home, I started to wonder whether version history could be the culprit. I wanted to be sure. So I decided to download the reference workbook, publish its datasource as a new shared datasource, and republish the reference workbook as a new one: BAM!

With exactly the same initial SQL, the brand-new workbook using a brand-new shared datasource was fast!

In other words, we found that version history in Tableau Server can have a significant impact on performance:

When using a workbook with 24 versions (35 revisions) that uses a datasource with 24 versions (44 revisions): 120s

When using a single-version workbook that uses a single-version datasource: 25s

That is a 4.8x difference (120s / 25s) for the same content.

How large is the impact?

Now we need to analyze the extent of the problem by digging into the Tableau Server repository. In the PostgreSQL "workgroup" database we can find 2 tables that will help: datasource_versions and workbook_versions.
Using Tableau to analyze Tableau is easy 🙂

For datasource revisions (versions)
For workbook revisions (versions)
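
Outside of Tableau, the same analysis can be sketched as a simple GROUP BY over the version tables. The snippet below is only an illustration of the query shape: it uses an in-memory SQLite database as a stand-in for the repository, and the table name, the workbook_id column and the sample rows are assumptions to check against the repository data dictionary of your Tableau Server version. Against the real PostgreSQL repository, the placeholder style depends on the driver (for example %s instead of ?).

```python
import sqlite3

# Query shape only: count stored versions per object and keep the deep histories.
# {table} and {id_col} are filled from trusted constants, never from user input.
VERSION_COUNT_SQL = """
SELECT {id_col} AS object_id, COUNT(*) AS version_count
FROM {table}
GROUP BY {id_col}
HAVING COUNT(*) >= ?
ORDER BY version_count DESC, object_id
"""


def deep_histories(conn, table, id_col, min_versions):
    """Return (object_id, version_count) rows for objects with at least min_versions versions."""
    sql = VERSION_COUNT_SQL.format(table=table, id_col=id_col)
    return conn.execute(sql, (min_versions,)).fetchall()


def demo(min_versions=3):
    """Build a tiny stand-in table (hypothetical schema and data) and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workbook_versions (workbook_id INTEGER, version_number INTEGER)"
    )
    rows = (
        [(1, v) for v in range(1, 25)]   # workbook 1: 24 versions
        + [(2, 1)]                       # workbook 2: 1 version
        + [(3, v) for v in range(1, 4)]  # workbook 3: 3 versions
    )
    conn.executemany("INSERT INTO workbook_versions VALUES (?, ?)", rows)
    result = deep_histories(conn, "workbook_versions", "workbook_id", min_versions)
    conn.close()
    return result


if __name__ == "__main__":
    for object_id, version_count in demo():
        print(object_id, version_count)
```

The same function can be pointed at the datasource version table by passing its table name and id column, which gives one list of deep histories per object type.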

How to remove old versions to slim down Tableau objects?

There are several ways to remove object revisions.

  • Using the Tableau Server settings, but this cleanup applies only to workbooks (not to datasources)
  • Using the REST API
  • Using our Mtools4Tableau kit
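
For the REST API route, here is a minimal sketch of how the cleanup could be planned. It only builds the list of URLs to call, following the path of the "Remove Workbook Revision" and "Remove Data Source Revision" methods; each URL would then be called with an HTTP DELETE and a valid X-Tableau-Auth token. The server URL, site id, object id and the number of revisions to keep are hypothetical values, and the API version must match your server (3.10 is used here on the assumption that it corresponds to 2020.4, to be verified).

```python
def revisions_to_remove(revision_numbers, keep):
    """Return the revision numbers to delete, oldest first, keeping the `keep` most recent."""
    if keep < 1:
        # The current (latest) revision cannot be removed, so at least one stays.
        raise ValueError("keep must be at least 1")
    ordered = sorted(set(revision_numbers))
    # Slicing with a negative bound drops the last `keep` items;
    # it yields an empty list when there is nothing old enough to remove.
    return ordered[:-keep]


def removal_urls(server, api_version, site_id, kind, object_id, revision_numbers, keep):
    """Build one DELETE URL per revision to remove.

    kind is "workbooks" or "datasources"; all identifiers are supplied by the caller.
    """
    if kind not in ("workbooks", "datasources"):
        raise ValueError("kind must be 'workbooks' or 'datasources'")
    base = (
        f"{server.rstrip('/')}/api/{api_version}/sites/{site_id}"
        f"/{kind}/{object_id}/revisions"
    )
    return [f"{base}/{number}" for number in revisions_to_remove(revision_numbers, keep)]


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    for url in removal_urls(
        "https://tableau.example.com", "3.10", "site-id",
        "datasources", "datasource-id", range(1, 45), keep=5,
    ):
        print("DELETE", url)
```

Keeping the planning step separate from the HTTP calls makes it possible to review the list of revisions before anything is deleted, which matters because a removed revision cannot be restored.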