Nous avions déjà mener un telle étude en 2015 pour un autre client mais le sujet est revenu sur le tapis 7 ans plus tard en 2022. Le direct query etait-il envisageable sur SQL Server 2019 EE pour concurrencer les cubes Microstrategy ?

Dans ce cas d’étude, le client souhaite améliorer le temps de livraison des données et l’alimentation des cubes est le principal frein à un rafraichissement rapide.

Un cube Microstrategy étudié et posant problème est composé de 2 mesures principales et de 12 dimensions.

Deux nouvelles machines physiques sont provisionnées (PROD & DRP/DEV/QUAL). A l’heure du tout cloud cela peut paraître hérétique mais ces machines de compétition permettent de profiter d’une capacité cpu, mémoire, disque très important e dont le cout aurait été prohibitif chez n’importe quel cloud provider.

Les deux machines MSSQL sont identiques et fonctionneront 24/24 7/7 :

  • 64 coeurs / 128 Threads
  • 768Go RAM
  • Stockage dédié et local : 14 TB de disques NVMe délivrant au final 14Go/s en lecture et 11Go/s en écriture pour la data et log (14000 IOPS de 1Mo) , 7Go/s en lecture 1,5Go/s pour la tempdb (7000 IOPS de 1Mo)
  • Réseau 2x10Gbs vers les serveurs applicatifs + 2x10Gbs pour l’externalisation des sauvegardes

Une seule de ces machine de ce type couterait dans les 180k$ par an chez AWS (R6i metal) (et sans doute plus sur Azure ou GCP), soit plus que le prix d’achat de la machine physique…

Microstrategy Cubes Loading

During the audit we saw that some cubes was big and not fast to reload. and another problem is that the loading of the cubes are relaunch just when the previous one finished

Microtrategy VLDB limitations for using partitionning

The cubes did not use Microstrategy partitioning. This technic can reduced the load time, but this need some change in the VLDB options. This change will increase a lot the volume of data transfered between the database and the Microstrategy Intelligence Server

CPU consumption during a partioned cube loading

The problem of the Microstrategy partitioning is that the performance improvement depends on the number of CPU of the Microstrategy IS. In the case of our test we decided to push the partitions number to 8 on a 8vcpu VM. This is not a good practice : in the ideal case, the partitions number should be 1/2 the number of vcpu.
If you want a faster extraction you will need many cpu on the Microstrategy IS.

So what can we do ?

Before the new Hardware of the database and new SQL Version : performance of direct query was not performant and Microstrategy Cubes were mandatory

With the new hardware and new SQL Server version, the equation is completly different

One little problem remain with direct SQL Query : for some queries, when the few restrictions sometimes leave the data too little filtered, the on the fly aggregation take too much time. For exemple aggregate 4M rows to 250k Rows, SQL Server can take 6s in our context. (using 30s of cpu because of the SQL Server Parallelism)

The solution : in databases stored aggregates tables

Decide which aggregate tables should be created

Aggregate Tables, old good solution for improve direct query performance can be used.

We have some dimension elements that are mandatory in all queries. Theses dimensions will constitute the fix part of all the aggregate tables.
On the other part, we axe 3 axis with 6 to 7 levels each. They will constitute the mobile part of the candidates for aggregate tables.

We calculate all the number of rows for each candidate aggregate tables. We will also cross theses information we the custom groups used be final user. This will help to choose the best aggregate tables.

With ou Without aggregate table

The question will be : how long will it take to refresh aggregate tables ?
With the new hardware and using clustered columnstore the performance are quite good  (< 15s) and the space used for the aggregate is ridiculous (1GB)

Conclusion

Using Direct Query totally suppress the long time to load Microstrategy big cubes, leaving CPU ressources for front usage on Microstrategy IS.
In databases aggregates, quickly refreshed and without a big space usage, help to reduce response time for final user and cpu pressure on the database.

Small Microstrategy cubes will be kept, because, fast to reload, especially with the new hardware !