Snowflake is a fantastic Datawarehouse and Datalake SaaS Solution !

Very easy to use, scale, develop with and continuously integrate data inside of it central storage is quite fun. But what about performance ?

Snowflake have a really good reputation about performance and we would like to know if this reputation is trustable ? TPC-H on the other hand is an old but wildly use database benchmark.

TPC-H use 8 tables and 22 queries to benchmark db performance. The scale factor of the TPC-H define the size of 6 tables, the larger one (lineitem) is 600M rows when Scale Factor is 100.

Snowflake provide from SF10 to SF1000 TPCH tables data that allow you to run your own TPCH with Snowflake Database : that’s what we did with different warehouse sizes from X-Small to 4X-Large

TPC-H Tables and relationships

TPCH Tables Schema

Change Warehouse Size

Changing warehouse size is really easy. You can do that using the GUI console in 3 clics. on even easier, using SQL !

Snowflake alter warehouse size using SQL

Caching or not caching ?

Snowflake no result cache

Result cache is a feature that allow the database to cache the results of queries if it is no too big. Result cache can be a precious feature when dealing with repetitive queries in production (that could happened more than you think). In the TPCH case if we let the result cache ON, every query that was previously run is returned in 43ms… But for this bench it was not the idea to test this feature. So we deactivate the result cache before each run, using the following SQL command :

alter session set USE_CACHED_RESULT=False;