After a first try at TPCH SF10 using DuckDB on 2 different laptops, comparing parquet storage vs native storage (see TPCH SF10 using DuckDB vs SnowFlake, Bigquery, SingleStore and Databricks), I would like to try the Hyper engine used by Tableau (Hyper is also free but not open source). You can find more information about Tableau Hyper here.

You can download the Hyper engine (included in the Hyper API) from https://www.tableau.com/support/releases/hyper-api/latest or, for Pythonistas, just run:

 pip install tableauhyperapi

I used exactly the same source data, importing it from parquet files into a Hyper database file:

Loading data into Hyper

To load data from parquet into a Hyper database, I ran the following statements:

 

create database if not exists "D:\OpenData\TPCH\data\hyper\tpcf_sf10.hyper";
attach database "D:\OpenData\TPCH\data\hyper\tpcf_sf10.hyper" as tpch_sf10;

create table "REGION" as (select * from 'D:\OpenData\TPCH\data\10\region.parquet');
create table "NATION" as (select * from 'D:\OpenData\TPCH\data\10\nation.parquet');
create table "PART" as (select * from 'D:\OpenData\TPCH\data\10\part.parquet');
create table "CUSTOMER" as (select * from 'D:\OpenData\TPCH\data\10\customer.parquet');
create table "SUPPLIER" as (select * from 'D:\OpenData\TPCH\data\10\supplier.parquet');
create table "PARTSUPP" as (select * from 'D:\OpenData\TPCH\data\10\partsupp.parquet');
create table "ORDERS" as (select * from 'D:\OpenData\TPCH\data\10\orders.parquet' order by 1,2);
create table "LINEITEM" as (select * from 'D:\OpenData\TPCH\data\10\LINEITEM.parquet' order by 1);
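
For Pythonistas, the same load can be scripted with the tableauhyperapi package. The following is only a minimal sketch, not the script I actually ran: the helper names (TABLES, quote_literal, build_ctas, load_all) are hypothetical, the parquet files are assumed to be named after the lowercased table names, and the generated SQL simply mirrors the statements above. Opening the connection with the database path takes the place of the create database / attach database pair.

```python
from pathlib import PureWindowsPath
from typing import Optional

# Table name -> optional ORDER BY clause, mirroring the SQL statements above.
TABLES = {
    "REGION": None,
    "NATION": None,
    "PART": None,
    "CUSTOMER": None,
    "SUPPLIER": None,
    "PARTSUPP": None,
    "ORDERS": "1,2",
    "LINEITEM": "1",
}


def quote_literal(value: str) -> str:
    """Quote a SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_ctas(table: str, parquet_dir: str, order_by: Optional[str] = None) -> str:
    """Build one CREATE TABLE ... AS SELECT statement for a parquet file.

    Assumption: the file is named <table in lowercase>.parquet.
    """
    path = str(PureWindowsPath(parquet_dir) / f"{table.lower()}.parquet")
    order = f" order by {order_by}" if order_by else ""
    return f'create table "{table}" as (select * from {quote_literal(path)}{order});'


def load_all(hyper_file: str, parquet_dir: str) -> None:
    """Create the .hyper file if needed and load every table into it.

    Requires `pip install tableauhyperapi`; imported here so the statement
    builder above can be used without the package installed.
    """
    from tableauhyperapi import Connection, CreateMode, HyperProcess, Telemetry

    with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
        with Connection(endpoint=hyper.endpoint,
                        database=hyper_file,
                        create_mode=CreateMode.CREATE_IF_NOT_EXISTS) as conn:
            for table, order_by in TABLES.items():
                # Fails if the table already exists, like the plain SQL version.
                conn.execute_command(build_ctas(table, parquet_dir, order_by))


if __name__ == "__main__":
    # Only print the statements; call load_all(...) to actually run them.
    for name, order_by in TABLES.items():
        print(build_ctas(name, r"D:\OpenData\TPCH\data\10", order_by))
```

Running the script as is only prints the eight statements, which is a cheap way to check the paths before starting a Hyper process.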

 

Results

Conclusion

The Hyper engine is very powerful, even on a 3-year-old laptop!

These tests should be seen for what they are: a game for me. But there are some interesting things worth pointing out:

  • Some databases have found a way to accelerate column/data reads for strings. Query 13 is interesting in that respect: its where clause O_COMMENT NOT LIKE '%SPECIAL%REQUESTS%' forces the engine to read the content of the O_COMMENT column. Some databases do that much faster than others.
  • Even laptops can be blazing fast 🙂
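
To make the Q13 point concrete, here is a small illustrative sketch of what such a predicate means. It is not how Hyper, DuckDB or any other engine implements LIKE; the function names and the sample comments are made up for the example. Because the pattern starts with %, there is no fixed prefix to compare against, so every value has to be examined.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern (% and _ wildcards, no ESCAPE) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def not_like(values, pattern):
    """Keep the values that do NOT match the LIKE pattern (case-sensitive)."""
    rx = like_to_regex(pattern)
    return [v for v in values if rx.fullmatch(v) is None]


# Made-up sample comments, not real TPCH data.
comments = [
    "handle SPECIAL packaging REQUESTS first",
    "REQUESTS before SPECIAL",
    "regular deposits",
]

kept = not_like(comments, "%SPECIAL%REQUESTS%")
print(kept)  # ['REQUESTS before SPECIAL', 'regular deposits']
```

Only the first comment is dropped: the two words must appear in that order, and checking this means looking inside every string.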

One thing, among others, that should be taken into account is that DuckDB is an in-process engine, so it is not so easy to access a remote DuckDB data source. Hyper, on the contrary, can be accessed remotely over TCP with a PostgreSQL client.