Parquet

Parquet file sorting test

Update Nov 17, 2023 - Added results using the latest DataFusion version. Some time ago, we came across an intriguing Parquet sorting test shared by Mimoune Djouallah on Twitter @mim_djo. The test involves reading a Parquet file, sorting the table,…


TPC-H benchmark of Hyper and DuckDB on Windows and Linux OS

Update Apr 12, 2023 - It seems that Windows 11's poor performance may be due to conflicting BIOS/OS settings when dual-booting. We are investigating... Additionally, I have corrected the version of Windows 11 in the post from Home to Professional.…


TPC-H benchmark of Hyper, DuckDB and DataFusion on Parquet files

Update Apr 14, 2023 - An issue has been opened on the DataFusion GitHub repository regarding its poor reported performance compared to DuckDB and Hyper in this specific case: #5942. While there may be multiple factors contributing to this unexpected…


Query Parquet files with DuckDB and Tableau Hyper engines

In this notebook, we are going to query some Parquet files with the following SQL engines: DuckDB: an in-process SQL OLAP database management system. We are going to use its Python Client API [MIT license]. Tableau Hyper: an in-memory data…


Download some benchmark road networks for Shortest Paths algorithms

Updated September 26, 2022 (bugfix). The goal of this Python notebook is to download and prepare a suite of benchmark networks for some shortest path algorithms. We would like to experiment with some simple directed graphs with non-negative weights. We…


Trying DuckDB with Discogs data

This notebook is a small example of using DuckDB with the Python API. What is DuckDB? DuckDB is an in-process SQL OLAP Database Management System. It is a relational DBMS that supports SQL. OLAP stands for Online analytical processing,…