DuckDB

Streaming data from PostgreSQL to a CSV file as fast as possible

Update Jan 25, 2024 - Added FastBCP. In this post, we explore the process of streaming data from a PostgreSQL database to a CSV file using Python. The primary goal is to avoid loading the entire dataset into memory, enabling a more scalable and…


Parquet file sorting test

Update Nov 17, 2023 - Added results using the latest DataFusion version. Some time ago, we came across an intriguing Parquet sorting test shared by Mimoune Djouallah on Twitter @mim_djo. The test involves reading a Parquet file, sorting the table,…


TPC-H benchmark of DuckDB and Hyper on native files

In this blog post, we examine the performance of two popular SQL engines for querying large files: Tableau Hyper (proprietary license) and DuckDB (MIT license). These engines have gained popularity due to their efficiency, ease of use, and Python APIs.…


TPC-H benchmark of Hyper and DuckDB on Windows and Linux OS

Update Apr 12, 2023 - It seems that Windows 11's poor performance may be due to conflicting BIOS/OS settings when dual-booting. We are investigating... Additionally, I have corrected the version of Windows 11 in the post from Home to Professional.…


TPC-H benchmark of Hyper, DuckDB and Datafusion on Parquet files

Update Apr 14, 2023 - An issue has been opened on the DataFusion GitHub repository regarding its poor reported performance compared to DuckDB and Hyper in this specific case: #5942. While there may be multiple factors contributing to this unexpected…


TPCH SF10 : Tableau Hyper Engine vs DuckDB vs Snowflake vs BigQuery vs Databricks vs SingleStore

After a first try with TPCH SF10 using DuckDB on two different laptops, comparing Parquet storage with native storage (see TPCH SF10 using DuckDB vs SnowFlake, Bigquery, SingleStore and Databricks), I would like to try the Hyper engine used by Tableau…


Using DuckDB with Tableau Desktop on Windows

DuckDB is an in-process SQL OLAP database management system. This database is very fast for OLAP queries and has several advantages: it can read and write Parquet files very efficiently, and it uses parallelism and SIMD for impressive performance.…


Query Parquet files with DuckDB and Tableau Hyper engines

In this notebook, we are going to query some Parquet files with the following SQL engines: DuckDB, an in-process SQL OLAP database management system (we are going to use its Python Client API) [MIT license]; Tableau Hyper, an in-memory data…


Download some benchmark road networks for Shortest Paths algorithms

Updated September 26, 2022 - Bugfix. The goal of this Python notebook is to download and prepare a suite of benchmark networks for some shortest path algorithms. We would like to experiment with some simple directed graphs with non-negative weights. We…


Trying DuckDB with Discogs data

This notebook is a small example of using DuckDB with the Python API. What is DuckDB? DuckDB is an in-process SQL OLAP database management system. It is a relational DBMS that supports SQL. OLAP stands for Online Analytical Processing,…