Benchmark

Parquet file sorting test

Update Nov 17, 2023 - Added results using the latest DataFusion version. Some time ago, we came across an intriguing Parquet sorting test shared by Mimoune Djouallah on Twitter @mim_djo. The test involves reading a Parquet file, sorting the table,…


TPC-H benchmark of DuckDB and Hyper on native files

In this blog post, we examine the performance of two popular SQL engines for querying large files: Tableau Hyper (proprietary license) and DuckDB (MIT license). These engines have gained popularity due to their efficiency, ease of use, and Python APIs.…


TPC-H benchmark of Hyper and DuckDB on Windows and Linux OS

Update Apr 12, 2023 - It seems that Windows 11's poor performance may be due to conflicting BIOS/OS settings when dual-booting. We are investigating... Additionally, I have corrected the version of Windows 11 in the post from Home to Professional.…


Snowflake TPCH SF100 Results

TPCH with Snowflake : SF100

Snowflake is a fantastic data warehouse and data lake SaaS solution! It is very easy to use, scale, and develop with, and continuously integrating data into its central storage is quite fun. But what about performance? Snowflake has a really good reputation…


TPC-H benchmark of Hyper, DuckDB and DataFusion on Parquet files

Update Apr 14, 2023 - An issue has been opened on the DataFusion GitHub repository regarding its poor reported performance compared to DuckDB and Hyper in this specific case: #5942. While there may be multiple factors contributing to this unexpected…


TPCH SF10 Dashboard Comparison

TPCH SF10 : Tableau Hyper Engine vs DuckDB vs Snowflake vs BigQuery vs Databricks vs SingleStore

After a first try with TPCH SF10 using DuckDB on 2 different laptops, comparing Parquet storage vs native storage (see TPCH SF10 using DuckDB vs SnowFlake, Bigquery, SingleStore and Databricks), I would like to try the Hyper Engine used by Tableau…


TPCH SF10 using DuckDB vs SnowFlake, Bigquery, SingleStore and Databricks

I was very interested in 2 articles by Mimoune Djouallah (aka mim) that compare Snowflake, BigQuery, Databricks, SingleStore, PowerBI Datamart and DuckDB using the TPCH benchmark with different scale factors (TPCH-SF10 then SF100). The TPCH benchmark is a…


DBEngines Dashboard

Database Engines Trends Dashboard

DB-Engines.com is a website that references and ranks hundreds of database engines, from the most famous ones like Oracle, MySQL, SQL Server, PostgreSQL, MongoDB, Redis to more confidential ones like Hazelcast…


More Heapsort in Cython

This post/notebook is the follow-up to a recent one: Heapsort with Numba and Cython, where we implemented heapsort in Python/Numba and Cython and compared the execution time with NumPy heapsort. However, heapsort in NumPy is written in C++…


Loading data from PostgreSQL to Pandas with ConnectorX

ConnectorX is a library, written in Rust, that enables fast and memory-efficient data loading from various databases to different dataframes. We refer to this interesting paper, in which the authors provide a detailed analysis of the pandas.read_sql…


Export data as fast as possible : from HANA to CSV

What is the fastest method to export HANA to CSV? I use a HANA 2.0 database and want to export from HANA to CSV. The source is a table or a SQL query, the target is an external client, and the export should of course be as fast as possible and run from a command line (I’m on…


Gartner Magic Quadrant BI 15 years History

Business Intelligence (BI) tools have been competing for years. The Gartner Magic Quadrant is one of the famous benchmarks that classify mainstream BI tools. We have retrieved old MQBI (Magic Quadrant Business Intelligence) benchmarks and compiled…


Applying a row-wise function to a Pandas dataframe

More than 3 years ago, we posted a comparative study about Looping over Pandas data using a CPU. Because a lot of things have evolved since 2018, this post is kind of an update. For example, the Pandas tag version was 0.23.3 at that time; it is now 1.4.0.…


Stack Overflow trends comparator

Stack Overflow Trends

How I built a Stack Overflow trends dashboard. Starting from Brent Ozar's Stack Overflow database extract, I tried to build a dashboard that shows the evolution of tag trends over time and, if possible, compares the trends of several tags. I first…


Gartner analytics and Business Intelligence tools comparator

As promised, here is an article on the Gartner® analytics and Business Intelligence tools comparator based on the capabilities and use cases of the tools. There are interesting differences in the final scores compared to the Magic Quadrant which…


Loading data into a Pandas DataFrame - a performance study

Because doing machine learning implies trying many options and algorithms with different parameters, from data cleaning to model validation, Python programmers will often load a full dataset into a Pandas dataframe, without actually…


Looping over Pandas data

I recently stumbled on this interesting post on RealPython (excellent website by the way!): Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects. This post has different subjects related to Pandas: creating a datetime column…